
Import Pipe-Delimited Text Data


This topic is 5475 days old. Please don't post here. Open a new topic instead.

Recommended Posts

I am attempting to import data from a pipe-delimited text file. Since FileMaker still doesn't support that format, I'm trying to create a script that imports the source data one full line at a time and then uses a multi-part Substitute() calculation: first converting any pre-existing double quotes into another symbol, then converting the pipes to quote-comma-quote, and finally wrapping each line in quotes at the beginning and end. I would then export the data to another text file for FileMaker to import as a CSV file.
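For reference, the same pipe-to-CSV conversion can be sketched outside FileMaker; a minimal Python version (file names are hypothetical), where the csv module's quoting handles the quote-escaping that the Substitute() chain approximates:

```python
import csv

def pipe_line_to_csv_fields(line):
    # The pipe is the only delimiter, so a plain split recovers
    # every field regardless of embedded commas, quotes, or tabs.
    return line.rstrip("\r\n").split("|")

def convert(src_path, dst_path):
    # Hypothetical paths; csv.writer doubles embedded quotes and
    # wraps each field, so no manual quote-substitution pass is needed.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        for line in src:
            writer.writerow(pipe_line_to_csv_fields(line))
```
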

My problem is that FileMaker is not properly importing full lines from the source file; it truncates after commas. Apparently FileMaker has only two "text import" modes (tab-delimited and CSV), and since the default mode is CSV, it concludes that the field ends at the comma.

My problem? The source file contains commas, double-quotes, and tab symbols (which is why it's stored in a pipe-delimited format to begin with).

Yes, I could import the source file into a container -- but the SUBSTITUTE function appears to work only on text fields.

This is really, really frustrating -- I've actually paid people to do all this same work twice before (once using ASP and SQL Server, then again using PHP/MySQL) but I cannot decipher or debug that code, so I am being forced to start over again.


Mark, there are ways around it of course, as Lee suggests. But is this an on-going need? If so, I would think you'll want to script the entire thing instead of handling it manually. Can you provide an example file and an example of the text to be imported? Comment, Bob Weaver, Fenton and others are text-Masters when it comes to stripping and importing this type of data. I'm sure we can help you out!

LaRetta :wink2:


Thanks for the replies. I am writing a script for this, and that's where I'm encountering problems.

There are several hundred of these pipe-delimited text files, each with its own irregular update schedule. A few files are updated daily; some are updated once a week or so; some don't seem to be updated more than once a quarter.

Each file contains these fields:

ProductID|Name|MerchantID|Merchant|Link|Thumbnail|BigImage|Price|RetailPrice|Category|SubCategory|Description|Custom1|Custom2|Custom3|Custom4|Custom5|LastUpdated|status|manufacturer|partnumber|merchantCategory|merchantSubcategory|shortDescription|ISBN|UPC

and of course each field has a defined type and size. Many fields are frequently empty.

Here's an example row of data:

453740635|ArizonaSunr Fragrant scented Candle & Clay Holder|8216|Arizona Sun Products, Inc.|http://www.shareasale.com/m-pr.cfm?merchantID=8216&userID=104451&productID=453740635|http://www.arizonasun.com/Arizona-Sun/906B_sm.jpg|http://www.arizonasun.com/Arizona-Sun/906B.jpg|10.95||Gifts/Specialty|Misc. Holiday|About the ArizonaSunr Fragrant scented Candle & Clay Holder - This Scented Candle and Clay Holder gives you the ArizonaSunr fragrance experience. ArizonaSunr Fragrant Candle includes a clayholder.|Gifts|General Gifts||||2005-08-26 12:54:38.590|instock|||||||
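For illustration, a short Python sketch (the function name is my own) that splits a row like the one above into named fields using the header listed earlier:

```python
HEADER = ("ProductID|Name|MerchantID|Merchant|Link|Thumbnail|BigImage|Price|"
          "RetailPrice|Category|SubCategory|Description|Custom1|Custom2|"
          "Custom3|Custom4|Custom5|LastUpdated|status|manufacturer|"
          "partnumber|merchantCategory|merchantSubcategory|shortDescription|"
          "ISBN|UPC")

FIELDS = HEADER.split("|")  # 26 column names

def parse_row(line):
    # Split on pipes and pair each value with its column name.
    # Trailing empty columns (the run of "|||||||" at the end of the
    # sample row) come back as empty strings, not missing keys.
    values = line.rstrip("\r\n").split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))
```
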

Every file is actually furnished by a separate merchant, so there are a wide range of problems in the source data which my script(s) will need to clean up. Some of the files contain embedded HTML; many contain "transmogrified characters" such as the above record which lists a brand name as "ArizonaSunr" because it translated the registered-trademark symbol (R-in-a-circle) into a lower-case "r."

As I mentioned, I've paid programmers (twice) to try to implement this using "the tools they use," and each time I was left with code that I can't understand or modify, built with tools I am not familiar with. I have several versions of a "change list" or "substitution list" or "filter list" (depending on your vocabulary, these terms can mean very different things).

I finally decided to try doing this with FileMaker, but I think I'm learning that this was also a mistake, in part because FileMaker apparently doesn't handle regular expressions (or anything like them), and I suspect that its scripting language and Substitute() statements will take WEEKS to process several million records.


One method is to import the file data into a global field and parse it out into separate fields/records with a script, but that's not something for millions of records.

A more robust method is to write an XSLT stylesheet to convert the data on import - again, unsure of performance with millions of records in FileMaker.

Or it can be done pretty easily in MySQL using the LOAD DATA INFILE function. You could then use FileMaker ESS to work with the data from there.


There are a few plug-ins which can do regular expression parsing within FileMaker. One of them is ScriptMaster, available from 360Works, which is free and can do all kinds of things. It uses Java, so it has a bit of a learning curve. The example file shows how to test whether a regex pattern matches, but not how to return the matched text, which is normally what you'd want to do. I don't know Java, but I imagine it is fairly easy (I wish they'd include an example showing that).
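The match-then-return-the-text step that the example omits is a one-liner in most regex libraries; a hypothetical sketch in Python rather than Java:

```python
import re

def first_match(pattern, text):
    # Return the text of the first match (or the first capture
    # group, if the pattern defines one), or None if nothing matches.
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(1) if m.groups() else m.group(0)
```

For example, `first_match(r"productID=(\d+)", url)` pulls the product ID out of a link like the ones in the sample data.
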

360Works also has a dedicated regex plug-in, Patterns. But it is somewhat expensive (compared to free), though it will run free for 2 hours per FileMaker session.

Another free plug-in is this one:

http://jensteich.de/regex-plugin/RegexPlugIn.zip

There apparently is no web site. So you just have to download it, and read the documentation, which includes a link to an example file, which shows you how to use it.

There is also a free file, "Regex", which implements regex matching using FileMaker custom functions (about 15 of them). It was by "the shadow", who is (or was) a poster on these forums. I could post it here, but there may be a newer version, so I'll wait and see if he (or others) respond first.

Since I'm mostly on a Mac, I tend to use AppleScript and Unix commands for these things, so I haven't used the above, other than messing about. They all do more or less the same thing, though there is likely a speed difference, which would be important.

Regarding your other question about weird characters in the text: if they are already screwed up in the source file, you're stuck with them. But if they are HTML entities, some conversion tool should be able to translate them to their text equivalents.

ScriptMaster has the ability to do this. I've read that, but don't really know how. There is a Java "Entity" command, as well as a "URLDecoder", and "HTML" for tags. It is a wide-ranging tool, but it really helps to know some Java.

On a Mac there is an AppleScript Scripting Addition, TextCommands, which I've used to do similar, but that doesn't really help you.
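The entity-and-tag cleanup described above is also a few lines in Python's standard library; a rough sketch (the naive regex tag-strip is fine for product descriptions but is not a general HTML parser):

```python
import html
import re

def clean_fragment(text):
    # Strip simple HTML tags, then decode entities such as
    # &amp; or &reg; to their character equivalents. html.unescape
    # handles both named entities and numeric references (&#174;).
    no_tags = re.sub(r"<[^>]+>", "", text)
    return html.unescape(no_tags)
```
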


Actually, the above plug-ins are not that much help, as you'd likely want to do the Find/Replace on the text files themselves, before importing. It's likely faster to do that than it would be to move some text out of a FileMaker field into another field.

You would be looking for a plug-in that would let you run command line Find/Replace on the text files. There are a few. Troi File can do this, as well as run a shell command on either OS.


The main point here, IMHO, is that Filemaker does not support a pipe-delimited format. There are limits to what you can do in Filemaker to overcome that - because before you can manipulate the data IN Filemaker, you need to get the data INTO Filemaker first.

If the data is allowed to contain both commas and tabs within its columns, you can never be sure you have imported the entire row.

So it seems the best route would be to pre-process the data in another application. For example, just exchanging pipes and tabs and saving the result with a .tab extension should make it ready to be imported "through the front door".
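The pipe/tab exchange suggested above can be sketched in a few lines of Python (paths are hypothetical); swapping the two characters, rather than just replacing pipes, means literal tabs inside a field become pipes and the swap is reversible after import:

```python
def swap_pipes_and_tabs(line):
    # Exchange the two characters in one pass: pipes become tab
    # delimiters, and any literal tabs inside a field become pipes.
    return line.translate(str.maketrans("|\t", "\t|"))

def preprocess(src_path, dst_path):
    # dst_path would carry a .tab extension so FileMaker treats
    # the file as tab-delimited.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(swap_pipes_and_tabs(line))
```
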


To follow up: TextPipe Pro is definitely doing more for me than I think I could have ever expected from scripting within FileMaker.

I've been able to incorporate several different filters (including removal of embedded HTML code, character mapping, white-space removal, and a wide range of text-string substitutions), as well as transforming from pipe-delimited to CSV.

My current "test suite" of files consists of 139 files totalling 1.2GB, and TextPipe Pro filtered through the files in 25 minutes. FileMaker was then able to import more than 1 million records from those files in less than two hours (I don't have the exact time as I went to dinner after an hour.)

I still have a lot of work to do, both with TextPipe Pro and with FileMaker, but I definitely am glad that I went "outside" FileMaker (and plug-ins) to find a more effective solution.


  • 3 months later...

This is really a trivial issue. Do you know Perl or Python? If you know either of these string-processing languages, do the following:

open the file;

read each line;

split the line on the pipe character:

my @fields = split(/\|/, $currentLine);

(note the backslash: Perl's split() takes a regex, and an unescaped | means alternation);

then write each field out.

The easiest way, if you know Linux and have access to a box on your desktop, is:

sed 's/|/\t/g' yourfile > formattedfile

which converts every pipe to a tab (the \t replacement works in GNU sed).

The problem I have as a user: importing 102 million rows and indexing a few fields takes more than 6 days. Annoying if, like me, you have millions of rows!

Peace


