
Import large ASCII file




I have 20 fixed-width ASCII files with 20,000+ records each that I would like to import into FileMaker. The files are plain text with no delimiters; each record holds 200+ variables, taking up 3200 "positions" or characters.

I would like some advice on how best to move these files into FileMaker.

Thank you for your advice

Chris


Not enough information I think...

Can you post a few records? Is every record structured exactly the same?

If there are no delimiters, how do you know where each field starts and ends? Is everything fixed length?


I second Wim: what marks the end of a record, just a certain number of characters? In any case, I would suggest a good text editor, such as BBEdit or its free lite version.

http://www.barebones.com

Its "grep" search can easily find a given number of characters and add a delimiter (tab for fields, return for records). A period (.) is one character, so a 6-character field would be:

Find: ......

Replace: &\t

(& is "whatever matched")

You'd likely want to do the returns for the records first, or remember to add the tab to the number of characters; a return is \r
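If you'd rather script the conversion than grep it, here is a minimal Python sketch of the same idea (the file names and field widths are made up; substitute your own layout):

# Minimal sketch, not tested against the real files: slice each
# fixed-width line into fields and join them with tabs, the same
# transformation as the grep find/replace above.
widths = [5, 2, 1, 2, 2, 3]  # example widths only; use your layout

with open("records.txt") as src, open("records.tab", "w") as dst:
    for line in src:
        line = line.rstrip("\r\n")
        fields, pos = [], 0
        for w in widths:
            fields.append(line[pos:pos + w])
            pos += w
        dst.write("\t".join(fields) + "\n")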

BBEdit is fully AppleScriptable. BBEdit Lite is not. TextWrangler is, but it requires OS X 10.3.5, so you might want to upgrade. BBEdit Lite is so old that it was hard to even find on their web site:

http://www.barebones.com/products/bblite/index.shtml


Thanks for the insights. Here is some more information.

All of the records and disks have the same structure: fixed width, with a carriage return at the end of each record. The data files contain many padded "blank" spaces, which makes visual inspection of the records a little off-putting. But here is what it looks like:

2003PD126640010208250101S C Y00260011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ALAN J GERSON PETER GLEASON DEMOCRATIC

The "blanks" don't reproduce in copy/paste.

Here are some sample variables:

Field         Start  End  Type  Length
Election-ID       1    5     A       5
Office            6    7     A       2
County            8    8     A       1
District          9   10     A       2
Ass. Dist        11   12     A       2
Elect. Dist.     13   15     A       3

And so on.

Two additional sets of data, of similar file structure and size, will also need to be converted and linked to each other to generate reports and to match address/name data from membership lists to voter histories, election results, and turnout information by census block.

Thanks for your further thoughts.


Since the records are separated by a carriage return, the import shouldn't be an issue. Your fields can then be calculations, or set with a loop and the same calcs in a Set Field script step. Examples:

Election_ID = Trim(Middle(MyImportFieldName; 1; 5))

Office = Trim(Middle(MyImportFieldName; 6; 2))

County = Trim(Middle(MyImportFieldName; 8; 1))

and so on.....

Middle(MyField; StartPosition; NumberOfCharacters)

Trim removes leading and trailing spaces.
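If it helps to see the logic outside FileMaker, here is a rough Python sketch of what those calcs do (positions are from the sample layout; the names are just illustrative):

# Rough sketch of the Trim(Middle(...)) parse for a single record.
# Starts are 1-based, as in FileMaker's Middle() and the layout table.
layout = {
    "Election_ID": (1, 5),
    "Office": (6, 2),
    "County": (8, 1),
}

def parse_record(line):
    # line[start-1 : start-1+length] is Middle(); .strip() is Trim()
    return {name: line[start - 1:start - 1 + length].strip()
            for name, (start, length) in layout.items()}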


It's a lot of work, writing 200+ calculations like this. I would use a repeating calculation field to split out the field data, then export it as text and convert the separator to a tab character.

This way you only need to type the starts and lengths into 2 repeating fields - and even that could be imported, using the same process in reverse.
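The idea in script form is just two parallel lists driving the split; a hypothetical Python sketch, with the two lists standing in for the two repeating fields:

# Hypothetical sketch: starts and lengths are data, not 200+ formulas.
starts = [1, 6, 8, 9, 11, 13]   # 1-based starts, per the layout table
lengths = [5, 2, 1, 2, 2, 3]

def split_line(line):
    # slice out each field and join with tabs for export
    return "\t".join(line[s - 1:s - 1 + n] for s, n in zip(starts, lengths))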


It is also possible to write a fairly simple grep that will split the fields. I'm not quite sure what's going on with the names at the end, so I'm ignoring them. You just put periods for the characters, enclosed in parentheses, and use the backslashed group position in the replace. I think you have a maximum of 9 fields you can do (and the last one on the line doesn't need a tab). Easier to see than to explain (especially for those such as myself).

This is for 6 fields, and I don't know for sure that the lengths are correct. But you get the idea, and that's what matters.

Find:

(.........................) (.) (.............................................) (.............................................) (.............................................) (.............................................)

Replace:

\1\t\2\t\3\t\4\t\5\t\6\t
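The same capture-group idea carries over to any regex flavor that supports counts, so you don't have to type rows of periods; a hypothetical Python sketch for the first three sample fields:

import re

# Sketch only: one group per field, backreferences joined by tabs.
# {n} repetition counts replace the rows of periods above.
pattern = re.compile(r"^(.{5})(.{2})(.{1})")

line = "2003PD126640010208250101S C Y0026"  # shortened sample record
print(pattern.sub(r"\1\t\2\t\3", line, count=1))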


It's a lot of work, writing 200+ calculations like this. I would use a repeating calculation field to split out the field data, then export it as text and convert the separator to a tab character.

This way you only need to type the starts and lengths into 2 repeating fields - and even that could be imported, using the same process in reverse.

IMO it's about the same amount of work, as the only things that change in the calcs are the same numbers you are putting in the repeating fields. I also think it would be much easier to troubleshoot my way, but I readily admit you know more about this than I do, so I may be missing something.


I think that typing a list of numbers in sequence is a lot faster than duplicating 200 fields and editing them one at a time, trying to select the 2 numbers in the formula. Also, in case of a mistake, you have the entire list in front of you, so tracking it down should be easy.

It doesn't even have to be a repeating field - you could put the data into a text field, separated by carriage returns. So if you skipped a number, you could correct it by insert. And of course, if such a list already exists, then import, or copy and paste, should make this a breeze.
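Taken one step further, that list could generate the calculation text itself; a hypothetical sketch, assuming the list is kept as name/start/length rows:

# Sketch: emit the 200+ formulas from the layout list instead of
# typing them by hand. Field names here are from the sample layout.
layout = [("Election_ID", 1, 5), ("Office", 6, 2), ("County", 8, 1)]

for name, start, length in layout:
    print(f"{name} = Trim(Middle(MyImportFieldName; {start}; {length}))")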


All good suggestions.

I will work on using sbg2's suggestion. As a novice, I think laying out the structure so plainly will allow me to write an import script that will work. While it may be longer, I can understand it -- heh.

Thanks again. I will post the outcome on Monday.


In order to get the data into FM, I first put the "long string" into Excel. After it was imported/opened in FileMaker, I used the parsing approach as suggested. I tried out both a loop and a defined calculation, and both worked fine, though defining the field was much faster. However, I am running into a problem with the larger files, which cannot be opened in Excel or Word due to their size. When I try to open or import the original .txt file directly into FileMaker, I get an error message saying the file type is not supported.

Any thoughts?


I solved my problem. I needed to install the correct drivers.

When bringing additional files into the database, I found that defining the calculations, though time-consuming the first time, makes the importing/conversion process much faster than running a script/loop to perform the calculations.


When bringing additional files into the database, I found that defining the calculations, though time-consuming the first time, makes the importing/conversion process much faster than running a script/loop to perform the calculations.

And it has the added benefit that nearly 100% of FileMaker developers would understand this choice, so your calcs would make sense to the next person who works on the file...

One tip: beware the comma. Watch your import format -- FileMaker sometimes decides that commas are a field delimiter when you don't want it to, even in a fixed-width format.


Thanks for the file. Once FileMaker gets through importing the data (going on 3 hours now), I will look at it.

The file is 514 MB and will likely have more than 1 million records when done. I know this is a lot of data, but more than 3 hours to import the file? Generally, I use SPSS to manage data sets this large, but we want to have the data available on the web. I am running on a MacBook Pro with 1 GB of memory.

I have the file structured to import the first "long string" from which all of the other variables are calculated. Would it be faster to do the calculations via an "insert calculation" script after the initial "long string" has been imported? Seeing FM process one record at a time using the loop function led me to think that it would be slower, but I could be wrong. Is there some batch-processing function that I missed? I have five more CDs, and the idea of having to wait several hours for each one leads me to believe that I have missed some import step to speed the process along.

Thanks for your thoughts.

