Jump to content

Please help me with a parsing question


This topic is 7557 days old. Please don't post here. Open a new topic instead.

Recommended Posts

I have a list on a floppy that I want to import into FileMaker 6.0 but the line information is separated by a space like;

Gary Stemper 12551 Wiffer Way Conroy, MI 66321 800/555-1212

So when I try to import it I get;

Gary Stemper 12551 Wiffer Way Conroy, in one field and

MI 66321 800/555-1212 in another field

Is there any way that I can get first name, last name, address etc. into each of the proper FMP fields?

Thanks,

Carl

Link to comment
Share on other sites

Yes, there is a way, but it can take a long time. I recently had to do this. I used "grep" in a text editor. Grep is more or less cross-platform. I'm on a Mac, so I used BBEdit. There are editors for Windows, but I haven't used them; EditPad Pro has been recommended.

Basically you have to be able to recognize patterns of text, distinguishing between numbers and text.

It takes several steps to end up with the desired result. Yours is not too bad of a case however, 'cause there's so few "fields."

Example, Step 1:

Find: ^([a-z|A-Z]+| )+

(means: find words at the beginning with letters, no numbers)

Replace: &t

(means: replace with itself and a tab)

Step 2:

d+ (w| )+,

(find address, including comma, which hopefully only occurs here)

Replace: &t

Step 3:

[^t]ddddd

(find zip code, but not address#'s, which now have a tab in front)

Replace: &t&

(tab in front and back)

It's now done, but there's a little cleaning, to remove the the extra comma. You could also split off the area code.

Text editors can run these operations in a large file in a matter of seconds. You could probably do this with FileMaker calculations also, but it would be more difficult. Text editors have Undo, which comes in handy.

I'm not all that hot at grep; others may be better routines.

Link to comment
Share on other sites

Hi Fenton,

Very good, I'm just learning Regular Expressions and Greps too.

You can shorten your Step 1 [color:"red"] ^([a-z|A-Z]+| )+ to [color:"blue"] ^([A-Za-z]+| )+ and I would add a 4th step as below.

Example, Step 1:

Find: ^([A-Za-z]+| )+

(means: find words at the beginning with letters, no numbers)

Replace: &t

(means: replace with itself and a tab)

Step 2:

d+ (w| )+,

(find address, including comma, which hopefully only occurs here)

Replace: &t

Step 3:

[^t]ddddd

(find zip code, but not address#'s, which now have a tab in front)

Replace: &t&

(tab in front and back)

[color:"blue"]And I would add a fourth step to put a tab between the State abbreviation and Zip code

Step 4

[0-9]+t

Finds the zip and and replaces it with a tab zip

Replace:

t&

Lee

cool.gif

Link to comment
Share on other sites

You're right about the shorten; I didn't need the first pipe "|" (which means basically "or"). It's unnecessary inside the brackets. Actually you don't need any pipes. The following works also:

^([a-zA-Z]+ )+

You don't really need that 4th step. I made a mistake on the 3rd step.

Step 3:

[^t]ddddd

(find zip code, but not address#'s, which now have a tab in front)

Replace: &t&

(tab in front and back)

It should have been:

Replace: t&t

Whenever you can isolate an entire "field," which has no tabs yet, it saves a step to put a tab on each side.

Another method:

You could alternatively isolate the state. This would work in case the address didn't have a comma. But it would also match "DR", "ST", etc.. Street abbreviations can be pretty variable, often entered sloppily.

[A-Z][A-Z]

Link to comment
Share on other sites

Hi Fenton,

I just realized that there was a mistake in your third step replace. Instead of

[color:"red"] &t& which gives you two zips, it should read [color:"blue"]

t&t , which does away with the need to use my fourth step. I suggest

instead a different fourth step to remove the excess spaces and comma.

These would be the new steps as amended. Be sure you use Grep.

Step 1

Find: ^([A-Za-z]+| )+

Replace: &t

Step 2

Find: d+ (w| )+,

Replace: &t

Step 3

Find: [^t]ddddd

Replace: [color:"blue"] t&t

Step 4

Find: [color:"blue"] ( t|,t |t )

Replace: [color:"blue"] t

Maybe cool.gif

Lee

Link to comment
Share on other sites

Hi Fenton,

I was composing my response and posted it without refreshing my screen. I would normally just delete it since it says a lot of what you have just posted. However, I'm leaving it because of my "New" step 4.

Lee

Link to comment
Share on other sites

This topic is 7557 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.