Jump to content

Clean up imported Street Addresses


This topic is 7314 days old. Please don't post here. Open a new topic instead.

Recommended Posts

I'm setting up a FM7 solution to replace an SQL db.

Customer street addresses have been entered as:

"145 River Road"

I'd like to modify this imported data to split it into two fields"

Street Num and Street Name

eg "145" and "River Road"

Naturally there will be a wide variety of prefixes in front of the street name, including street numbers, apartment letters, perhaps no numbers.

Can anyone suggest an automated solution for cleaning this up ?

One idea is to convert the last two words in the field to "Street Name" and assume remaining characters refer to Street Num. This will provide incorrect data where street has double barrelled name eg "69 Rose Banks Drive"

All suggestions welcome.

I also use OS X, so an AppleScript solution is acceptable.

This is a once only task, all new data will be entered correctly.

thanks

shane

FileMaker Version: 7

Platform: Windows XP

Link to comment
Share on other sites

Hi Shane,

There are some tricks that may get you a little closer to what you want, but the plain fact is that there is an infinite number of different ways that folks write their addresses and no matter how many you think of and cater for, someone will always come up with something that falls outside the square. So you'll still need a process to check for errors and fix addresses that don't parse well.

However, that said, based on the examples you've provided, I suggest that as a starting point, you try:

Get As Number(LeftWords(Address; 1))

and

RightWords(Address, WordCount(Address) - 1)

...and see if that successfully parses all your addresses. To check for errors, I suggest that you initially do two things:

1. Search for records with a blank Street Num field - if there are any they will be from addresses that don't have a number up front, and you'll have to decide how to handle them as a separate case.

2. Create a temporary calc field with the formula GetAsNumber(Street Name) and search for non-empty field in the calc, then check what the source of the numeral ocurring there is.

Simple is best, so if this works for a majority of cases then you will be in business. However bear in mind that some addresses may have more than one number (eg flat numbers, floor numbers, shop numbers etc) and if your addresses hold any of these combinations, youy may need to consider a rather more complex parsing methodology... wink.gif

Link to comment
Share on other sites

Sometimes "grep" is the best tool for recognizing patterns, esp. on large sets of data. It's more or less cross-platform. I use BBEdit (or TextWrangler, $50) on Mac, http://www.barebones.com, and there's EditPad Pro on PC's.

It's also possible to use AppleScript directly with FileMaker fields, using Perform AppleScript. Though, for one-shot deals on large text, or for testing, it's best to use a real text editor, so you can see what's happening (with a good Undo :-)

You need a Scripting Addition to enable grep in AppleScript (unless you're a UNIX geek who can use it directly; it is built-in to OS X).

I use a free one from Satimage Software, who also release the free AppleScript editor Smile.

http://www.satimage.fr/software/en/downloads_software.html

It has a find/replace with grep (regexp).

It's easier than it sounds. There is documentation in many places for grep, though it has a few minor variations, Perl, etc.. I only know and use the basics.

^[^A-z]+

will find the numbers in:

2344 49 1/2 Street S

But it will also find in:

2344 49 5th Street S

returning:

2344 49 5

Oops.

Then there's also:

^(P O Box|P. O. Box|P.O. Box) [^A-z]+

or, more simply

^(P O Box|P. O. Box|P.O. Box) d+

for the many variations of "P O Box 15455"

BTW, "^" has a dual function. Outside of brackets it means "starts the line." Inside it means "not." The dash inside is kind of a "range" glue. The pipe "|" means "or." "d" is "digit."

There are also special characters for Replace. Basically "&" means "whatever last matched the pattern. In all cases above, you'd just use "&t" to replace with whatever matched and a tab, basically adding a tab.

You can also use "(pattern)" to define a pattern, which can then be called by number during the replace, using n

Find:

(pattern1)(pattern2)

Replace:

21

will just switch (tweedle) the two strings found.

Fun, huh?

Link to comment
Share on other sites

This topic is 7314 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.