Jump to content

Fuzzy match two tables


backsmith
 Share

This topic is 2263 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Hi

I am cleaning up a messy list of doctors names and addresses.  I have got a lovely clean and accurate its in a new table.  The old table is very unreliable, with inaccurate placement of the data for address1,address2,town, post code etc.  This means post code could be found in any of the fields or not at all.

My idea is to break out the search terms by "exploding" all the words in all the address fields and using these to search in the clean file.  I want to score the results so more finds gets a better score.  I can then use this as a threshold to match the records and then store a foreign key.  Finally I will replace the links to the messy data from the main application with the key  to the clean data.

TO ILLUSTRATE

                      Address1             Address2        Town             Post Code  

Messy file      20 High Street     Birmingham            

Clean             20 High Road                             Birmingham       B1 1AA

 

Score               1  +1                                           +1     =3

 

Questions

Is this a sensible approach? Or is there a better way?

I can generate a multiple line search  but this will find every High in Birmingham. 

I think I need to loop into repeated searches and use this to count the results ?

Any guidance gladly received.

Thanks

 

John

 

Link to comment
Share on other sites

3 hours ago, backsmith said:

Messy file      20 High Street     Birmingham

This all in one field? Where is the zip code in this one?

Is the data currently in a external file? i.e. text editor?

Link to comment
Share on other sites

Hi Lee

Not all in one field.  Fields are Address1,address2,Town,Post Code.  Im in the UK so no zip.  We have a 7 character post code like  XY12 2AB.  Not all my messy data has the code, so I can't do a straight match on post code.    Both Tables are in Filemaker and also in excel.  Could be in a text editor if that helps.  Simple enough to put the excel out in CSV.  

 

Thanks

 

John

Link to comment
Share on other sites

Hi John,

The big question is, how many records are messed up?

If it’s just a few records, then I would use a Text editor or Excel to push the data around.

45 minutes ago, backsmith said:

Both Tables are in Filemaker and also in excel.  Could be in a text editor if that helps.  Simple enough to put the excel out in CSV.  

If there are a lot, then I would look at using TextWrangler with it’s ability to use Grep Patterns and Regular Expressions.

As for the Zip Codes, create a lookup table of zip codes based on the Address and City.

Lee

Link to comment
Share on other sites

This topic is 2263 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.