backsmith Posted April 18, 2016 Posted April 18, 2016 Hi I am cleaning up a messy list of doctors names and addresses. I have got a lovely clean and accurate its in a new table. The old table is very unreliable, with inaccurate placement of the data for address1,address2,town, post code etc. This means post code could be found in any of the fields or not at all. My idea is to break out the search terms by "exploding" all the words in all the address fields and using these to search in the clean file. I want to score the results so more finds gets a better score. I can then use this as a threshold to match the records and then store a foreign key. Finally I will replace the links to the messy data from the main application with the key to the clean data. TO ILLUSTRATE Address1 Address2 Town Post Code Messy file 20 High Street Birmingham Clean 20 High Road Birmingham B1 1AA Score 1 +1 +1 =3 Questions Is this a sensible approach? Or is there a better way? I can generate a multiple line search but this will find every High in Birmingham. I think I need to loop into repeated searches and use this to count the results ? Any guidance gladly received. Thanks John
Lee Smith Posted April 18, 2016 Posted April 18, 2016 3 hours ago, backsmith said: Messy file 20 High Street Birmingham This all in one field? Where is the zip code in this one? Is the data currently in a external file? i.e. text editor?
backsmith Posted April 19, 2016 Author Posted April 19, 2016 Hi Lee Not all in one field. Fields are Address1,address2,Town,Post Code. Im in the UK so no zip. We have a 7 character post code like XY12 2AB. Not all my messy data has the code, so I can't do a straight match on post code. Both Tables are in Filemaker and also in excel. Could be in a text editor if that helps. Simple enough to put the excel out in CSV. Thanks John
Lee Smith Posted April 19, 2016 Posted April 19, 2016 Hi John, The big question is, how many records are messed up? If it’s just a few records, then I would use a Text editor or Excel to push the data around. 45 minutes ago, backsmith said: Both Tables are in Filemaker and also in excel. Could be in a text editor if that helps. Simple enough to put the excel out in CSV. If there are a lot, then I would look at using TextWrangler with it’s ability to use Grep Patterns and Regular Expressions. As for the Zip Codes, create a lookup table of zip codes based on the Address and City. Lee
Recommended Posts
This topic is 3476 days old. Please don't post here. Open a new topic instead.
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now