
Jumbled Name field


This topic is 7872 days old. Please don't post here. Open a new topic instead.


This must be a classic database problem.

I have inherited 13,000 records of very useful data and have imported them into FMPro. The original data was transaction data, never meant to be managed as information, so it lacks consistency in the way names were entered.

I want to search, subtotal and sort by name, so the name field is a stumbling block.

One person may have 20 records, and across those 20 records their name could appear in 5 or 6 different ways, e.g. Fred Bloggs, F Bloggs, F. Bloggs, Bloggs Fred, Fred & Jenny Bloggs, Fred A. Bloggs, FA & J Bloggs, etc. Rather than re-entering all 13,000 records (which I must admit I thought of, but only fleetingly!), I'd love to write a script to unjumble the variations and simply have Fred Bloggs in all of the 'Bloggs' records.

I could then search, sort and total to my heart's content.

I think it's got something to do with a script searching for text strings...

Can anyone give me some clues? Thanks.


To fully automate this process is impossible. If there are records for F. Bloggs, Frank Bloggs and Fred Bloggs, how would FileMaker know which Bloggs to assign F. Bloggs to?

You can automate much of it, however, so long as you flag the records FileMaker wasn't able to fix. You might create a self-join relationship based on the last name. (If the last name isn't in its own field, that's a further complication: RightWords( Name, 1 ) will extract it in most cases, but not all, such as "John Smith, Jr.") This would let you loop through the records with the same last name as the current record and count those that share the same first initial, using Left( SelfJoin::Name, 1 ). If all of those related records have the same first initial or the same first name, then you can probably safely assign every record matching that last name and first initial (or first name) to the same person. But if some people with the same last name have entirely different first names, you would have to flag those records for manual fixing.
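Outside FileMaker, the same flag-or-merge decision can be sketched in Python. The sample names and the `classify` helper below are invented for illustration; in the database itself this logic would live in a script driven by the self-join described above:

```python
from collections import defaultdict

# Hypothetical sample data; in FileMaker this would come from a
# self-join relationship on the Last Name field.
names = ["Fred Bloggs", "F. Bloggs", "Frank Bloggs", "Jenny Smith", "J Smith"]

def first_name(name):
    # First word, with a trailing period stripped ("F." -> "F").
    return name.split()[0].rstrip(".")

def classify(names):
    """Group by last word (a rough RightWords( Name, 1 ) equivalent) and
    decide whether each surname group can be merged automatically."""
    groups = defaultdict(list)
    for n in names:
        groups[n.split()[-1]].append(n)
    result = {}
    for surname, members in groups.items():
        initials = {first_name(m)[0].upper() for m in members}
        # Spelled-out first names only (initials don't disambiguate).
        full_firsts = {first_name(m) for m in members if len(first_name(m)) > 1}
        # Safe only if everyone shares one initial and at most one
        # spelled-out first name appears in the group.
        if len(initials) == 1 and len(full_firsts) <= 1:
            result[surname] = "merge"
        else:
            result[surname] = "review"
    return result
```

Here the Bloggs group gets flagged for review because both Fred and Frank appear, while the Smith group (Jenny and J) merges safely.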

It's going to be complicated. It might even take longer to write the script than it would to fix the data manually. To do it manually, you might sort by last name, then first name, scroll through the data looking at similar blocks (records with the same first initial will be grouped together), and use Find and Replace to fix the data quickly.

Chuck


I think I would resort to the manual approach, simply using a Find request. If you set the Find request to Bloggs, it should find all records with Bloggs in the name field. Check that they are all the same person, then use the Replace command to standardise the name across all of those records.

The only downside is that it will also find Bloggson, but you could omit those records and then treat them separately with a new Find request for Bloggson.
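The Bloggson overlap can be illustrated with a small Python sketch (sample records are invented; the premise, per the post above, is that a find on "Bloggs" matches any word beginning with it):

```python
# Invented sample records to illustrate the Bloggs/Bloggson overlap.
records = ["Fred Bloggs", "F. Bloggs", "Brian Bloggson"]

# A find that matches word beginnings catches "Bloggson" as well.
loose = [r for r in records if any(w.startswith("Bloggs") for w in r.split())]

# Omitting the longer surname, as suggested above, leaves the true matches.
exact = [r for r in loose if "Bloggson" not in r]
```

The `loose` list picks up all three records, while `exact` keeps only the two genuine Bloggs entries.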

Best of luck!!


Whether you use the find or the sort approach depends on how many unique names are in the database. If there are 6500 names and only two records per name, then the sort approach will work better.

My approach in these situations is to try to write a script that will do about 80% of the work automatically. That seems to be where you start to get diminishing returns labour-wise. I would sort the file by as many criteria as possible that will tend to bring like records together. I would use a sort order of: city, phone number, last name, first initial, first name--whatever you have available, of course. As Chuck says, you can make a few calculated fields to extract the initial of the first name, adjust upper/lower case and otherwise normalize some of the fields. At that stage you may see some patterns emerge that will help you write a script. The goal should be to eventually give each unique person in the file a unique ID number rather than just change the name entries to agree.
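As a rough illustration of the normalize-and-sort step, here is a Python sketch. The field names and sample data are invented; in FileMaker these would be calculated fields feeding the sort order described above:

```python
# Invented sample records; field names are assumptions, not the poster's schema.
records = [
    {"city": "Perth", "phone": "555-0101", "name": "bloggs, fred"},
    {"city": "Sydney", "phone": "555-0202", "name": "J. Smith"},
    {"city": "Perth", "phone": "555-0101", "name": "Fred Bloggs"},
]

def split_name(name):
    # Assume "Last, First" when a comma is present, else "First Last";
    # normalize case with title() on the way out.
    if "," in name:
        last, first = [p.strip() for p in name.split(",", 1)]
    else:
        parts = name.split()
        first, last = " ".join(parts[:-1]), parts[-1]
    return first.title(), last.title()

def sort_key(rec):
    first, last = split_name(rec["name"])
    # city, phone number, last name, first initial, first name, as above.
    return (rec["city"], rec["phone"], last, first[:1], first)

records.sort(key=sort_key)
```

After the sort, the two differently entered Bloggs records land next to each other, which is exactly what makes the pattern-spotting (and the ID-assignment loop) possible.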

I would make a script that loops through each record in the sorted file and compares a number of fields between the current record and the previous record. If everything matches exactly, assign the same ID number as the previous record. If not enough fields match, then increment the ID number. After you have done this, you can print out a comparatively small summary report which you can look through and then manually correct any mistakes.
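The compare-with-previous-record loop might look like this in Python (a sketch with invented sample data, not actual FileMaker script steps):

```python
def assign_ids(sorted_recs, match_fields):
    # Walk the sorted records; when every match field equals the previous
    # record's values, reuse the ID, otherwise start a new one.
    ids, current_id, prev = [], 0, None
    for rec in sorted_recs:
        key = tuple(rec[f] for f in match_fields)
        if key != prev:
            current_id += 1
            prev = key
        ids.append(current_id)
    return ids

# Invented sample, already sorted by surname and first initial.
people = [
    {"last": "Bloggs", "initial": "F"},
    {"last": "Bloggs", "initial": "F"},
    {"last": "Smith", "initial": "J"},
]
```

Here `assign_ids(people, ["last", "initial"])` gives both Bloggs records ID 1 and the Smith record ID 2; records the loop merges wrongly (or fails to merge) are then caught in the summary report.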

As I'm terribly lazy and easily bored, I would rather spend the time to automate as much as possible and minimize the time spent tediously checking individual records. That's just a personal preference.

[This message has been edited by BobWeaver (edited January 21, 2001).]



Thanks John. I retreated to a laborious manual approach and used as many tricks as I could think of to clean it up quickly. I am almost there.



Thanks Bob. The looping idea is one I have not explored before, and I will do some research to work out how to apply it correctly. There are no doubt many opportunities to put it to good use.

I did manage to find another heap of data that had a serial number in common with my problem database. I created a second database and linked the two using the serial number as a unique ID, which then allowed me, where there was an ID match between the two, to pull separate first name and surname fields into the problem database. I could then group and sort based on first name and surname. This automatically took care of about 9,000 records, leaving a paltry 2,000 for me to fumble through.
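The serial-number join described here can be sketched in Python. Both datasets below are invented stand-ins (in FileMaker this was a relationship between the two files keyed on the serial number):

```python
# Hypothetical clean lookup, standing in for the second database.
clean = {
    "1001": {"first": "Fred", "surname": "Bloggs"},
    "1002": {"first": "Jenny", "surname": "Smith"},
}

# Stand-in for the problem database.
messy = [
    {"serial": "1001", "name": "F. Bloggs"},
    {"serial": "1002", "name": "Smith Jenny"},
    {"serial": "9999", "name": "A. Unknown"},  # no match: left for manual work
]

for rec in messy:
    match = clean.get(rec["serial"])
    if match:
        # Pull the clean first name and surname into the messy record.
        rec["first"], rec["surname"] = match["first"], match["surname"]
```

Records whose serial number has no counterpart in the clean set are simply left untouched, which is the residue that still has to be fixed by hand.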

The tragedy is that it looks like this may become an annual job: as the year's figures update, I'll have to do it all over again!

Thanks for your help.



Thanks Chuck, no doubt you have seen my other responses. I've saved your reply so I can explore some of your suggestions, and I realise that I need to increase my knowledge of the functions and formulas that really make FMPro sing.

