Jump to content

Identifying "duplicate" contacts


This topic is 7227 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Apologies in advance if this is not the right forum - please recommend a better one B)-)

This must come up fairly often: how do you identify "duplicates" in a file of people containing name, address, city, state, zip, and maybe telephone #? My definition of "duplicate" is inherently loose - it tolerates a certain amount of discrepancy due to abbreviations, typos, and omitted info, as well as discrepancies due to a change of address. Thus, any two records would have a certain *probability* of referring to the same individual. A human would make the final call, but I want something that would greatly narrow the field beforehand.

This sort of logic would seem to be well beyond the ability of FM scripting to implement, so I thought there might be a plug-in that did this kind of task.

Thanks,

Chap

Link to comment
Share on other sites

You can create a calculated field that aggregates all of the criteria you specify for possible dupes. Say you want to look for all records that have the same first and last name. Create a calc field that concatenates these fields ( or any others you want), then create a self join based on that field (say you named the relationship "dupeFinder" as below). Another calc field would then be used to identify the records:

Case((IsEmpty(FirstName) or IsEmpty(LastName));"Empty";

Case(

Count(dupeFinder::FullName) = 1; "Unique";

Case(

GetAsNumber(contactID) = GetAsNumber(dupeFinder::contactID); "Original"; "Duplicate"

)

)

)

If the count function returns the value "1", that means there is only one value with a match (the record itself) Otherwise it compares the unique ID field of the current record with the first one in the the related set. If it matches, then the first record created must equal the current record, thus it's the original one, and all others will be flagged duplicate by default.

To relax or constrain your criteria, just change how your calc field concatenates the fields of interest.

(Like if you wanted to use just the first initial and the last name)

Hope this helps.

Dana

Link to comment
Share on other sites

Thanks, Dana

It turns out there's been a sizeable amount of research done on the subject of "approximate string matching". There are some pretty complex algorithms that can assign a probable match rating that tolerates a certain amount of discrepancy such as letter inversions, substitutions, and omissions. It would be interesting to start with keys such as you describe, and then do "approximate matching" to expand the matching set to include minor typos & abbreviations.

Maybe there's a product here! B)-)

Chap

Link to comment
Share on other sites

This topic is 7227 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.