Find duplicates between multiple DBs

BFH · July 31, 2002

Does anyone have a script that compares an ID field in two or more open databases (unrelated) and flags those records that have a common ID value between the databases?

In other words, Mary Smith has a record in Database 1 and in Database 2. I would want to put an X in a "flag" field in her record in both databases. Susie Jones has a record only in Database 1. I would not want to put an X in the "flag" field in her record.

Thanks in advance.

Vaughan · August 1, 2002

Believe it or not, what you describe is a relationship between the databases! However, DO NOT rely on people's names as an indication of duplication: you'll very quickly be disappointed.

Which brings us back to the whole reason for having one "people" file and relating it to all the others...

BFH · August 1, 2002

Thanks for taking time to reply.

Clarifications: 1) I simply used people's names to keep it simple, and actually the databases have ID numbers for each person who has a record; 2) I do not want to rely on a relationship because the task of flagging duplicate records is just the first of several steps.

The ultimate goal is to take nine separate databases and merge them into a single database that contains a single record per person.

Do you have a script that would accomplish what I inquired about in my original post (i.e., compare the values in the ID fields of two separate FMP5 files and put an "X" in a "dup" or "flag" field in both files for each record that the ID value is present in both files)?

Hope the above is clearer than mud!

Cheers!

Vaughan · August 2, 2002

I've done something similar. I'll repeat again (because it's really not trivial) the hardest part is working out which records are duplicates! Spelling typos, variations on names and addresses all conspire to make the job hard, yet two people with the same name might not be duplicates! You might have to include birth dates (or some other data) to get a clearer picture.

Once you have worked out your duplicate rules, I simply imported all records into one file (adding necessary fields so all data had a place to go) and made a calc field that concatenated all the necessary text fields together to make the comparison string.

I then created a script that searched for duplicate records (perform find for "!" in the comparison field), found the first duplicated set and deleted all but the first duplicate record. Loop through the process (search for dups in comparison field, find first dup set, delete extras) until no more dups are found.

Hints: assume that some duplicate records will remain, and that some unique records might get deleted due to inadequacies in the algorithm and junk data entry; back everything up first; don't do it on a production system!

BFH · August 5, 2002

Thanks for replying and for the helpful suggestions.

We do have a unique ID number for each person in the databases and will not be relying on text fields (i.e., names) for finding duplicate records.

Fortunately, the databases have relatively small numbers of records allowing for double-checking by visual inspection.

Cheers!

Rigsby · August 6, 2002

I understand what you want to do, and at least you have a unique ID so finding the doubles will be a breeze. However, the thing is to decide how you want to merge all the information from other files into one file. You could simply import all the records into one database and then look for doubles, but I would guess that you have different information in different files, so really want to merge this information into one record. To achieve this I would create relationships to the other files and then use scripts to grab the information from related records filling out missing fields or adding the content of fields in a related record to the content of the fields in your main database. If you keep careful track of the relationships you can delete them once you have finished. You could do this without relationships, i.e. copying the unique ID in file 1, going to file 2 and performing a find but the scripting for this would be very lengthy and complicated in comparison to using relationships.

BFH · August 7, 2002

Thanks for taking time to reply and for the helpful suggestions.

The process is a bit more complicated; I've intentionally simplified it in this thread.

I have only two of the databases. It's unclear when the others will be provided.

My experience with FMP is limited to end-user. I can design data-entry screens and reports, but my scripting skills are weak at best.

The time it would take me to develop functional, reliable scripts most likely would exceed the time it would take me to go through the databases manually; they have fewer than 100 records each.

Yes, I do want to migrate to a single record per student. All of the databases have some common fields (e.g., ID number) and each has some unique fields.

My goal is to set up a single database, with multiple tables (e.g., one for demographic data, one for achievement data, etc.), that contains a single record per student.

Cheers!

Sign In

Find duplicates between multiple DBs

Recommended Posts

BFH

Vaughan

BFH

Vaughan

BFH

Rigsby

BFH

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information