A simple duplicate remover

John Fowler · June 3, 2005

I recently had to scrub a 10,000 record customer database of duplicates. Thinking it would be hard to do, I went to all the various Filemaker discussion forums and looked for a ready-made solution. But every one I encountered was baroque and difficult to transfer to my db.

So, I took all of 2 minutes to write a simple duplicate remover. When I was done, it scrubbed all 10,000 records in under 5 seconds. I just thought anyone else looking for any easy way rather than a hard way might appreciate this. Who knows, there may be other simple dup removers listed around here, but I didn't find any:

--Create a self-join relationship with the relevant match field (in my case, it was Customer Email).

--Create a calc field ("Related record count") that counts the number of records in the relationship. Any unique records will show 1 in this field, records with duplicates will show 2+.

--Run this simple script:

Find (records where Related record count >1)

Go to first record

Loop

If Related record count >1

Delete record (no dialog)

End If

Go to next record (exit after last)

End Loop

To delete the oldest duplicates, start from the first record in the found set. To delete the most recent duplicates, start from the last record. As the dups are deleted, the Related record count drops to 1, and the last record survives deletion by virtue of the If.

It took me 2 minutes to write, 5 seconds to execute. If you can use this, please do!

John

spb · June 3, 2005

You can run into problems deleting records in the middle of a loop. After deletion, when you go to next record, it may no longer be the "next" record you think it is. Though your record-count calc is pretty nice. It still looks like it may be possible to skip records. When the loop deletes a record, then the next record in line becomes active. The script goes to next record, meaning that active record was never checked.

A simple dupe find (enter an exclamation point in your match field) will find them in a jiffy, no need for self-join or calc field. Then sort by match field to group dupes together.

The standard script for this is one that was given as the example script in the Scdriptmaker chapter in the FM 5 manual. I've adapted this for many things.

In essence, you make a global field to use as a variable. Go to the first record & store the match field in the global. Go to next record & compare match field to global. If it matches, you have a dupe (the second record with the same match field). Set a boolean flag field.

If the compare does not match, set the global to the new match field and go to next record. Do not set the boolean flag.

After the loop runs, all but one of each set of dupes will have the boolean flag set. If you have multiple dupes, four or five of each, say, then only one of the group will end up unflagged.

Now run a find on the flag field and delete the found set. You can include the find and delete in your looping script, but this is dangerous. Better to choose to run it and let the scrtipt simply mark what needs deletion.

Steve Brown

-Queue- · June 3, 2005

Note that you can move the Go to Record/Request/Page [Exit after last; Next] step into an Else portion of the If statement and you will not be skipping records.

Sign In

A simple duplicate remover

Recommended Posts

John Fowler

spb

-Queue-

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information