How to find and delete duplicate, triplicate, etc. records based on a single field

I have a script in my database to find and delete duplicate records based on a single field. Does anyone have a script to handle the possibility that a duplicate record occurs more than just one time?


Basically you have two options:

  1. Loop over each record, deleting it if the next (or previous) record is a duplicate, otherwise moving to the next record - see an example here:
    https://fmforums.com/topic/97304-how-to-delete-all-duplicates/?do=findComment&comment=442212&_rid=72594;
  2. Use a variation of the Fast Summaries method to delete from each group the number of records that is equal to the size of the group minus one.

The second method is faster, but it requires a summary field to count the records. Both methods will run faster if you first perform a find for the duplicates only. Rough sketches of both are below.
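For illustration, here is a rough sketch of option 1 written out as FileMaker script steps. This is not the exact script from the linked thread; it assumes the records are sorted by Email_Archive::txt_filename_import so that duplicates sit next to each other:

Sort Records [ Restore ; With dialog: Off ]
# the sort order must include Email_Archive::txt_filename_import
Go to Record/Request/Page [ First ]
Loop
  If [ Email_Archive::txt_filename_import = GetNthRecord ( Email_Archive::txt_filename_import ; Get ( RecordNumber ) + 1 ) ]
    # the next record has the same value, so the current record is an extra
    Delete Record/Request [ With dialog: Off ]
  Else
    # the current record is the last of its group (or unique) - move on
    Go to Record/Request/Page [ Next ; Exit after last: On ]
  End If
End Loop

Because the current record is re-tested after every delete, triplicates, quadruplicates, etc. are handled the same way as plain pairs.

And a sketch of option 2, assuming a hypothetical summary field Email_Archive::sCount defined as Count of txt_filename_import (any count summary field will do, as long as the found set is sorted by the break field):

Sort Records [ Restore ; With dialog: Off ]
# the sort order must include Email_Archive::txt_filename_import (the break field)
Go to Record/Request/Page [ First ]
Loop
  Set Variable [ $pos ; Value: Get ( RecordNumber ) ]
  Set Variable [ $size ; Value: GetSummary ( Email_Archive::sCount ; Email_Archive::txt_filename_import ) ]
  If [ $size > 1 ]
    # keep the current record and delete the remaining $size - 1 records of the group
    Go to Record/Request/Page [ Next ]
    Set Variable [ $i ; Value: 0 ]
    Loop
      Delete Record/Request [ With dialog: Off ]
      Set Variable [ $i ; Value: $i + 1 ]
      Exit Loop If [ $i ≥ $size - 1 ]
    End Loop
    # we are now on the first record of the next group, unless this was the
    # last group, in which case we landed back on the record we kept
    Exit Loop If [ Get ( RecordNumber ) ≤ $pos ]
  Else
    Go to Record/Request/Page [ Next ; Exit after last: On ]
  End If
End Loop

Again, treat these as sketches to adapt, not as drop-in scripts.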

--
P.S. Make sure you have a backup before trying any of this.


comment ... THANKS for your help.  🙂

Using a calculation with a self-joining table, I'm showing that I have 1,802 records (out of 85,873 total) that have duplicate values in this field ... Email_Archive::txt_filename_import

After showing all records (85,873) and unsorting, I ran the script for option 1. The script finished after only a few seconds and brought my total number of records down to 85,250 (623 records were deleted). However, the new total of duplicates (per my self-joining table calculation) only went from 1,802 to 1,799 (3 duplicates were found and deleted). I must have done something wrong. 🤔

Here is my script ...

[screenshot of the script attached]


Thanks for your help. I'll look through things again on my end. I can't send the file ... It contains personal email addresses for thousands of contacts.


One more thing: this seems to be a table populated by importing records from an external source (or sources?). Instead of importing duplicates and then hunting for them, you have the option of validating the txt_filename_import field as Unique, Validate always. Then any duplicate values will be skipped during the import(s). The import process may take longer - but if I understand correctly this is a one-time conversion, so it shouldn't be a big deal.


I set the txt_filename_import field to: Unique, Validate always 🙂

I tried the import, but somehow FileMaker locked up.

I tried the same thing again ... This time my FileMaker database was on a local machine (instead of the company server), and everything seemed to work great. After the import completed, I received the following "Import Records Summary" notification (see the attached screenshot). I'm pretty confident everything worked. My only concern is that the import skipped 2,439 records, while my original calculation with the self-joining table showed that there were 1,802 duplicates. Do you think the difference could be because certain duplicates (2x) are actually triplicates (3x), quadruplicates (4x), etc.?

Thanks!

[Import Records Summary screenshot attached]


7 hours ago, kcep said:

my original calculation with self-joining table showed that there were 1,802 duplicates. Do you think the difference could be the possibility that certain duplicates (2x) are actually triplicates (3x) or (4x), etc.?

I don't know how your "original calculation" works, so any opinion on that would be purely guessing.

If you still have a file with the original imported set, you can see how many unique values it contains by looking at the result of:

ExecuteSQL ( "SELECT COUNT (DISTINCT txt_filename_import) FROM Email_Archive" ; "" ; "" )

Note that the SQL result can be different from the result produced natively in FileMaker, since SQL is always case-sensitive, while FM may consider "ABC" and "abc" to be duplicates (depending on the language selected to index the field). If you want SQL to ignore case, you can modify the query to use LOWER(txt_filename_import) instead of txt_filename_import.
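For example, applying that change to the query above, the case-insensitive count would look something like:

ExecuteSQL ( "SELECT COUNT (DISTINCT LOWER ( txt_filename_import ) ) FROM Email_Archive" ; "" ; "" )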

--
P.S. I am still bothered by the fact that running the de-duping script did not reduce your total record count by the same number (2,439 records).

