How to find and delete duplicate, triplicate, etc. records based on a single field

I have a script in my database to find and delete duplicate records based on a single field. Does anyone have a script to handle the possibility that a duplicate record occurs more than just one time?


Basically you have two options:

  1. Loop over each record, deleting it if the next (or previous) record is a duplicate, otherwise moving to the next record - see an example here:
    https://fmforums.com/topic/97304-how-to-delete-all-duplicates/?do=findComment&comment=442212&_rid=72594;
  2. Use a variation of the Fast Summaries method to delete from each group the number of records that is equal to the size of the group minus one.

The second method is faster, but it requires a summary field to count the records. Both methods will run faster if you first perform a find for the duplicates only. Rough sketches of both are below.
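For illustration, here is a rough sketch of option 1 written out as FileMaker script steps. This is not the exact script from the linked thread; it assumes the records are sorted by Email_Archive::txt_filename_import so that duplicates sit next to each other:

Sort Records [ Restore ; With dialog: Off ]
# the sort order must include Email_Archive::txt_filename_import
Go to Record/Request/Page [ First ]
Loop
  If [ Email_Archive::txt_filename_import = GetNthRecord ( Email_Archive::txt_filename_import ; Get ( RecordNumber ) + 1 ) ]
    # the next record has the same value, so the current record is an extra
    Delete Record/Request [ With dialog: Off ]
  Else
    # the current record is the last of its group (or unique) - move on
    Go to Record/Request/Page [ Next ; Exit after last: On ]
  End If
End Loop

Because the current record is re-tested after every delete, triplicates, quadruplicates, etc. are handled the same way as plain pairs.

And a sketch of option 2, assuming a hypothetical summary field Email_Archive::sCount defined as Count of txt_filename_import (any count summary field will do, as long as the found set is sorted by the break field):

Sort Records [ Restore ; With dialog: Off ]
# the sort order must include Email_Archive::txt_filename_import (the break field)
Go to Record/Request/Page [ First ]
Loop
  Set Variable [ $pos ; Value: Get ( RecordNumber ) ]
  Set Variable [ $size ; Value: GetSummary ( Email_Archive::sCount ; Email_Archive::txt_filename_import ) ]
  If [ $size > 1 ]
    # keep the current record and delete the remaining $size - 1 records of the group
    Go to Record/Request/Page [ Next ]
    Set Variable [ $i ; Value: 0 ]
    Loop
      Delete Record/Request [ With dialog: Off ]
      Set Variable [ $i ; Value: $i + 1 ]
      Exit Loop If [ $i ≥ $size - 1 ]
    End Loop
    # we are now on the first record of the next group, unless this was the
    # last group, in which case we landed back on the record we kept
    Exit Loop If [ Get ( RecordNumber ) ≤ $pos ]
  Else
    Go to Record/Request/Page [ Next ; Exit after last: On ]
  End If
End Loop

Again, treat these as sketches to adapt, not as drop-in scripts.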

--
P.S. Make sure you have a backup before trying any of this.


comment ... THANKS for your help.  🙂

Using a calculation with a self-joining table, I'm showing that I have 1,802 records (out of 85,873 total) that have duplicate values in this field ... Email_Archive::txt_filename_import

After showing all records (85,873) and unsorting, I ran the script for option 1. The script finished after only a few seconds and brought my total number of records down to 85,250 (623 records were deleted). However, the new total of duplicates (per my self-joining table calculation) only went from 1,802 to 1,799 (3 duplicates were found and deleted). I must have done something wrong. 🤔

Here is my script ...

[screenshot of the script attached]


Thanks for your help. I'll look through things again on my end. I can't send the file ... It contains personal email addresses for thousands of contacts.


One more thing: this seems to be a table populated by importing records from an external source (or sources?). Instead of importing duplicates and then hunting for them, you have the option of validating the txt_filename_import field as Unique, Validate always. Then any duplicate values will be skipped during the import(s). The import process may take longer - but if I understand correctly this is a one-time conversion, so it shouldn't be a big deal.


I set the txt_filename_import field to: Unique, Validate always 🙂

I tried the import, but somehow FileMaker locked up.

I tried the same thing again ... This time my FileMaker database was on a local machine (instead of the company server), and everything seemed to work great. After the import completed, I received the following "Import Records Summary" notification (see the attached screenshot). I'm pretty confident everything worked. My only concern is that the import skipped 2,439 records, while my original calculation with the self-joining table showed that there were 1,802 duplicates. Do you think the difference could be because certain duplicates (2x) are actually triplicates (3x), quadruplicates (4x), etc.?

Thanks!

[Import Records Summary screenshot attached]


7 hours ago, kcep said:

my original calculation with self-joining table showed that there were 1,802 duplicates. Do you think the difference could be the possibility that certain duplicates (2x) are actually triplicates (3x) or (4x), etc.?

I don't know how your "original calculation" works, so any opinion on that would be purely guessing.

If you still have a file with the original imported set, you can see how many unique values it contains by looking at the result of:

ExecuteSQL ( "SELECT COUNT (DISTINCT txt_filename_import) FROM Email_Archive" ; "" ; "" )

Note that the SQL result can be different from the result produced natively in FileMaker, since SQL is always case-sensitive, while FM may consider "ABC" and "abc" to be duplicates (depending on the language selected to index the field). If you want SQL to ignore case, you can modify the query to use LOWER(txt_filename_import) instead of txt_filename_import.
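For example, applying that change to the query above, the case-insensitive count would look something like:

ExecuteSQL ( "SELECT COUNT (DISTINCT LOWER ( txt_filename_import ) ) FROM Email_Archive" ; "" ; "" )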

--
P.S. I am still bothered by the fact that running the de-duping script did not reduce your total record count by the same number (2,439 records).

