Finding duplicates

Len Roberto · December 8, 2005

I have used the following terrific and easy way to find dupes in a field- (like email addresses) for years in Version 4.0:

1. Define a self-join relationship with 'PhoneNumber' as the key field for both the main file and the related file. A self-join relationship is one in which the main file and related file are the same file; thus, you would simply choose the same file when you are prompted to specify the related file. Name your relationship SelfJoin.

2. Define two fields:

Counter (Text, Auto-enter serial number)

CheckDuplicates (Calculation, Text result)=

If(Counter = SelfJoin::Counter, 'Unique', 'Duplicate')

3. Find All records, click into the new Counter field, and perform a Replace, replacing with a Serial number. This will assign a serial number value to all existing records in your database; records entered in the future will automatically have serial numbers entered.

4. The first record in any series of duplicates will now hold the value'Unique' in the CheckDuplicates field, and all duplicate records within the same series will be marked 'Duplicate'. At this point all duplicate records can be found and omitted or deleted if desired. Note that records with nothing in the key field (PhoneNumber) will also be flagged as duplicates.

Once set up as above, this system will mark duplicate records automatically as they are created; simply perform a Find for 'Duplicate' in the Counter field.

But in 7.0 - it does not seem to work- it finds no dupes...anyone have any insight and is there an easy way to find dupes in a field? I have no experience with scripts and have always used the method above...any help appreciated!

-Len

Raybaudi · December 8, 2005

Hi

this is pretty easy !

Make this script:

Enter Find Mode [ ]

Set field [PhoneNumber ; "!" ]

Perform Find [ ]

This don't need any relationship ! :P

LaRetta · December 9, 2005

Except that using ! produces ALL duplicates and does not isolate the first (or original) from the others. In fact, I think it's one of FM's biggest Chindōgus. It offers great promise but produces results which are almost worthless. Because you would still need to sort, then loop and identify which is first etc. I'm unsure why your example didn't work, Len, maybe spaces or mis-match in phone field? But this works - I use it all the time in 7:

1. Define a self-join relationship with 'PhoneNumber' as the key field for both the main file and the related file.

Correct. Create another table occurrence of your main file (maybe called Duplicates). If you're joining phone number = selfJoin::phone number, you may wish to massage the data so odds of match are greater. If two numbers are the same but one uses parentheses and one uses dashes, they won't match. Even an errant comma (or space?) in the middle will mess it up. But if you instead use: Filter ( PhoneField ; "0123456789") text calculation as your join, odds are greater they'll match because everything else will be stripped.

2. If you don't have a serial (number) then create a calculation (number) called cRecordID with: Get(RecordID). Then create a number calculation:

not selfJoin::cRecordID or cRecordID = selfJoin::cRecordID

BTW, I use numbers whenever possible because I believe they are quicker (maybe not, but that's what I believe). I believe this applies to searching as well. If you want the first related record (first in its natural sort state of creation order) to flag with 1 and the other duplicates to flag 0 this is all you need. This boolean calc will produce a 1 on the first (because it is the first related record). All other duplicates will produce a 0 and blank phone numbers will produce a 1 (you don't want to delete all records with blank numbers, do you? So you don't want them treated as duplicates). If the record has no duplicates, it will also produce a 1. Then just search for 0 and delete (or do whatever you wish with them.) If you wish to keep the most recent, just descend-sort your relationship on cRecordID so the last record becomes the first in the relationship. In fact, the relationship sort alone will determine which record will be flagged as 'first'. Just sort so your 'keep' record pops to first in relationship. :wink2:

LaRetta

Raybaudi · December 9, 2005

Hi LaRetta

not selfJoin::cRecordID or cRecordID = selfJoin::cRecordID

interestin calc...

So, if I want a "0" to the blank fields too, the calc will be:

selfJoin::cRecordID and cRecordID = selfJoin::cRecordID

comment · December 9, 2005

> selfJoin::cRecordID and cRecordID = selfJoin::cRecordID

If the second condition is true - can the first condition be false?

Raybaudi · December 9, 2005

Hi comment

so the calc will be:

cRecordID = selfJoin::cRecordID

darcyb · February 8, 2006

I'm jumping into FileMaker 8 from a MySQL world.

When you import records and there's a unique field, does FM import one and ignore the rest that are the same? Does it work like an IGNOREDUPLICATES directive in MySQL?

Is there an equivalent "SELECT DISTINCT" ability in FM?

Sign In

Finding duplicates

Recommended Posts

Len Roberto

Raybaudi

LaRetta

Raybaudi

comment

Raybaudi

darcyb

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information