Removing Duplicates

fmresu · September 11, 2005

Is there a simple way to delete duplicates? Is there some sort of simple script to do this? I would have thought that this was such a basic function in Filemaker that they would have included a script in the programme. I can't tell you how frustrating this has been - pretty obvious stuff but apparently overlooked.

kiwiora · September 11, 2005

1. sort the records based on the field (in this example I will call it ABC) which has the duplicate data in

2. go to first record

3. set a global field (gABC) with the data of ABC field which is duplicated

set field(gABC:ABC)

4. go to next record

LOOP

5. IF gABC = ABC

delete record;

exit loop if Get ( FoundCount ) = Get ( RecordNumber )

ELSE

6. go to next record

7. set field(gABC:ABC)

8. go to record next (exit after last)

end if

end loop

else exit script

end if

rayd8 · September 11, 2005

You didn't give much information on your database but wouldn't you want to consider referential integrity before deleting duplicate rows? If your database was designed properly it would have unique identifiers for each record (primary key or foreign key). Say you have this scenario:

ID FirstName LastName

123 John Smith

456 John Smith

Now what if you have records that contain these identifiers? If you have another table that has invoice records and some contain the ID for 123 John Smith BUT others contain the ID for 456 John Smith. If you delete one of the John Smith records then some of your invoices won't belong to anyone. In this case you would have to create a cascading delete and also delete all the other records that relate to the record being deleted.

This is probably why Filemaker hasn't created a function for deleting duplicates. There is a lot to consider before you just go deleting records or your database will start exhibiting strange behavior.

Am I evaluating this situation correctly?

janedoe · September 12, 2005

hi,

you can find a way of doing this in the filemaker help too. look for "finding duplicates using a self-join relationship".

cheers,

jd.

GrantSymon · September 23, 2005

Hi,

I've tried the method in the FM Help file and I can't get it to work ... I can't understand why it should either (which is obviously my own problem). I'm trying to do this in FMPro 8

The Help file isn't very clear about the Relationship. It says to use the same field on both sides. Then when I follow the link to find out more about self-join relationships, it uses 2 different fields in 2 occurances of the same table.

Here's a snippet of the Help file

3. Define a self-join relationship.

Use your chosen identifying field as the match field in both tables in the relationship.

(Okay ... seems straight-forward. 2 Occurances, same field selected in both)

4. Define two fields:

Counter, a text field with an auto-entered serial number (select Serial number and accept the default values for Next and Increment by).

(Yup ... done that)

Check Duplicates, a calculation field with a text result, with the formula:

If(Counter = table1::Counter, "Unique", "Duplicate")

(Hmmm .... so in FMPro 8 I have *2* choices in calc-options for choosing *where* the files are located. First is "Evaluate in the Context Of :" Second is : which of the relationships you choose from. The Help file isn't 'helpful' about this at all.

7. Perform a find for Duplicate in the Check Duplicates field.

The first record in any series of duplicates now holds the value "Unique" in the Check Duplicates field, and all duplicate records within the same series are marked "Duplicate".

So ... I've worked throught the file and it doesn't work ... which seems obvious to me ... which is probably why it's not working. I'm doing something wrong.

It seems to me that having the same field selected in both sides of the relationship will return ALL records, since all records obviously match themselves. Secondly, having a calc in which a 'unique' field = itself, seems just as 'circular' to me. Of course it equals itself. How can it not??

Could someone please explain this to me? It's really hurting my head.

(Also ... if this is indeed possible, as I suppose it must be, then what do I need to do to get the 'duplicate' records to show up in a portal?)

Many thanks for all/any help.

Grant[color:blue][color:blue]

Sanjai · September 23, 2005

You have missed a step after defining the counter field with autoenter values. After this step, you have to do show all records and then use the replace menu to assign the serial numbers to the counter field.

After this step you can perform a find on the calculation field to show the duplicates.

GrantSymon · September 23, 2005

I did that step. I just didn't write about it ... the post seemed long enough already.

Any ideas why this isn't working or even any ideas about why it should work?

Sanjai · September 23, 2005

I would suggest, follow all the steps carefully again. Have you created the self join relationship using a primary key or using the counter field? If should be the primary key. You can also try Kiwiora's (second posting).

About, why it should work?

The logic in the help is when you create the count field, you assign the field serial key values. You create a self join relationship on the primary key field.

Note: When you create a relationship and you try to fetch any related value based on the relationship, the relationship would bring the values from the very first record it matches in the related file foundset, no matter there are 10 matching records.

You define the calculation field, If(Counter = table1::Counter, "Unique", "Duplicate")

Let us think of an example:

suppose you have a file which stores employee information and the primary key is SocialSecNumber.

You will create a self join relationship based on socialsecnumber. Remember each record in this table will have its own value for the counter field.

Assuming there are four records with the following socialsecnumber.

1. xxx-xxx-xxx1

2. xxx-xxx-xxx2

3. xxx-xxx-xxx1

4. xxx-xxx-xxx2

When you will assign the values in the counter field, it would have four values: 1,2,3,4.

Comes the relationship part:

1st record, xxx-xxx-xxx1 will match the first found xxx-xxx-xxx1 and since the counter values for both the main field and related field is 1, it will give Unique.

2nd record, xxx-xxx-xxx2 will match the first found xxx-xxx-xxx2 and since the counter field for both the main field and related field is 2, it will give Unique,

3rd record, xxx-xxx-xxx1 will match the first found xxx-xxx-xxx1(which would be the first record and not the third record). The main field's counter field value is 3. On the contrary, the related field's counter value is 1.

It will give Duplicate.

4th record, xxx-xxx-xxx2 will match the first found xxx-xxx-xxx2(which would be the second record and not the fourth record). The main field's counter field value is 4. On the contrary, the related field's counter value is 2.

It will give Duplicate.

Let me know if it is not clear and I would try to clear your doubts if any.

GrantSymon · September 25, 2005

1st record, xxx-xxx-xxx1 will match the first found xxx-xxx-xxx1 and since the counter values for both the main field and related field is 1, it will give Unique.

Hi Sanjai,

perhaps this is where it's going wrong.

I've worked throught he Help file again and it definitely isn't working. There is no mention of any relationship concerning the 'counter' field. Have they missed this step out of the Help file? They only tell you to make a single relationship and they say that it should be based on the unique identifier field ... e.g. : name+address? Is this correct?

When I place the 'related' Counter field on my layout, all I get is "" in every record.

Also, I get the word "Duplicate" in the 'Check Duplicates' field in every record.

Thanks for your time. :

Grant

bruceR · September 25, 2005

Is there a simple way to delete duplicates? Is there some sort of simple script to do this? I would have thought that this was such a basic function in Filemaker that they would have included a script in the programme. I can't tell you how frustrating this has been - pretty obvious stuff but apparently overlooked.

Identifying and deleting duplicates is a LOT more complicated than you seem to realize. What is a duplicate and which record should be eliminated? Delete the oldest record? Newest record? Are two John Smith records the same? What if it is J. Smith in one case, John Smith in another, but everything else about the record is identical? Duplicate record evaluation requires judgement and knowledge of the particular database. An automatic script step would cause significant problems.

Sanjai · September 25, 2005

Yes, you are correct that a relationship needs to be based on unique identifier key. Keep both the counter field and the related counter field on the layout. As you said that it says "Index Missing", empty the counter field for all the records and run the replace step again. Make sure when you are clicking on the replace button, you should first click in the counter field on the layout and on the first record. Please confirm if after the replace step, the counter field contains value for every record or not?

xover · October 15, 2005

Has anyone actually tried following the tutorial (thanks janedoe)?

I have and it is a perfect example why UAT should include help files and be tested on newbies.

Here is what the example SHOULD say (feel free to correct). This is for FM 7 (as i do not know any other version).

1. make a back up of file!

2. identify the field/s you wish to use to determine a unique field in your DB table.

so in this example my table is called 'food' and the field is called 'name'.

3. define a self-join relationship using your identifying field as your key.

to make a self-join:

a) in relationships view (file->define->database, then click on relationships tab) find the table you wish to self-join

: click and hold the button on the identifying field and drag away from table. A line should appear. Now move cursor back over the same idenifying field and release button.

c) a dialog book appears informing you that you are making a self-join. The default name is '' (i.e. 'food 2'). Click on 'OK'. A new table will appear with a link between to the original. If you cannot see the link pull the new table across until you can see the link with a little box in-between.

4. Before you leave the relationship view screen double click on the little box between your self-joined tables(i.e between 'food' and 'food 2'). A 'Edit relationship' window appears.

5. select the left 'Sort Records'. The 'specify' button should illuminate. click on the button 'specify'. A 'Sort records' window appears. In the left list find your identifying field and click on 'move'. It will now appear in the right list. click on 'OK'. Do the same steps for the right 'sort records'. Finally click 'OK'.

6. define 2 fields to your table.

Counter: create as a text field with an auto-entered serial number (select serial number and accept the default values for 'Next' and 'increment by'.

Check Duplicates: create as a calculation field with a text result. enter the formula:

if(Counter = :Counter; "Unique" ; "Duplicate" )

In our example the formula would look like this:

if(Counter = food 2::Counter; "Unique"; "Duplicate")

Click on "ok".

7. In broswe view with a table view of your table showing choose Records->Show all records

8. Click the new Counter field, choose Records menu > Replace Field Contents, and Replace with serial numbers. Again, accept the default values. Select Update serial number in Entry Options, and click Replace.

This will assign a serial number to all existing records in your database. Serial numbers will automatically be entered in new records.

9. Perform a find for Duplicate in the Check Duplicates field.

mleiser · February 16, 2006

I know you wrote the explanation to the self join in Sept of 05, so I hope you're still around. I must thank you tremendously. I've been trying to figure out how the self join worked and you explained it beautifully. By the way I tried it before understanding it and it did work well. I have one question. If the ss#1, whether the first or third, only matches to the first ss#1 in the join file how do we get a listing of all matches in a relational file? For example if you have parent John Smith in one file and then in the relational file you have Bill Smith and Samantha Smith, according to what you wrote you should only get back Bill smith as a child - assuming it's first. Why are we able to get both children if it only matches to teh first?

Mike

Sign In

Removing Duplicates

Recommended Posts

fmresu

kiwiora

rayd8

janedoe

GrantSymon

Sanjai

GrantSymon

Sanjai

GrantSymon

bruceR

Sanjai

xover

mleiser

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information