duplicates

February 12, 200422 yr

Hello,

After an import I have duplicates of some records and I need to delete these. The procedure set out in the FM Help is of no use (I assume because of the imports without re-doing the whole process each time.

I have a script step to find duplicates using search "!" in the relevant field. I have a field 'delete marker', in which the script will place a "D" into every second record that is found. ie

go to first

loop

next record

set field "D"

next record

end loop

I then search for "D" and delete these. This works fine, because almost always there will only be one duplicate record at a time. But the possibility exists that I could have two duplicates (plus the original) in which case this system of marking every second record will break down.

The field being searched is the only one available, and other calc fields to help identify records is not a possibility. Is there a better way I could do this?

February 13, 200422 yr

Glenn ...

By definition, duplicate records are those that contain identical information in one or more specified fields. In your post, you refer to this as "the relevant field". Let's say it's called: "clientID". So, in a case where, after importing, your records are:

clientID

001

003

002

003

001

002

001

003

004

005

006

you first find the duplicate records using the "!" symbol in the clientID field in Find mode, and then sort the records by clientID to arrive at:

clientID

001

002

003

You create a global field of the same type as your "relevant field", e.g., if clientID is a Text field, then the global must be of type Text. Let's call it: "g_ID". This will be used in the duplicates deletion script below.

Your script would look something like this:

Go To Record/Request [First]

SetField [g_ID, clientID]

Loop

Go To Record/Request (Next, Exit after last)

If (clientID = g_ID)

SetField (delete_marker (your existing marker field), "D")

Else

SetField [g_ID, clientID]

End If

End Loop

Enter Find Mode (be sure to uncheck the "Pause" and "Restore find requests" options)

SetField [delete_marker, "D"]

Perform Find [Replace Found Set] (be sure to uncheck the "Restore find requests" option)

If (not Status(CurrentFoundCount))

Show Message ["No duplicates found.", Buttons, "OK"]

Else

Delete All Records [No dialog]

Show Message ["Duplicates deleted.", Buttons, "OK"]

End If

The logic of the above script is: if the current record's clientID matches the value in g_ID, mark the record as a duplicate, otherwise, set g_ID to the current record's clientID. After looping through all of the duplicate records, find records marked "D". If none are found, alert user, otherwise, delete the found set and alert user of the deletion. The script will delete N records with clientID values matching the first record with that value.

Hope this helps.

February 13, 200422 yr

Hi Glenn,

You can avoid importing duplicates in the first place by using the "update matching records in current found set" & "add remaining records" options. This should do the trick for you.

Dan

February 13, 200422 yr

Author

Jim,

That'll do the trick. Thanks for your help, I appreciate that very much.

Dan, yes I had thought the same thing but the match field for the update is also the import field, which as far as I can tell prevents this solution.

As I write this I wonder why I don't simply create a calc field as a "duplicate" field. If I have field 1, then create calc field 2 whose calc is 'fieldname 1' - use that for the match and import into the other. Yes?