Deleting duplicate records with a script

John Chamberlain · November 13, 2007

I have been handed an Excel file with 13000 lines of data. About two thirds of the lines are duplicated. I have converted the file to FMP, and need to delete the duplicates. What I would like to have is a simple (?) script that would read the records one at a time, go to the next record and test to see if a key field (a client ID number) is the same, and delete the record then go on testing and deleting through the whole file.

Can anyone help me here?

TIA

tv_kid · November 13, 2007

What I would like to have is a simple (?) script that would read the records one at a time, go to the next record and test to see if a key field (a client ID number) is the same, and delete the record then go on testing and deleting through the whole file.

Hi John

Rather than comparing each record to the next one, which will only find duplicates that follow one another, check out "Identifying duplicate values using a self-join relationship" in the FM Help section. This will give you a more useful way to track down duplicates without a script.

HTH

John Chamberlain · November 13, 2007

My problem with using the self relationship is that it only identifies the duplicates.

What follows is doing a find for the duplicate warning message, the going through them one at a time, deleting records one by one.

With this many records to go through, deleting one record at a time to eliminate the dups is onerous.

tv_kid · November 13, 2007

I must have misunderstood you John, I thought you wanted to get rid of the duplicate entries. If that's the case, couldn't you just find the duplicates and then delete all?

John Chamberlain · November 13, 2007

No, because the delete all would also get rid of the records I need to keep. Think of it this way; you have six records, all with a duplicate record message; if you delete all then you have no records at all, and the client for that ID ceases to exist.

tv_kid · November 14, 2007

But if you follow the instructions in the help file, you'll end up with the first record of any set of the 'same' records marked as 'Unique', with the rest marked as 'Duplicate'. Searching for the duplicate records won't find the unique ones, so you can safely delete them.

John Chamberlain · November 14, 2007

This sounds great, but I need to see this help file. I am no FMP genius, and it sounds like I have been doing this wrong all along. How do I get the field to identify one as Unique and the rest as dups?

AudioFreak · November 14, 2007

See here. It is easier in newer versions but this will get you the results you are looking for.

See Here.

Michael

tv_kid · November 14, 2007

Here's the page from the help file.

I've just seen you're using FMP5. I can't think of any reason why this method wouldn't work on that version, but it's been a while since I've used it....

HTH

_Help.pdf

John Chamberlain · November 14, 2007

THis looks great - it's a little late in the day, but I will try this first thing in the morning.

THANKS!

PS I need to update my profile, I am now using FMP6

John Chamberlain · November 14, 2007

Thanks so much, this was exactly what I needed - I have be going through a lot of pain for a lot of years doing this the hard way - as Homer Simpson might say "DOH!"

Søren Dyhr · November 14, 2007

If you somehow in the near future should stumble upon upgrading as an option, should you bring this approach into you consideration:

http://sixfriedrice.com/wp/deleting-duplicate-records-in-filemaker/

--sd

aldipalo · November 14, 2007

To take this further I have a problem that is growing everyday and is beginning to cause me some grief. I know, in time, if not corrected, it is going to become an alligator on my back.

I have 22,500 companies and 18,500 contacts that have been building for the last 15 years. Years ago, when I worked with a "Database Guru" who set me up with a DBase II DB he told me that the only way to insure a unique ID in my application was to combine the first 5 letters of the company name and the zipcode. This I have done thru DBase and Approach. When I switched to FM I just carried this format over. Problem now is all my contacts are connected thru this CO_ID field. The additional problem is there are multiple records of the same company. Why? Well originally we thought if we had a company like Johnson & Johnson they have 100+ companies and many divisions within each and many locations of the many divisions. So, wouldn't it be easier to enter each separately and enter the contacts for each location related to the individual division and location. I think you are now getting the picture of how my cute little baby alligator is now about 6 foot long and will soon be 12 foot. Additionally, over the years, we've had A LOT OF GIGO.

So, to my problem. Your suggestions above will certainly take care of the dups within each table. But, how would I go about deleting all duplicate companies and moving the contacts into one company folder? I thought perhaps a 3rd table linking the divisions to a main company file. I do have a "Location" field in contacts that could delininate the individual divisions.

Anyway, I could go on ad nauseam.... Your thoughts, suggestions, ideas are appreciated.

TIA

FestiveEmbalmer · November 14, 2007

That is quite a motivation for those of us just starting out on our first major project (read: myself) to get the structure right in the beginning.

I have several different parts of my database that are comparable to what you have described. One is with Notes that need to be in series and another is with Accounts that have sub-accounts.

As far as moving all of your contacts under one company record, it might be better for your uses to have a hierarchy of companies/divisions. Then you could create a contact (address/phone etc.) record that is related to a division but also to every company record that is above it in the hierarchy. When you select J&J for example, you would have a file tree structured portal displaying all of its children divisions and another portal that displays all of the contacts for that level of the hierarchy and up.

I learned how to do it from Matt Petrowsky's site:

http://www.filemakermagazine.com/videos/infinite-hierarchies-creating-a-folder-tree.html

http://www.filemakermagazine.com/videos/infinite-assignments-never-ending-value-lists.html

I would post an example file, but mine follows his so closely I feel it would be unethical. The subscription is worth it.

As for your unorthodox serial issue-

Could you create a new serial number field for each table, populate it [with Replace Field Contents (serial number) or import them into a clone with the autocomplete option checked] and then create a lookup field in the child table that would pull the new serial numbers from the parent table IF the old relationship is valid?

Thank you for bringing this up, these are things I need to consider carefully myself!

Matthew

Fenton · November 14, 2007

Yes, this is kind of a problem. I'm not sure of the ultimate solution, but here's a few thoughts. Have a Branch field put the locations in there. In the company name field make sure the "Johnson & Johnson" is identical in all its records, ie., a drop-down for company name,* putting the location in the Branch field.

Create a calculation field:

Company Name & Case ( not IsEmpty (Branch); ", " & Branch

Use the above field in all places except the raw data entry screen for "Company Name" (obviously not modifiable). Also in value lists (to avoid the "no duplicates in index" problem).

You'd want to mark one of these "company" records as the "primary", likely the 1st. Or, you could just say that any company without a branch entry is the primary; that's easier and seems logical (to me).

*What would be ideal for the drop-down would actually be ONLY the "multiple branches" companies (especially with 22,500). But you would not have that until after you'd entered the company name and the 1st branch. Which would mean you've have to enter the company name that time manually correctly. Shouldn't be that hard though.

Or use a button for "New Branch" that did that stuff for you. That's how I'd do it.

Edited November 14, 2007 by Guest

aldipalo · November 14, 2007

First to Matthew:

Thanks, I've been a subscriber to Matt's magazine and his Scriptology application since about 1 week after I committed myself to FM. His tutorials are excellent and it has taught me a lot.

Fenton:

Thanks for the thoughts. I guess I need to make a backup of my tables and begin the process. Right now CompanyDB and Contacts are 2 separate databases. of course there are notes and activities and emails and telephone tables all connected to contacts. That's not a problem as they are all related by a proper serial ID.

I thought I'd begin by getting rid of all my duplicate contacts and duplicate companies. I set up a dup table as described above (Both Six Fried Rice and FM Help), but, when I go into Duplicate List Layout (A new layout) I am getting all my contacts. I thought the relationship would only give me the dups but preserve the original?

Any idea what's wrong?

Thanks again.

inevaexisted · November 16, 2007

edit..sorry only read the first post.. will leave the rest for anyone else who may need this...

or you could filter the duplicates in excel then import...save yourself alot of time as well...

heres how its done for Excel 2003

1. highlight the set of data

2. Data>Filter>Advanced Filter

- a dialogue box will show

3. ensure that the unique records only check box is ticked.

- filter in place will delete all duplicated in the excel file so if you still want that data then use copy to another location and select the top of an empty column for the copy to.

-hit OK

import as usual...

if that isnt an option...

add a temp text field called Duplicate_Checked

BEGIN SCRIPT

Error Capture [on] // to stop error messages if no records are found

BEGIN LOOP

GOTO 1st record

Set Variable - ClientID=ClientID Number

Enter Find mode

Set ClientID Number = Variable - ClientID

Perform Find Requests

IF (get(foundCount)>1)

{

...

here you can select which of the found records is the one you wish to keep by either checking modified timestamp *assuming you have a field for this* or possibly using the unsort and selecting the newest record...either way select the one you want

...

Set Field [Duplicate_Checked]="Yes"

Omit Record

Delete Found Records

}

Enter Find Mode

Set Field [Duplicate_Checked]="=" // searches for empty fields

Perform Find Request

Exit Loop if [Get(FoundCount)=0]

END LOOP

I hope i havent missed anything in that ....

Ineva.

Edited November 16, 2007 by Guest

Colin M. · November 28, 2008

Thanks for all this. I was stuck into parallel issues for months before finding this thread.

I would like to ask about a refinement of all these scripts and workarounds:

I have too an enormous reptile made of contacts put together by people that were in my office way before I came and who created new FM databases over the years, changing fields, etc.

Now, I have put them in various excel documents with the same field and I would like to import them into FileMaker, but with two conditions:

1) that they are unique - this has been sorted out by the thread, I would just need an auto-increment identifier, and a combine field with, say "First Name - Last Name and Email" that are unique to make sure I don't have duplicates.

2) I would like to find a way that a script could update the data. For instance, if I have an entry "john", "doe" and the phone number and addres and another, maybe added three years after that says "john" "doe" and the email adress, could I have a script that would make me get: "john" "doe" "john doe's phone number" "john doe's Address" "john doe's email address"?

I hope I have explained it well. Now that I think of it, it might be a better idea to keep the duplicated while importing, just giving them a unique ID and then run a script that would do 1) combine data in the most recent entry when "first name" "last name" are equal and 2) delete older entries where "first name" and "last name" are equal.

I am absolutely a beginner in FM scripting and I am putting up together these ideas from logic mostly.

Would anyone be able to help me finding out what such a script would look like?

Sign In

Deleting duplicate records with a script

Recommended Posts

John Chamberlain

tv_kid

John Chamberlain

tv_kid

John Chamberlain

tv_kid

John Chamberlain

AudioFreak

tv_kid

John Chamberlain

John Chamberlain

Søren Dyhr

aldipalo

FestiveEmbalmer

Fenton

aldipalo

inevaexisted

Colin M.

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information