Import into related FM databases & de-duping

rafs ome · November 27, 2003

Seeking Simple quick solution:

an org. currently has several excel spreadsheets (one per department) with contact names and address.

This has to be imported into three related databases & de-duped

contact name

address (related to name)

department category (related to name)

how to approach this task - these excel lists are about 1000 people long and there are 8 of them.

should i import them into a de-duping database with tick boxes & get a script to place data in different databases and delete the dupes, or is it quicker just to do it all manually in excel?

or even better is there a fm tool to do this?

thanks

BobWeaver · November 27, 2003

I would import them all into one Filemaker database and then do a search for duplicates.

If you have separate fields for first and last name, you can create a calculated Full name field using the formula:

FirstName & " " & LastName

Then you can do a duplicate search on this field. You can also do a duplicate search on the phone number field. But, you can only do a dup search on one field at a time. So you have to pick one. Enter find mode and put an exclamation mark in the field you want to search. Then click find. Once you get your found set of duplicate records, sort by the field you did the dup search on, in order to group the duplicates together. Then go through the list and delete (or mark for later deletion) the records you don't want.

Note, the duplicate search finds the original and all duplicates. So don't just delete all the records that come up in the found set.

Fenton · November 27, 2003

I'd de-dupe them in FileMaker. There is no "tool" to do this, for one thing because one man's dupes are another man's data. But you can easily build your own.

The first thing to do is to get the Dept. Category in there. It could be done with either a Fill Down in Excel, before Import, or a Replace afterwards; whichever you're more comfortable with. Replace is Undoable. It would be done right after each Import; just type the Dept. name into the first record's field, then choose Replace Contents. The default will say Replace with "the name you typed," and you say "Replace" ("Cancel" is the default button).

Now, for dupes, my big question is, "Do you have more than 1 address per Contact, or more than 1 Contact at an address (lots of them)?" If no, then why the separate file for Addresses? If yes, then a unique record consists of the name AND the address. I'm going to assume the latter.

There's more than 1 way to de-dupe. But I'm going with a self-relationship method. It can also be used later to prevent duplicate entry.

You need to "concatenate" (put together) the Name and Address, in a calc field, result Text, _cNameAddress = Name & " " & Address.

Now create a self-relationship on that, "self_NameAddress," same field on both sides.

You also need a real Serial ID field (text or number), auto-enter serial number, start 1, increment 1. Actually you should create this before importing anything, so it would populate. Otherwise do a Replace after all imports are done, using the same options above, with the "Update" option checked.

De-Dupe Script (saves 1st entry):

Freeze Window

View As [View as Form]

Enter Find Mode []

Set Field [

Name_Address.zip

rafs ome · December 2, 2003

like the neat script, thanks for the input

Now you have got me thinking - I assumed that a contact might have several address but not that a address would have several contacts - then i thought how the hell does one not duplicate either with two related database -then i thought how about a third database to link the id's of contact ID & address ID - with related calculation fields in this third database to pull data onto portals in either the contact of address layouts (i.e. so a contact can have a portal with a list of their address by relating to this third database that has calc fields relating to the original data record) - BINGO no redundant data. (if any one has any pointers on setting such a 3-way database up -beginning with what it is called so I can name my fields appropriately?)

But this still leaves me with my original problem......

The sort of de-duping in the previous post (if i understand it) will remove all those records that contain the same name and address, but it would also have lost the department data where we have duplicate name_address but different department fields.

So perhaps this below is a strategy? (lets try and forget about the departments for the time being because they make my head hurt).

-import it all in to contact.fp5

create a script that loops through the dupes:

-finds name dupes and give all these dupe groups an separate ID that increments with each group of dupes. i.e. all BOB&HOPE records have the same ContactID

-do the same for address to create groups of AddressID's

-export out AddressID and ContactID to the '3-way' match.fp5 database file

-export out ContactID and Department to the department_match.fp5 database file

-make copy of contact.fp5 and call it address.fp5

-delete all address fields including addressID from contact.fp5

-do the same to address.fp5 but remove the contact data.

-de-dupe duplicates in contact.fp5, address.fp5

-de-dupe department_match.fp5 & 'match.fp5 but with the concatenated match

AND BOBS YOUR UNCLE....(and I didn't ignore departments)

ok.ok this is all very exciting but this "3-way" match thing is really going to confuse mail lists...creating new records...parentless records... perhaps I'll deal with that next time.... ;-)

Fenton · December 3, 2003

Yes, you're right. I got carried away nuking duplicates, forgetting to save them to another file.

Your way of looping through, setting ContactIDs, then exporting the Addresses and Departments will work, but I have another way I've done it (see below).

First let's get the files and relationships straight. I see a hierarchy.

Departments --> Contacts --> Addresses

In other words each Contact only belongs to 1 Dept., and each Address only belongs to 1 Contact.*

It's not many-to-many. So it's simpler. There are 3 files, one for each of the above. There is no need for a "join" (or your calculation) file.

In your Contact file you have a ContactID, which you've populated via a loop, setting the Contact Name value into a global field, comparing as you go, incrementing ContactID by 1 when a new contact is hit, otherwise setting the same ContactID.

This is really all you need. Addresses are identified by their ContactID; they don't need a special "AddressID" to identify them; though they may as well have an AddressSerialID. It would end up being an auto-enter serial ID in Addresses.

*If, however, people are sharing an address, then I might put both their ContactIDs into a ContactIDs field (return delimited), for eventual transport to the Addresses file. This is an important question. And, the truth is I've never had to deal with it. But I could see how 2 people could share an address, boy meets girl, etc..

As you say, an actual AddressID would work, but has problems when it comes to data entry, since the AddressID would have to be passed back to the Contact file AFTER the address record has been created. Scriptable, but not compatible using a portal with "allow creation of related records."

You may want to enter addresses for new records into globals first anyway, to check for a duplicate before creating a new address record, if you have shared addresses, which I don't know...

Similarly each Contact hopefully only belongs to 1 Department. Otherwise you'd likely need a join file for unique combos.

There is another way to do the file creation, which may be easier. It uses a "summarized" export to output the Contact and Department files, from the original file with all records, which remains as the Addresses file.

I wrote a post recently describing this method, so I'll paste a modified version here:

Create a summary field. It can summarize any field, it doesn't matter what. But FileMaker demands at least 1 for the next step.

Sort by the field you want to be unique. Export as a summarized FileMaker file.

Under the fields to export is a "Summarize by" button. That opens another window. The field you sorted by will be there. Click just to the left of it; you will then see a checkmark. Don't miss this step!

Click "OK." It will now say, "Summarize by" and "field you want"

It will export only 1 record for each unique value

By using the Contact Name for one, then Department for another, you get those files.

Serialize a ContactID field in the Contact file. It should be an auto-entered serial ID field.

Create a relationship between the Contact Name, from Addresses to the Contacts file (it will match, 'cause they all came from the same place).

Replace the ContactID into Addresses, using the Contact Name relationship.

Change the Contact Name relationship to use the ContactID.

Convert the Contact Name field in Addresses to a unstored related calculation (could delete it, optional).

Create a relationship from Contacts to Addresses, on ContactID.

Address(es) should show up in a portal.

The Contacts file has the Department name in it.

Create a relationship from Contacts to Departments, on the Department name.

Serialize the Departments file, to get an ID.

In Contacts, Replace into a DepartmentID field, using the Department Name relationship.

Change the Department name relationship to use the ID instead.

Convert the Department name field to a unstored related calculation (could delete it, optional).

Sorry for the confusion. But there really are 2 ways to do it; not to mention questions about many-to-many on addresses.

IMPORTANT: Data must be spelled correctly for duplicate techniques to work. First sort and view the data carefully, because every misspelling of any kind will cause a unnecessary record to be created.

Sign In

Import into related FM databases & de-duping

Recommended Posts

rafs ome

BobWeaver

Fenton

rafs ome

Fenton

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information