Jump to content

Grouping duplicates


This topic is 6193 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Hi,

I have a file with 100 records and I have found 10 duplicates records. I would like to be able to group the duplicates.

10 duplicate records, grouped with a unique id. Producing 5 groups. group ID's 1-5.

I know I have to find my dup's first then sort them. I'm stuck on setting the Group Id field.

How do I do this?

Thanks in advance.

Ghengis

Link to comment
Share on other sites

Yes you got it. I want to group the pairs of dups.

The reason I want to do this is because I want to deal with each pair and find the most complete record and mark the incomplete record for deletion.

What I define as a complete record is a record that has a FirstName, LastName, AddressLine1 and PostCode.

:(

Link to comment
Share on other sites

How 'bout something simple like a self relationship based on the fields that identify a dup (probably first and last name?).

Then, in a layout where you find your dups, you can show a portal for each dup that show it and its twin. Delete the portal row you don't want and move on to the next record.

You could script the delete-portal-row to do the delete and omit the record you kept from the found set for a smoother interface.

Don

Link to comment
Share on other sites

You need to make 5 decisions, I'd say forget about the script and do it manually.

What makes one address more complete than another? What if all four fields have values, but they're different? What if one has only the first name missing and the other has only the address line missing. Which is "more complete"?

Link to comment
Share on other sites

Good points to ponder over. I would still like to identify those records that need attention. Its easy to manually spot records if you're only dealing with small amounts of records. What about dealing with thousands?

I would like to identify out of a pair any records that don't have values in AddresLine1 and PostCode and then mark them for attention.

This is why I needed to group the duplicates and then find those that have missing AddressLine1 and Postcode. I understand your points about defining a complete record.And I take your points on board. At this stage as I'm still a novice and I would to like to script the steps I mentioned above.

Group the duplicates with group id's and then compare the dups based on the criteria I mentioned above.

So far I've been able to group the dups and give them ID's. I'm still stuck on comparing the records within a group.

Could I for example create global fields for Addressline1 and PostCode then set those fields with the first values in the group then go to the next record within the group and then do a comparison?

Does this method make sense?

Sorry for dragging this out. As I said earlier I'm a newbie and I have this itch that I need to scratch!.

Link to comment
Share on other sites

Well, the whole problem rests with you deciding on the rules for a comparison algorithm. Once you do that the problem will be solved.

If you're in fm9, you have conditional formating in your layout to highlight problem records or fields. Or you could set a number field by evaluating the condition of the fields of interest and increasing it's value based on each problem discovered. Then the number value provides a ranking of degree-of-problem.

But it all rests on your choices.

Don

Link to comment
Share on other sites

This topic is 6193 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.