
Finding the first x number of duplicate records from table relationship



I have a table in which there may be many duplicate records based on the 'Name' field, although the data in the other fields differs.

I need to find the first 6 duplicated records based on the 'Name' field and then store an incrementing number (starting from 1) on each of them in a field used as a flag.

The table may contain as many as 20,000 records with each unique 'Name' value having 0 to 50 duplicates.

Does anyone have any idea how I might achieve this using a script or a custom function?


Do a find in the name field? You might want to only use the first three letters of the first name and last name to get a broader result. 
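
A rough illustration of the prefix idea, assuming a person's name is split into first and last parts: build a loose key from the first three letters of each part and group the records that share it. This is Python standing in for what would be a find request in FileMaker, and the helper names `match_key` and `group_by_key` are hypothetical.

```python
def match_key(first_name: str, last_name: str, prefix_len: int = 3) -> str:
    """Hypothetical helper: build a loose key from the first few letters
    of each name part, so "Jonathan Smith" and "Jon Smithers" collide."""
    first = first_name.strip().lower()[:prefix_len]
    last = last_name.strip().lower()[:prefix_len]
    return f"{first}|{last}"


def group_by_key(people):
    """Group (first, last) tuples by their loose key; keep only groups of 2+."""
    groups = {}
    for first, last in people:
        groups.setdefault(match_key(first, last), []).append((first, last))
    return {key: members for key, members in groups.items() if len(members) > 1}
```

A shorter prefix widens the net (more false matches); a longer one narrows it.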

2 hours ago, MartinL said:

I need to find the first 6 duplicated records based on the 'Name' field

That is not quite clear. Do you mean the first 6 duplicates of each Name? Or just the first 6 duplicates of some Name? If the latter, which one? And what determines which is "first"? First in what order?


Thank you for the reply.

It is up to the first 6 duplicates of each name, and the order is simply the order of creation.
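
With that clarification, "first" means earliest created within each Name. As a sketch in Python rather than FileMaker, the working set can be modeled as the records whose name occurs more than once (what a FileMaker find for `!` returns), sorted by name and then by a creation serial. The `name` and `created` keys are hypothetical stand-ins for the real fields.

```python
from collections import Counter


def duplicate_found_set(records):
    """Keep records whose name occurs more than once, ordered by name and
    then by a hypothetical creation serial stored under "created"."""
    counts = Counter(r["name"] for r in records)
    dupes = [r for r in records if counts[r["name"]] > 1]
    # The secondary key makes "first" mean "earliest created" inside each group.
    return sorted(dupes, key=lambda r: (r["name"], r["created"]))
```

Sorting on the creation field as well as the name makes the intended order explicit instead of relying on how ties happen to fall.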


Here is a rather simple way to do it:

First, find the duplicates by performing a find for ! in the Name field. Then sort them by Name. Then do:

Go to Record/Request/Page [ First ]
Loop
   If [ $name ≠ YourTable::Name ]
      Set Variable [ $name; Value:YourTable::Name ] 
      Set Variable [ $i; Value:1 ]
   Else
      Set Variable [ $i; Value:$i + 1 ]
   End If
   If [ $i ≤ 6 ]
      Set Field [ YourTable::Flag; $i ] 
   End If
   Go to Record/Request/Page [ Next; Exit after last ]
End Loop
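
For checking the logic outside FileMaker, here is a rough Python equivalent of the loop above. It assumes the records are already sorted by name and represented as dicts with a hypothetical `name` key; as in the script, records past the sixth in a group are left untouched.

```python
def flag_first_n(sorted_records, limit=6):
    """Number the first `limit` records of each name group 1..limit.
    Input must already be sorted by name; later records get no "flag"."""
    current_name = None
    i = 0
    for rec in sorted_records:
        if rec["name"] != current_name:
            current_name = rec["name"]  # new group: reset the counter
            i = 1
        else:
            i += 1
        if i <= limit:
            rec["flag"] = i  # same role as Set Field [ YourTable::Flag; $i ]
    return sorted_records
```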

Now, there is a way to make this faster by jumping from the 6th record directly to the first record of the next group, using a variation of the "Fast Summaries" method by Mikhail Edoshin. But I doubt you need the added complexity - hopefully you don't need to do this often.
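
A loose analogy to the jump-ahead idea, in Python: if the size of each Name group is known up front (in FileMaker that would come from a summary count by Name; here a `Counter` stands in for it), the loop can flag up to six records and then skip straight to the first record of the next group. This illustrates the idea under that assumption; it is not a port of the Fast Summaries technique itself, and the function name is hypothetical.

```python
from collections import Counter


def flag_first_n_skipping(sorted_records, limit=6):
    """Flag up to `limit` records per name group, then jump to the next group.
    Input must be sorted by name so each group is contiguous."""
    group_size = Counter(r["name"] for r in sorted_records)  # stands in for a summary count
    start = 0
    while start < len(sorted_records):
        size = group_size[sorted_records[start]["name"]]
        for offset in range(min(size, limit)):
            sorted_records[start + offset]["flag"] = offset + 1
        start += size  # jump straight to the first record of the next group
    return sorted_records
```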


Another option is to define a summary field (sRunningCount) as a running Count of Name (or of any field that cannot be empty), restarting when sorted by Name. Then, after finding and sorting, simply do:

Replace Field Contents [ YourTable::Flag; Replace with calculation: If ( YourTable::sRunningCount ≤ 6 ; YourTable::sRunningCount ) ] [ No dialog ]
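
To see what the running count and the calculation produce together, here is a small Python emulation, assuming records sorted by name: the count restarts at 1 whenever the name changes, and `If ( count ≤ 6 ; count )` yields an empty result otherwise, shown here as `None`. The function name is hypothetical.

```python
def running_count_flags(sorted_records, limit=6):
    """Emulate a running Count that restarts per name group, followed by
    If ( count <= limit ; count ), which gives an empty result (None) otherwise."""
    flags = []
    previous = object()  # sentinel that never equals a real name
    count = 0
    for rec in sorted_records:
        # Restart the running count whenever the sort-by field changes.
        count = count + 1 if rec["name"] == previous else 1
        previous = rec["name"]
        flags.append(count if count <= limit else None)
    return flags
```

One practical difference from the loop: Replace Field Contents writes to every record in the found set, so records past the sixth get an empty flag rather than keeping any value left over from an earlier run.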

Edited by comment

Hi,

Thank you for all of the replies.

I used Comment's option, which worked really well even with a dataset of over 50,000 records.

22 minutes ago, MartinL said:

worked really well even with a dataset of over 50,000 records.

Good. 50k records is not that much these days, especially if you run it in Form view and freeze the window first (I should have mentioned this in my answer, but I was too absorbed in the logic of the process).

BTW, what is the purpose here? What makes the first 6 records different from the other duplicates?


The client needed to run the search so that they could export the found records and use them for a mail campaign. They then ran the script again to find the next batch, and so on until there were none left. Although the Company Name value might be duplicated, the rest of the information, such as the email address, is not. They were trying not to bombard their clients with numerous emails at once.
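
If the goal is batches holding at most six contacts per company, a batch number can also be computed in a single pass as (rank - 1) div 6 + 1, where rank is a record's position within its Name group. A Python sketch, assuming records sorted by name and then creation order; the `batch` key and the function name are hypothetical.

```python
def assign_batches(sorted_records, per_batch=6):
    """Give each record a batch number so no batch holds more than
    `per_batch` records with the same name. Input sorted by name, then creation."""
    previous = object()  # sentinel that never equals a real name
    rank = 0
    for rec in sorted_records:
        # Position of this record within its name group, starting at 1.
        rank = rank + 1 if rec["name"] == previous else 1
        previous = rec["name"]
        rec["batch"] = (rank - 1) // per_batch + 1
    return sorted_records
```

Exporting one batch number at a time would then replace the repeated find, flag, and export cycle, under the assumption that each batch should simply be the next six records per Name in creation order.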

