Jump to content

[REQ] How to search for duplicate records to clear and only have unique ones in a table?


Recommended Posts

A table with 100ks of records should be checked for exactly identical records and those should get removed. So I concatenated all fields which have to be included into a single string. But when searching for duplicate records  it failed to find the exact duplicates using "!". Some were the same up until the e.g. 176 chars, but then different. I assumed that even the word index is not working as expected.

@Kevin Frank found out that it can find duplicates until the first 109 chars when 108 are identical but the 109 is different. But asa 109 are identical and the 110th is different it doesn't find duplicates anymore.
Here it doesn't matter whether it is a Value or Word index, Stored calc or Unstored. While I'm pretty sure there is a reason for that behaviour (and why FM-Help utilises just 100 chars) I'd like to understand why.
So, if anyone can shed some light in to that behaviour I'd be grateful!

To solve the original goal I hashed the concatenated string, did a "!"-search and that found me all duplicates. Then I just sorted by that hash and a ran a script to omit the first and delete the other duplicates of each group.

Link to post
Share on other sites

Since I got some useful answers over at FB I want to make sure it stays here too, just in case someone is wondering as well, so:

 

Quote

Cornelius Walker wrote:

FileMaker has two types of indexes: the "word" index that indexes all words in a field (a "word" being defined as a string of characters delimited by word separator characters) and the "value" index (a value defined as a string of characters delimited by carriage returns or the first 109 characters on the line). The "duplicate" search uses the value index.

 

Me replied:
That would explain this unclear behaviour. Do you have any serious source of your statement? (Not that I don't believe you but I'd like to understand the why's)?
Following your statement and the result, a "Find Matching Records" is utilising the word index since it finds the identical records.

 

 
Cornelius Walker wrote:
The "source" was Christopher Crim, Clay Maeckel, and another engineer (whose name escapes me) who designed and built the FileMaker engine when FMP 7 came out. But you can just consult the FileMaker Help. However there is one part of the help entry on "Defining field indexing options" that's not entirely correct.
Per the Help entry, one of the interface elements you'll notice are two options for creating indexes for text fields: minimal and all. "Minimal" actually means either the value index or the word index but not both (if manually selected it creates the value index) and "All" means all indexes available for a field type are created (for non-text field types the only index that exists is the value index so "minimal" isn't even an option). The value index is used for relationships, value lists (a "value" being a line of text delimited by carriage returns), and - as you've discovered - searches using the duplicate ( ! ) operator.
To see this more clearly, turn off indexing for a field but enable the option to auto create indexes as needed. Do a search for a word in a text field and then turn off the option to automatically index the field. It should be on "minimal" from your search. Now try to use that field in a relationship. Note the relationship line ends are barred because FileMaker can not create the value index. On a layout choose the "insert from index" command for this field and notice the checkbox for "show individual words" is checked and disabled - this is a toggle between the word and value indexes for this dialog and the value index doesn't exist and can't be created.

 

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Similar Content

    • By Oyseka
      Hi All, When I add records to a quote they may be added in multiples until the Quote is accepted. When a quote is accepted I now have to change any multiple service Products to single products to allow for the creation of the barcodes,(serial numbers), used during the service so that any parts used on the product are recorded against that particular product. I created a script which checks for any service Products that are showing multiples and duplicates them the required number of times. The duplication works fine in the QuoteItems layout but unless after duplication, I select either the department or product, then the items do not show in the QuoteItems portal on the Quotes layout despite the Qty being reduced from whatever it was set at to 1 and the Quote Total is incorrect.





    • By Oyseka
      Hi all, I have a requirement to re-create Quotations on a monthly basis that are fixed by a 12 month contract. On creation of the QuoteItems I want to automatically insert a fixed ID that stays with the item. Each month a new Quotation has to be produced which is then reviewed to potentially add to the contract in order to acquire a new Purchase Order  but I do not want to increment the ItemID on existing items. Any help appreciated
       
      Sorry everyone, I figured it out
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.