
[REQ] How to search for duplicate records to clear and only have unique ones in a table?


This topic is 1108 days old. Please don't post here. Open a new topic instead.


A table with hundreds of thousands of records should be checked for exactly identical records, and those should be removed. So I concatenated all the fields that have to be included into a single string. But a search for duplicate records using "!" failed to find only the exact duplicates: some of the records it returned were identical up to, e.g., character 176, but different after that. I assumed that even the word index was not working as expected.

@Kevin Frank found out that the duplicate search works correctly within the first 109 characters: when the first 108 characters are identical and the 109th is different, the result is as expected. But as soon as the first 109 characters are identical and the 110th is different, the duplicate search no longer works correctly.
It doesn't matter here whether it is a Value or Word index, or a Stored or Unstored calculation. While I'm pretty sure there is a reason for that behaviour (and for why FM-Help uses just 100 chars), I'd like to understand why.
So, if anyone can shed some light on that behaviour, I'd be grateful!

To solve the original goal, I hashed the concatenated string and did a "!" search on the hash, which found all the duplicates. Then I sorted by that hash and ran a script that omits the first record of each group and deletes the remaining duplicates.
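For readers who want to see the hash-and-group idea outside FileMaker, here is a minimal sketch in Python. It is not the script I actually ran: the function names `record_key` and `split_duplicates` are made up for illustration, SHA-256 is just one possible hash (the post does not depend on a particular one), and the unit-separator character between fields is an added assumption so that ("ab", "c") and ("a", "bc") do not concatenate to the same string.

```python
import hashlib
from collections import defaultdict


def record_key(fields):
    """Return a short, fixed-length key for one record (hypothetical helper).

    The fields are joined with a separator that is unlikely to occur in the
    data (an assumption), then hashed so the key stays short no matter how
    long the concatenated string is.
    """
    joined = "\x1f".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def split_duplicates(records):
    """Group records by key; keep the first of each group, mark the rest."""
    groups = defaultdict(list)
    for rec in records:
        groups[record_key(rec)].append(rec)
    keep = [group[0] for group in groups.values()]
    delete = [rec for group in groups.values() for rec in group[1:]]
    return keep, delete


if __name__ == "__main__":
    # Two records that differ only after character 176, plus one true duplicate.
    a = ("x" * 176 + "A", "1")
    b = ("x" * 176 + "B", "1")
    keep, delete = split_duplicates([a, b, a])
    print(len(keep), "kept,", len(delete), "to delete")
```

Because the key has a fixed, short length, it stays well below the length at which the "!" search stopped distinguishing records.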


Since I got some useful answers over at FB, I want to make sure they are preserved here too, in case someone else is wondering about the same thing:

 

Quote

Cornelius Walker wrote:

FileMaker has two types of indexes: the "word" index that indexes all words in a field (a "word" being defined as a string of characters delimited by word separator characters) and the "value" index (a value defined as a string of characters delimited by carriage returns or the first 109 characters on the line). The "duplicate" search uses the value index.

 

I replied:
That would explain this unclear behaviour. Do you have an authoritative source for your statement? (Not that I don't believe you, but I'd like to understand the whys.)
Going by your statement and the result, a "Find Matching Records" must be using the word index, since it finds the identical records.

 

 
Cornelius Walker wrote:
The "source" was Christopher Crim, Clay Maeckel, and another engineer (whose name escapes me) who designed and built the FileMaker engine when FMP 7 came out. But you can just consult the FileMaker Help. However there is one part of the help entry on "Defining field indexing options" that's not entirely correct.
Per the Help entry, among the interface elements you'll notice two options for creating indexes for text fields: minimal and all. "Minimal" actually means either the value index or the word index but not both (if manually selected it creates the value index) and "All" means all indexes available for a field type are created (for non-text field types the only index that exists is the value index so "minimal" isn't even an option). The value index is used for relationships, value lists (a "value" being a line of text delimited by carriage returns), and - as you've discovered - searches using the duplicate ( ! ) operator.
To see this more clearly, turn off indexing for a field but enable the option to automatically create indexes as needed. Do a search for a word in a text field and then turn off the option to automatically index the field. It should be on "minimal" from your search. Now try to use that field in a relationship. Note that the relationship line ends are barred because FileMaker cannot create the value index. On a layout, choose the "insert from index" command for this field and notice that the checkbox for "show individual words" is checked and disabled - this is a toggle between the word and value indexes for this dialog, and the value index doesn't exist and can't be created.
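To make the effect of a length-limited value index concrete, here is a toy model in Python. It is only a sketch of the behaviour described in this thread, not FileMaker's actual implementation: the name `find_duplicates_by_prefix` is hypothetical, the limit of 109 characters is taken from the observations above (the Help mentions 100), and splitting on carriage returns is ignored.

```python
from collections import Counter

# Assumption: the number of leading characters that get compared,
# as reported in this thread. FileMaker Help mentions 100 instead.
PREFIX_LEN = 109


def find_duplicates_by_prefix(values, prefix_len=PREFIX_LEN):
    """Return the values whose first `prefix_len` characters occur more than once.

    This mimics a duplicate search that only looks at a truncated copy of
    each value, so values that differ only beyond the limit are treated
    as duplicates of each other.
    """
    counts = Counter(v[:prefix_len] for v in values)
    return [v for v in values if counts[v[:prefix_len]] > 1]


if __name__ == "__main__":
    # Differ at the 109th character: still told apart.
    a = "x" * 108 + "A"
    b = "x" * 108 + "B"
    # Differ at the 110th character: reported as duplicates.
    c = "y" * 109 + "A"
    d = "y" * 109 + "B"
    print(find_duplicates_by_prefix([a, b, c, d]) == [c, d])
```

Under this model, any key that is shorter than the limit, such as a hash of the concatenated string, is compared in full, which is consistent with the hash workaround finding only exact duplicates.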

 
