Deceptive Duplicates

LaRetta · November 11, 2008

You may be sorely surprised at your results when using [color:red]! to find duplicates ...

Values are duplicates only if word order is the same. When determining uniqueness in text fields, FileMaker Pro looks at the first 100 characters in each field.

In reality, the following will match as duplicates ...

[color:blue]¶

Always

[color:blue]Always

¶

[color:blue]Always

Never

[color:blue]Never

Always

We are not warned anywhere in Help that I can find that each LINE in a field is treated separately. I would not expect the above 6 records to match on [color:red]!.

A field's entire contents should be counted as ONE value, matching character by character. The reason a User would use [color:red]! is to find EXACT DUPLICATE VALUES within a field!! FM Help suggests, but does not adhere to, "Values are duplicates only if word order is the same." In the above example, the [color:blue]Always and [color:blue]Never are reversed. And you will end up with Never matching Always because somewhere in one field, both values may exist. Good grief, FileMaker!!

I realize FM means WORDS in a SINGLE line but it is very deceptive and poorly written and it does not fit the spirit of the purpose of using [color:red]! at all. I have always used calculations to find exact duplicates. But I have also constrained a record set down with [color:red]! to begin looping or working on a set. In my opinion, [color:red]! is worthless.

UPDATE: No, it is more than worthless ... it is dangerous which is why I'm bringing it up.

Edited November 11, 2008 by Guest
Added update

mr_vodka · November 11, 2008

In my opinion, [color:red]! is worthless.

IMO, the FM help is worthless. B)

Vaughan · November 11, 2008

There might be a trick in the way FMP works with "words".

If FMP is just looking at words, then "¶Always" and "Always¶" are the same, because the leading a trailing paragraph is stripped/ignored. They'll match "Always".

Where I work there is a person whose last name is 'Ilolahia where the first character is an apostrophe. The "word" functions like LeftWords() strip off the leading apostrophe. Apostrophes, commas, spaces and a few others like hyphens act as word separators (though the behaviour differs in text that looks like dates).

Vaughan · November 11, 2008

I just did a test in the data viewer with...

wordcount( "¶always¶never¶at the end" )

... and it evaluates to 5. So FMP is ignoring the ¶s completely.

So for your original examples, FMP only sees

always

always never

never

never always

regardless of the placement of carriage returns.

comment · November 11, 2008

I believe it's not about words, but about indexing (the 100 characters limit is a dead giveaway). It seems that if two records share an index entry, they're considered to be duplicates of each other.

Vaughan · November 12, 2008

The index is "word" based too...

LaRetta · November 12, 2008

Thanks for jumping in, Vaughan! :smile2:

Yeah, my point is that it is not what you would expect ... that FM indicates that records with these values in one field ...

always

always never

never

never always

... are considered duplicates when using [color:red]!. I understand word breaks but spaces are word breaks and so are carriage returns. There is not one theory which holds here, in my opinion. In addition, FM says that the ORDER must be the same; but [color:blue]always never certainly isn't in same order as [color:blue]never always (with or without the carriage return)! I posted this thread only to bring it to people's attention ... that using [color:red]! can catch them off guard if not careful.

So regardless, we are not working with words here. The purpose of [color:red]! is to search for duplicate RECORDS with the same value in ONE field. And ALL characters should be taken into account in that regard. So just keep in mind that the search for duplicates applies to LINES and not field values.

LaRetta · November 12, 2008

Hi Michael,

I am interested! I just tested and I find no difference in the duplicates produced; whether the field is indexed as All or Minimal. And, no matter which storage option I select, it produces two results: always and never.

I appreciate trying to understand the theory; I thank you both. But I do not believe the theory is logical ... at least not yet.

LaRetta · November 12, 2008

I suppose I thought that [color:red]! should be the same as using [color:red]=. I doubt I'm alone in that incorrect assumption. :crazy2:

comment · November 12, 2008

The index is "word" based too...

No, at least not for this. There is a separate (and optional) word index, for text fields only. The regular index is a value index, and it looks like this is the one used for determining duplicates:

"blue" and "blue¶sky" are duplicates;

"red" and "red herring" are not.

LaRetta · November 12, 2008

I see it, Michael. It isn't like = (comparing values within the same field) but rather like = if the field was joined to itself ... each line a separate entity and relating independently.

I still think it goes against what most people would expect from a 'duplicate' comparison. I suppose I might use it if I want to see if I have any self-join possible records. :wink2:

comment · November 12, 2008

like = if the field was joined to itself ... each line a separate entity and relating independently.

I believe that is correct.

I still think it goes against what most people would expect from a 'duplicate' comparison.

Well, you made an entry in my "Things not in the Manual" document…

Now compare this to 'Find Matching Records' from the shortcut menu - that's yet another animal of a totally different species.

LaRetta · November 13, 2008

Now compare this to 'Find Matching Records' from the shortcut menu - that's yet another animal of a totally different species.

It may be different species than [color:red]! but it seemed consistent with a standard find; treating the search criteria as start of word (regardless of word separators before or after). It even remained as the last find request for easy future modification.

Words found if searching for Ale:

Ale

(sp) Ale

¶

Aleron

Pale Ale

Words not found:

Bale

Tales

You don't mention things without reason, Michael. What have I missed here? :idunno:

LaRetta · November 13, 2008

BTW, I can find no specific reference to shortcut 'Find Matching Records'. Searching FM Help is almost as difficult (and worthless) as searching Tech Info. It's a good thing FM Forums is here!!

LaRetta · November 13, 2008

Wow - one exception. Switched words. Find will find Pale Ale and Ale Pale but shortcut will not!

UPDATE: Found it under: Performing quick finds based on data in a field... then why don't they call it quick find so we know what to search for?

DOUBLE UPDATE: FM Help says, "FileMaker Pro performs a phrase search, returning only those records that include the words in order."

WRONGO. Correction ... I mean [color:green]RIGHTO! This is a big difference and important as well!

Edited November 13, 2008 by Guest

comment · November 13, 2008

it seemed consistent with a standard find

It doesn't perform a standard find (words that start with). It does a phrase search (as if the search criteria were enclosed in quotes). That's all fine and documented* - but I would expect "matching" to mean an exact match. I don't see why Find Matching ["red"] should find "red" and "red herring", but Find Matching ["red herring"] finds "red herring" only. Either the two records match each other, or they don't.

--

(*) Help > Finding, sorting, and replacing data > Finding records > Performing quick finds based on data in a field

EDIT: I see you found all this while I was posting.

Edited November 13, 2008 by Guest

LaRetta · November 13, 2008

Thank you for mentioning this and piquing my curiosity. I wonder just how useful it would be since we can't script it (or can we?). That's one reason I haven't paid much attention to its ability there in right-click land. If it were phrase only, then it wouldn't accept (sp)red and find red. It seems more word-specific but in specific word order. Just thinking out loud, of course ...

UPDATE: Silly brain of mine; you are correct about phrase only. I was reversing the theory and applying it incorrectly.

Edited November 13, 2008 by Guest
Added update

comment · November 13, 2008

You can script it indirectly, by doing all the work that the shortcut does (grab the phrase, enter Find mode and set the field to quoted phrase). I don't see how it would be useful, but perhaps in some circumstances…

My rant is from the user's perspective: it shouldn't be named 'Find Matching Records' when it really does "Find records that contain this exact phrase'. And I believe ''Find Matching Records' as in 'Find records with this exact content' would be more useful.

comment · November 13, 2008

BTW, here's another interesting tidbit. Say you have a field that contains "red herring", and you do Find Matching Records on that. If you then do Modify Last Find, it shows "red herring" (without quotes). That seems to be another, undocumented way to do a phrase search.

LaRetta · November 13, 2008

That seems to be another, undocumented way to do a phrase search.

Strange. If I enter paleale it doesn't work when the phrase is pale ale. I must enter pale(sp)ale. Indeed, if I have field with two spaces then ale and search regularly for (sp)ale, it finds:

ale pale

pale ale

(sp)(sp)ale

... but it doesn't find ale.

But using quick find, if I have (sp)(sp)ale, then modify the find, it looks like (sp)(sp)(sp)ale which only finds (sp)(sp)ale. And if I have a field with:

tall

black

ale

... and I search regularly for:

tall

black

ale

... it finds it but won't find:

black

tall

ale

[color:red]And my favorite REGULAR FIND picked from this ...

Search for [color:blue]pale(sp)(sp)ale finds [color:blue]pale bright ale and [color:blue]pale bright true ale. So this is a starts-with and ends-with find! It works with lists also...

Now(sp)

men

... finds list with:

Now

is

the

time

for

all

good

men

okay, whew! I must sleep but I'll be playing a blue streak with this wonderful find of yours, Michael!!

Edited November 13, 2008 by Guest
added red for emphasis

comment · November 13, 2008

So this is a starts-with and ends-with find!

I don't think so. What you actually did is search for two phrases. IOW, the same criteria would also find:

is

the

men

Now

time

for

all

good

but not:

men

Now

is

the

time

for

all

good

(because here "men" doesn't have a return in front of it).

LaRetta · November 13, 2008

Good morning! Okay, a small damper on my excitmenet of it but not much! It still adds greatly to our ability to search!

And I agree that Find Matching Records is just as misleading as suggesting ! produces duplicates. If Find Matching Records were to be called Quick Find then Users more readily would have understood it wasn't a true match. But I think it should have been called Phrase Find, as you've explained how it works already.

What other tid-bits do you have under your hat? I've a feeling you have enough of these surprises of 'what's not in the manual' to fill a manual. :shocked:

comment · November 13, 2008

I've a feeling you have enough of these surprises of 'what's not in the manual' to fill a manual.

Not yet, but it's slowly getting there. I wonder how much of it will be relevant to the next version (my bet: a lot).

LaRetta · November 13, 2008

(my bet: a lot).

And mine. The same person responsible for keeping the Help up-to-date and complete is responsible for import/export processes which hasn't be touched since vs. 3 I suspect. :crazy2:

[color:green]DISCLAIMER: Opinion only. I don't know if the same person is responsible for both pieces ... it just FEELS that way. :content:

Edited November 13, 2008 by Guest
Added green disclaimer ;-)

Taig McNab · November 13, 2008

You can script it indirectly, by doing all the work that the shortcut does (grab the phrase, enter Find mode and set the field to quoted phrase). I don't see how it would be useful, but perhaps in some circumstances…

Hi,

Could you tell me how you would do this - its exactly what I've been trying to do:

I have a set of records and would like a button to take the value of a particular field and display all other records with a matching field - exactly what the right click "Find Matching Records" does.

How do you get the data out of the field and then in to the find query?

Is it along the lines of:

Copy [select;Field]

Enter Find Mode [Pause]

Set Field [Field; #what goes here?!#]

Perform Find []

Any help would be much appreciated.

Regards,

Taig McNab

comment · November 13, 2008

More like:

Set Variable [ $criteria ; YourTable::YourField ]

Enter Find Mode []

Set Field [ YourTable::YourField ; Quote ( $criteria ) ]

Perform Find []

Note that this takes the full content of the field - unlike 'Find Matching Records' which takes only the selection, if there is one. That could be simulated too, with a bit more effort.

Taig McNab · November 13, 2008

Beautiful! Just what I was trying to do. Thank you...

I've gained just enough new knowledge to be dangerous now

Regards,

Taig McNab

aholtzapfel · November 13, 2008

I believe it's not about words, but about indexing (the 100 characters limit is a dead giveaway). It seems that if two records share an index entry, they're considered to be duplicates of each other.

I feel that this statement is correct but there is more to it.

A field appears to be able to have more than one index! One for each line. The First index in the field is the one used to sort but all the other indexes appear to work as you would expect any index to for finds and relationships.

This is a pretty cool behavior (and I have seen it used a few times with spectacular results but never documented by Filemaker) but I have a question.

Giving that each value in a field seperated by a carriage return has it's own index, does filemaker really only index the first 100 characters, or does it index the first 100 characters of each value in a field?

comment · November 13, 2008

A field appears to be able to have more than one index! One for each line.

I wouldn't put it quite that way. There is only one field index (disregarding for the moment the optional word index). Each value gets a separate entry in the index.

does filemaker really only index the first 100 characters, or does it index the first 100 characters of each value in a field?

I believe the limit is applied to each individual value separately. Also, the number of characters is actually a bit higher - 109 or 110 IIRC.

aholtzapfel · November 13, 2008

So, back to the start of this thread.

Question:

What does a "!" do in a find? (Not quite what FM says it does.)

Answer:

For any given field, it finds records with at least one value (values separated by carriage returns) in common.

(This is mostly just for myself to try to clarify what is really going on. If anyone knows differently or has a better way to state this, it would be welcome.)

Sign In

Deceptive Duplicates

Recommended Posts

Create an account or sign in to comment

Create an account

Sign in

Important Information