Confusing Find Results

oilcan · January 14, 2010

I have a find result that I don't understand, and I was hoping somebody could shed some light for me.

I manage and develop a FileMaker database for my lab. In one particular text field, among many other values, we have the following two values:

HLA-B-1501

DRB1-1501

It was recently pointed out to me that when you do a find in that field for:

1501

You get hits for the HLA-B-1501, but not the DRB1-1501. Perhaps there is a good reason for this, but I am at a loss to explain this behavior. Does anybody have any insight into this?

Matthew F · January 14, 2010

No, I don't get it. For some reason having a number instead of a letter in front of -1501 makes a difference.

In any case the right way to do the search is to use *1501 (asterisk preceding the term). The asterisk means '0 or more characters' and it is one of the standard operators that shows under the 'Insert' button when you are in find mode.

LaRetta · January 14, 2010

The reason is how FM interprets word breaks.

HLA-B-1501: First dash has alpha before and after it so it is a word break. And it has an alpha before the second dash and a number after so it is considered a word break (it would count as three words)

DRB1-1501: Dash is between two numbers so it is NOT a word break so it counts as one word. So HLA-B-1501 is interpreted as HLA B 1501 and [color:green]FM finds the 1501 but DRB1-1501 doesn't find 1501.

There have been a few great posts on the subject by Comment but I don't can't seem to produce the full links. They are Posts 176047 and 175060 (neither are coming up). I will try to find them later but here is a bit of information I have stored about the word break rules ([color:blue]from Comment)

[color:blue]RULE 1 (period .)

one.two = 1 word

one.2 = 2 words

1.two = 2 words

1.2 = 1 word

RULE 2 (slash / hyphen - colon : comma ,)

one-two = 2 words

1-two = 2 words

one-2 = 2 words

1-2 = 1 word

RULE 3 (single quote ')

one'two = 1 word

one'2 = 2 words

1'two = 2 words

1'2 = 2 words

The ampersand and hyphen are word delimiters - but they are not the only ones. There are many more, first and foremost a space, of course. I don't know if there's a comprehensive list, but at least !@%^()≠+[]{}<>≤≥`~|? are all word delimiters.

Word delimiters are not words, so you cannot get them with the MiddleWords() function.

space

*

#

;

"

=

$

There are other seemingly strange rules, such as text field with abc 34). If you create calculation for RightWords ( text field ; 1 ), it produces 34. The ) is ignored because it is one of the word breaks.

UPDATE: Oh! I'm a colorful puppy today!

Edited January 14, 2010 by Guest
Corrected green

oilcan · January 14, 2010

cool, thanks for the input guys.

michael3785 · January 16, 2010

Hi guys

Found this post interesting as I am having a similar problem but mine does not involve any punctuation.

A find for

00000000533006B9BD99CC4D8B1F70AF9DA7783B0700A5B4F3D8CC2B3C4FA1DB76709804EC44000000001BA00000A5B4F3D8CC2B3C4FA1DB76709804EC4400000041D2610000

finds things like

00000000533006B9BD99CC4D8B1F70AF9DA7783B0700A5B4F3D8CC2B3C4FA1DB76709804EC44000000001BA00000A5B4F3D8CC2B3C4FA1DB76709804EC4400000000425B0000

and

00000000533006B9BD99CC4D8B1F70AF9DA7783B0700A5B4F3D8CC2B3C4FA1DB76709804EC44000000001BA00000A5B4F3D8CC2B3C4FA1DB76709804EC4400000041D08C0000

(There are no carriage returns in the actual values)

I can make the find work properly by putting a quote at each end of the search string, but this does not address my application as I am actually trying to make a relationship between records using this data and the relationship is matching records with the wrong key on the "right" of the relationship.

Are the 0's doing something maybe?

Any input much appreciated.

Many thanks

Michael

Edited January 16, 2010 by Guest

IdealData · January 16, 2010

That's a long key field! In fact they are 140 characters long.

Some time ago I discovered an indexing limitation which seemed to be around 100 chars - but a close inspection of your results would suggest 127 or 128 - both are notable numbers.

It's difficult to say exactly as the 127/128th chars fall into a block of zeros.

Use a shorter key? Or possibly split the key between 2 fields?

Edited January 16, 2010 by Guest

comment · January 16, 2010

Or convert the hexadecimal values to decimal numbers.

michael3785 · January 17, 2010

Hi all

Many thanks for your help on this. I went with the split key option and this work great.

BTW, I am not generating the keys - they are from Outlook using the Productive Outlook Maniplulator plug-in so I don't get to choose the key length.

I thought about converting the hex values to decimals but I thought this might allow for collisions where record A was "01234A" and record B was "012341". If I was using a A->1, B->2, type conversion scheme, record A would go to "012341" and collide with record B.

I should add, that I tried a workaround in the scripting to use a find on the key field rather than a relationship with it, but putting quotes around the key value when finding made the find take 3-4 seconds (I am searching a table with 200,000 records) so it would not have been scalable for batch scripting. The relation look-up is almost instant so works better for me.

Anyway, thanks again for your input. All sorted

Michael

LaRetta · January 17, 2010

I didn't know there were different types of conversion schemes. I would think mathematics would be straightforward, particularly on something like hexadecimal to decimal, but I could be very wrong. I get different numbers and I'm unsure if you could ever get dups because alpha increments 10, 11, 12 (for A, B & C) etc. and hexidecimal works in 16's.

// 012341

1 * (16^0 ) + 4 * (16^1 ) + 3 * (16^2) + 2 * (16^3) + 1 * (16 ^4) + 0 * (16 ^5 ) = 74561

// 01234A

10 * (16^0 ) + 4 * (16^1 ) + 3 * (16^2) + 2 * (16^3) + 1 * (16 ^4) + 0 * (16 ^5 ) = 74570

UPDATE: I suppose if G were in the second position, translating to 16 ... but then we look at single digits in the translation so, in looking at the pattern and not testing thoroughly, dups wouldn't make sense.

Edited January 17, 2010 by Guest
Changed sentence in update :^)

comment · January 17, 2010

// 012341

1 * (16^0 ) + 4 * (16^1 ) + 3 * (16^2) + 2 * (16^3) + 1 * (16 ^4) + 0 * (16 ^5 ) = 74561

That is correct (except the redundant parentheses). Of course, with 142 digits you'd want to use a custom function for this.

There can be no G in a hexadecimal number.

LaRetta · January 17, 2010

I can find nothing in what I'm reading which says which alpha characters are valid in hexadecimal. Can you help me isolate valid alpha and could we ever deal with an invalid hexadecimal in which we should trap for invalid alpha [color:green]or other characters? As for the redundant parentheses, LOL, I just used theories on line for the conversion. Of course they aren't necessary and I appreciate you catching me!! :laugh2:

Edited January 17, 2010 by Guest
Added green

comment · January 17, 2010

The valid symbols are 0-9 and A-F (either upper or lower case may be used).

If a hexadecimal number contains other characters, it should be treated the same way as a decimal number that contains non-digit characters.

Edited January 17, 2010 by Guest

LaRetta · January 17, 2010

This custom function might do the trick:

HexToDec ( string )

Case ( Length ( string) ≠ Length ( Filter ( string ; "0123456789abcdefABCDEF" ) ) ; "Invalid" ;

Let ( [

x = Left ( string ; 1 ) ;

alpha = Position ( "ABCDEF" ; x ; 1 ; 1 ) ;

char = Case ( alpha ; alpha + 9 ; x ) ;

len = Length ( string ) - 1

];

If ( Length ( string ) > 0 ;

char * 16^ len + HexToDec ( Right ( string ; len ) )

)// end if

)// end let

)// end case

If a hexadecimal number contains other characters, it should be treated the same way as a decimal number that contains non-digit characters.

I chose to invalidate the string if it contained invalid characters. If I filtered them out and continued, it wouldn't be the same number any more and could match other converted hexadecimal numbers.

Edited January 17, 2010 by Guest

Sign In

Confusing Find Results

Recommended Posts

oilcan

Matthew F

LaRetta

oilcan

michael3785

IdealData

comment

michael3785

LaRetta

comment

LaRetta

comment

LaRetta

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information