How to perform Find with partial data (text)?

K3lso · February 16, 2005

For exemple: I

comment · February 17, 2005

Can you define "55% of similar match"? Or at least give a more detailed description - preferably with an example?

K3lso · February 17, 2005

I want to perform a search that gives a partial match. Exemple: I type in the search field the word "KANGAROO". If it all goes well the search should give these results that are already in my records: "KANGO", "ANGAR", "KAROO" and all words that are similar to "KANGAROO".

I mentioned the percentage match because I have a database in foxpro and when you perform a search, in the search field you can select the % of similarity between the word you type and the words in your record.

I hope this can help you.

BobWeaver · February 17, 2005

You could try something like this to simplify the text in both the trademark field, and the search field:

Substitute(OriginalText;

["E";"A"];["I";"A"];

["O";"A"];["U";"A"];

["B";"V"];["W";"V"];

["D";"T"];["G";"K"];

["Z";"S"];["CH";"J"];

["C";"K"])

Then search the simplified trademark field for the simplified search text.

K3lso · February 17, 2005

Sorry but I don

BobWeaver · February 17, 2005

Assuming that your trademark field is called "Trademark" then:

Make a new calculated field "SimplifiedTrademark" with the formula:

Substitute(Trademark;

["E";"A"];["I";"A"];

["O";"A"];["U";"A"];

["B";"V"];["W";"V"];

["D";"T"];["G";"K"];

["Z";"S"];["CH";"J"];

["C";"K"])

Then make a global field "SearchText" where the user will enter the search text. And then make another calculated field "SimplifiedSearchText" with the formula:

Substitute(SearchText;

["E";"A"];["I";"A"];

["O";"A"];["U";"A"];

["B";"V"];["W";"V"];

["D";"T"];["G";"K"];

["Z";"S"];["CH";"J"];

["C";"K"])

Then you can make a simple find script

Enter Find Mode

Set Field [simplifiedTrademark, SimplifiedSearchText]

Perform Find []

So, the user enters the new proposed trademark into the SearchText field and runs the script. Similar text, if any, will be returned in the found set.

What the formula is doing is replacing all vowels with "A" and reducing the number of different consonants with a smaller set of similar sounding ones. So there is a better chance of matching words that are not exactly the same.

comment · February 17, 2005

The problem is not (yet) how, but what. I still don't know what "similar" means, and certainly not how to measure similarity. IMHO, this is a much more complex subject than what would seem at first glance (do a Google on "partial string matching" to see what I mean).

If you know the algorithm that your FOXPRO database uses, then perhaps we can help you translate that to Filemakerese. I suspect it will be mighty slow, but who knows.

Bob, I don't think a phonetic match is what is required here. In any case, take a look at Soundex and Metaphone for a bit more sophisticated method.

BobWeaver · February 17, 2005

Coincidentally, while you were posting, I was looking at the Soundex algorithm and I made a Filemaker implementation of it. Here it is:

Left(Left(Upper(TextField);1)&Middle(Substitute ( "." & Upper(TextField);

["E";"A"];["I";"A"];["O";"A"];["U";"A"];["Y";"A"];["H";"A"];["W";"A"];

["F";"B"];["P";"B"];["V";"B"];

["G";"C"];["J";"C"];["K";"C"];["Q";"C"];["S";"C"];["X";"C"];["Z";"C"];

["T";"D"];["N";"M"];

["AAA";"A"];["AA";"A"];["AA";"A"];

["BBB";"B"];["BB";"B"];["BB";"B"];["B";"1"];

["CCC";"C"];["CC";"C"];["CC";"C"];["C";"2"];

["DDD";"D"];["DD";"D"];["DD";"D"];["D";"3"];

["LLL";"L"];["LL";"L"];["LL";"L"];["L";"4"];

["MMM";"M"];["MM";"M"];["MM";"M"];["M";"5"];

["RRR";"R"];["RR";"R"];["RR";"R"];["R";"6"];

[".A";".0"];["A";""];[".";""]

);2;1000)&"000000";4)

This is far more practical than the formula I posted above.

My assumption was that K3lso was looking for a phonetic match since I can't think of any other practical search method.

comment · February 17, 2005

I can think of several other methods, and I know there are many more that I can't think of, but others have. Don't know how practical any of them would be in FMP.

For an example ot two:

http://www.codeproject.com/string/dmetaphone6.asp

http://www.merriampark.com/ld.htm

K3lso · February 17, 2005

Yup, Im looking for a phonetic match. I

K3lso · February 17, 2005

I cant make it. Im newbie in this and I dont understand very well. Should I make a relationship between my TRADEMARK field and SIMPLIFIED TRADEMARK field? and "SEARCH" field with "SIMPLIFIED SEARCH"?

The thing is: I have in one window my TRADEMARK(A) field, where are stored all my NEW trademark records; and in another window my TRADEMARK(: field, where are stored all my OLD trademark records. I want to create (in another window) a single field where when I enter a new proposed trademark and when I press a button called "SEARCH", it performs a phonetic search, so that similar text (from the TRADEMARK(A) record and the TRADEMARK(: record at the same time) are shown in a new window. Could it be possible? If so, I will appreciate very much a detailed explanation. Thnx in advance.

BobWeaver · February 17, 2005

You can make a relationship from the simplified search field to the simplified trademark field, and then you can either place a portal from that relationship on your layout to view the similar records, or use a GTRR [show] (go to related record, show only related) in a script to bring up a found set of similar records.

K3lso · February 17, 2005

*******. I still can

-Queue- · February 17, 2005

Don't give up! Post your file and we'll help fix it.

K3lso · February 17, 2005

Your Soundex algorithm implementation you made it

K3lso · February 17, 2005

Bob, your Soundex algorithm implementation you made it

-Queue- · February 17, 2005

The calculated field won't work in Find Mode. So you'll have to set the global with its calculated result before entering Find Mode.

BobWeaver · February 18, 2005

Here's a sample file that uses a relationship and a portal to get the search results

Soundex.fp7.zip

BobWeaver · February 18, 2005

Soundex may work much better for this application than for what it was originally intended. As a fuzzy phonetic name search algorithm, it returns many false positive matches, which for this application is likely a good thing. On the other hand, because the first character must match exactly, it won't return a word like cat when you search for kat. This can be easily fixed by changing the formula to the following:

Left(Substitute ( "." & Upper(TextField);

["E";"A"];["I";"A"];["O";"A"];["U";"A"];["Y";"A"];["H";"A"];["W";"A"];

["F";"B"];["P";"B"];["V";"B"];

["G";"C"];["J";"C"];["K";"C"];["Q";"C"];["S";"C"];["X";"C"];["Z";"C"];

["T";"D"];["N";"M"];

["AAA";"A"];["AA";"A"];["AA";"A"];

["BBB";"B"];["BB";"B"];["BB";"B"];["B";"1"];

["CCC";"C"];["CC";"C"];["CC";"C"];["C";"2"];

["DDD";"D"];["DD";"D"];["DD";"D"];["D";"3"];

["LLL";"L"];["LL";"L"];["LL";"L"];["L";"4"];

["MMM";"M"];["MM";"M"];["MM";"M"];["M";"5"];

["RRR";"R"];["RR";"R"];["RR";"R"];["R";"6"];

[".A";".0"];["A";""];[".";""]

)&"0000";4)

K3lso · February 18, 2005

Bob, how do u perform the search in the "SearchText" field in you Soundex??? If I enter a word what next?

K3lso · February 18, 2005

Got it, but the new formula doesn

K3lso · February 18, 2005

sorry, my mistake. the formula works. When I search for "KAT", it returns "CAT".. But when I search for the word "STAROX", it won

K3lso · February 18, 2005

Someone told me to use some kind of permutation script that would generate all possibilities of matches. By now Im so confused.

P.S. I already have that searchscript (PHPscript) in a Foxpro database. Is there a way so I can somehow see its algorithms? Or a way I can use the same PHPscript in Filemaker?

K3lso · February 18, 2005

Maybe this could help you

permutations.zip

comment · February 18, 2005

It seems we are going in circles. "STAROX"and "ATAROX" are not a phonetic match. But you said you were looking for a phonetic match.

Someone told me to use some kind of permutation script

That would be my instinct, too. The problem is: what kind of permutations? The attached file is good for playing word-games. Do you really want PARTS and STRAP to be considered similar, just because they use the same letters?

I already have that searchscript (PHPscript) in a Foxpro database. Is there a way so I can somehow see its algorithms?

I don't know: is there a way so we can somehow see the script?

K3lso · February 19, 2005

Im looking for a phonetic match AND similar written words. For me is useless if the search returns "CAT" from "KAT" but not the word "RAT". They all are similar "CAT" and "KAT" is a phonetic result; "CAT" and "RAT" not but is almost the same.

That

BobWeaver · February 19, 2005

Have you used the Foxpro script? And if so, does it work the way you want?

<<Edited: Oops, I went back and re-read your earlier post. The question is already answered>>

I wonder if the Foxpro algorithm is a q-gram type of match?

comment · February 19, 2005

I believe that if CAT and RAT are a match, then we are NOT after a phonetic match. This non-phonetic match should also match CAT with KAT, but the fact that CAT and KAT also happen to be a phonetic match is pure coincidence.

A search of common characters per se is IMHO quite useless, as I have already demonstrated with PARTS and STRAP (100% match).

Bob: q-grams and "edit-distance" algorithms were my two prime suspects from the very start.

K3lso · February 19, 2005

I know its kinda confused, but the thing is that I work with this and for me is really important to search if a new trademark is similar in the prononciation (phonetic match, e.g.: DOG with DAWG) or if there is a coincidence between common characters (written match, e.g.: STAROX with ATAROX).

The soundex algorithm you posted it

BobWeaver · February 19, 2005

Bob: q-grams and "edit-distance" algorithms were my two prime suspects from the very start.

Yes, but I can be a bit slow sometimes

Honestly, I wasn't really trying to suggest that Soundex was the ultimate solution, but it's probably the easiest one to implement in Filemaker.

K3lso, you can modify the function to ignore the first character (see below) and you can try it, but that's not to say that it won't have undesired side effects. You will likely encounter other problems too. Another web site gives the example where it fails to match "Leighton" and "Layton." So, it has lots of flaws.

If you want to keep playing with soundex, this variation will ignore the first character:

Left(Substitute ( "." & Middle(Upper(TextField);2;1000);

["E";"A"];["I";"A"];["O";"A"];["U";"A"];["Y";"A"];["H";"A"];["W";"A"];

["F";"B"];["P";"B"];["V";"B"];

["G";"C"];["J";"C"];["K";"C"];["Q";"C"];["S";"C"];["X";"C"];["Z";"C"];

["T";"D"];["N";"M"];

["AAA";"A"];["AA";"A"];["AA";"A"];

["BBB";"B"];["BB";"B"];["BB";"B"];["B";"1"];

["CCC";"C"];["CC";"C"];["CC";"C"];["C";"2"];

["DDD";"D"];["DD";"D"];["DD";"D"];["D";"3"];

["LLL";"L"];["LL";"L"];["LL";"L"];["L";"4"];

["MMM";"M"];["MM";"M"];["MM";"M"];["M";"5"];

["RRR";"R"];["RR";"R"];["RR";"R"];["R";"6"];

[".A";".0"];["A";""];[".";""]

)&"0000";4)

... but it won't work for words that start with two consonants like St, Ch, Th, Tr, etc.

comment · February 19, 2005

K3lso, I understand what you're saying. Here's the situation:

Phonetic matching is more or less easy, as you saw. As Bob correctly points out, it is not fool-proof. There are numerous variations and improvements on the scheme, but it is difficult to say which one is best for you. I don't think anyone has done statistics specifically on trademarks. Most of these were fine-tuned for people's names. (As an aside - I wouldn't try to alter the algorithm "on a hunch". It really needs stats to analyze the impact.)

Now the other type of matching is a different beast altogether. It is extremely complex, and nearly impossible to implement in Filemaker. Certainly not very practical. IMHO, the method you're after is the one called Levenshtein Distance (LD). I even found a site that has the FoxPro implementation of that.

Roughly speaking, LD is is the number of deletions, insertions, or substitutions required to transform one string into another. Let's consider what it means.

The user inputs a new string. In order to compute the LD, you have to compare every single character of the new string with every single character of every one of the existing strings. Filemaker sadly lacks in real iterative calculations, but in Dev7 you could perhaps get by with a custom function. It would probably take me a full week to write such a function (a smarter person might do it in less). But the bottom line is, even if it worked, it would most likely be sssooo ssssllllloooowwwwww as to make it practically useless.

K3lso · February 19, 2005

I understad. That

comment · February 19, 2005

Is a lot easier to build that kind of search script for foxpro?

I am afraid that is so. Filemaker is really lame when it comes to arrays and iterative calcs - two essentials for computing LD. Come to think of it, I don't see how even a custom function would deal with this: each iteration needs the results of three previous iterations. The only way to do this, I think, is to script every step of process, and that means slowing it down even more.

OTOH, since I now recall that your FoxPro function returns similarity in %, it might be that I have found it - and it is not LD.

I really can't spend the time right now, but if anyone cares to tackle this, here is the FoxPro code:

http://rumkin.com/reference/algorithms/fuzzy_strings/oliver.prg.txt

Should I have two search scripts? a phonetic one and another based on LD?

Tough to say - it's a matter of cost/benefit. (Should I have a Rolls-Royce? Of course I should.) If we're talking ideally, then you should hire a team of semanticians and computer theorists and make a research project out of it. I am not kidding - we're bordering on AI here. Consider this: for almost any purpose, changing one character in a string to another should produce the same result - no matter which character. But for your purpose, I believe, SFIKO is closer to SEIKO than SQIKO. Moreover, ROLECKS should be closer to ROLEX than ROLER, even though character-wise ROLECKS is a greater change.

K3lso · February 20, 2005

Sounds like a whole week in front of the computer. hmmm.. I think I should have studied this instead of going to law school.. hahaha...

I

K3lso · February 20, 2005

THank you comment. That algorithm u find may be helpful.

Anybody wanna try with that in FM?

Sign In

How to perform Find with partial data (text)?

Recommended Posts

Create an account or sign in to comment

Create an account

Sign in

Important Information