Jump to content
Claris Engage 2025 - March 25-26 Austin Texas ×

How to perform Find with partial data (text)?


This topic is 6855 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Posted

I want to perform a search that gives a partial match. Exemple: I type in the search field the word "KANGAROO". If it all goes well the search should give these results that are already in my records: "KANGO", "ANGAR", "KAROO" and all words that are similar to "KANGAROO".

I mentioned the percentage match because I have a database in foxpro and when you perform a search, in the search field you can select the % of similarity between the word you type and the words in your record.

I hope this can help you.

Posted

You could try something like this to simplify the text in both the trademark field, and the search field:

Substitute(OriginalText;

["E";"A"];["I";"A"];

["O";"A"];["U";"A"];

["B";"V"];["W";"V"];

["D";"T"];["G";"K"];

["Z";"S"];["CH";"J"];

["C";"K"])

Then search the simplified trademark field for the simplified search text.

Posted

Assuming that your trademark field is called "Trademark" then:

Make a new calculated field "SimplifiedTrademark" with the formula:

Substitute(Trademark;

["E";"A"];["I";"A"];

["O";"A"];["U";"A"];

["B";"V"];["W";"V"];

["D";"T"];["G";"K"];

["Z";"S"];["CH";"J"];

["C";"K"])

Then make a global field "SearchText" where the user will enter the search text. And then make another calculated field "SimplifiedSearchText" with the formula:

Substitute(SearchText;

["E";"A"];["I";"A"];

["O";"A"];["U";"A"];

["B";"V"];["W";"V"];

["D";"T"];["G";"K"];

["Z";"S"];["CH";"J"];

["C";"K"])

Then you can make a simple find script

Enter Find Mode

Set Field [simplifiedTrademark, SimplifiedSearchText]

Perform Find []

So, the user enters the new proposed trademark into the SearchText field and runs the script. Similar text, if any, will be returned in the found set.

What the formula is doing is replacing all vowels with "A" and reducing the number of different consonants with a smaller set of similar sounding ones. So there is a better chance of matching words that are not exactly the same.

Posted

The problem is not (yet) how, but what. I still don't know what "similar" means, and certainly not how to measure similarity. IMHO, this is a much more complex subject than what would seem at first glance (do a Google on "partial string matching" to see what I mean).

If you know the algorithm that your FOXPRO database uses, then perhaps we can help you translate that to Filemakerese. I suspect it will be mighty slow, but who knows.

Bob, I don't think a phonetic match is what is required here. In any case, take a look at Soundex and Metaphone for a bit more sophisticated method.

Posted

Coincidentally, while you were posting, I was looking at the Soundex algorithm and I made a Filemaker implementation of it. Here it is:

Left(Left(Upper(TextField);1)&Middle(Substitute ( "." & Upper(TextField);

["E";"A"];["I";"A"];["O";"A"];["U";"A"];["Y";"A"];["H";"A"];["W";"A"];

["F";"B"];["P";"B"];["V";"B"];

["G";"C"];["J";"C"];["K";"C"];["Q";"C"];["S";"C"];["X";"C"];["Z";"C"];

["T";"D"];["N";"M"];

["AAA";"A"];["AA";"A"];["AA";"A"];

["BBB";"B"];["BB";"B"];["BB";"B"];["B";"1"];

["CCC";"C"];["CC";"C"];["CC";"C"];["C";"2"];

["DDD";"D"];["DD";"D"];["DD";"D"];["D";"3"];

["LLL";"L"];["LL";"L"];["LL";"L"];["L";"4"];

["MMM";"M"];["MM";"M"];["MM";"M"];["M";"5"];

["RRR";"R"];["RR";"R"];["RR";"R"];["R";"6"];

[".A";".0"];["A";""];[".";""]

);2;1000)&"000000";4)

This is far more practical than the formula I posted above.

My assumption was that K3lso was looking for a phonetic match since I can't think of any other practical search method.

Posted

I cant make it. Im newbie in this and I dont understand very well. Should I make a relationship between my TRADEMARK field and SIMPLIFIED TRADEMARK field? and "SEARCH" field with "SIMPLIFIED SEARCH"?

The thing is: I have in one window my TRADEMARK(A) field, where are stored all my NEW trademark records; and in another window my TRADEMARK(: field, where are stored all my OLD trademark records. I want to create (in another window) a single field where when I enter a new proposed trademark and when I press a button called "SEARCH", it performs a phonetic search, so that similar text (from the TRADEMARK(A) record and the TRADEMARK(: record at the same time) are shown in a new window. Could it be possible? If so, I will appreciate very much a detailed explanation. Thnx in advance.

Posted

You can make a relationship from the simplified search field to the simplified trademark field, and then you can either place a portal from that relationship on your layout to view the similar records, or use a GTRR [show] (go to related record, show only related) in a script to bring up a found set of similar records.

Posted

The calculated field won't work in Find Mode. So you'll have to set the global with its calculated result before entering Find Mode.

Posted

Soundex may work much better for this application than for what it was originally intended. As a fuzzy phonetic name search algorithm, it returns many false positive matches, which for this application is likely a good thing. On the other hand, because the first character must match exactly, it won't return a word like cat when you search for kat. This can be easily fixed by changing the formula to the following:

Left(Substitute ( "." & Upper(TextField);

["E";"A"];["I";"A"];["O";"A"];["U";"A"];["Y";"A"];["H";"A"];["W";"A"];

["F";"B"];["P";"B"];["V";"B"];

["G";"C"];["J";"C"];["K";"C"];["Q";"C"];["S";"C"];["X";"C"];["Z";"C"];

["T";"D"];["N";"M"];

["AAA";"A"];["AA";"A"];["AA";"A"];

["BBB";"B"];["BB";"B"];["BB";"B"];["B";"1"];

["CCC";"C"];["CC";"C"];["CC";"C"];["C";"2"];

["DDD";"D"];["DD";"D"];["DD";"D"];["D";"3"];

["LLL";"L"];["LL";"L"];["LL";"L"];["L";"4"];

["MMM";"M"];["MM";"M"];["MM";"M"];["M";"5"];

["RRR";"R"];["RR";"R"];["RR";"R"];["R";"6"];

[".A";".0"];["A";""];[".";""]

)&"0000";4)

Posted

sorry, my mistake. the formula works. When I search for "KAT", it returns "CAT".. But when I search for the word "STAROX", it won

Posted

Someone told me to use some kind of permutation script that would generate all possibilities of matches. By now Im so confused.

P.S. I already have that searchscript (PHPscript) in a Foxpro database. Is there a way so I can somehow see its algorithms? Or a way I can use the same PHPscript in Filemaker?

Posted

It seems we are going in circles. "STAROX"and "ATAROX" are not a phonetic match. But you said you were looking for a phonetic match.

Someone told me to use some kind of permutation script

That would be my instinct, too. The problem is: what kind of permutations? The attached file is good for playing word-games. Do you really want PARTS and STRAP to be considered similar, just because they use the same letters?

I already have that searchscript (PHPscript) in a Foxpro database. Is there a way so I can somehow see its algorithms?

I don't know: is there a way so we can somehow see the script?

Posted

Im looking for a phonetic match AND similar written words. For me is useless if the search returns "CAT" from "KAT" but not the word "RAT". They all are similar "CAT" and "KAT" is a phonetic result; "CAT" and "RAT" not but is almost the same.

That

Posted

Have you used the Foxpro script? And if so, does it work the way you want?

<<Edited: Oops, I went back and re-read your earlier post. The question is already answered>>

I wonder if the Foxpro algorithm is a q-gram type of match?

Posted

I believe that if CAT and RAT are a match, then we are NOT after a phonetic match. This non-phonetic match should also match CAT with KAT, but the fact that CAT and KAT also happen to be a phonetic match is pure coincidence.

A search of common characters per se is IMHO quite useless, as I have already demonstrated with PARTS and STRAP (100% match).

Bob: q-grams and "edit-distance" algorithms were my two prime suspects from the very start.

Posted

I know its kinda confused, but the thing is that I work with this and for me is really important to search if a new trademark is similar in the prononciation (phonetic match, e.g.: DOG with DAWG) or if there is a coincidence between common characters (written match, e.g.: STAROX with ATAROX).

The soundex algorithm you posted it

Posted

Bob: q-grams and "edit-distance" algorithms were my two prime suspects from the very start.

Yes, but I can be a bit slow sometimes smile.gif

Honestly, I wasn't really trying to suggest that Soundex was the ultimate solution, but it's probably the easiest one to implement in Filemaker.

K3lso, you can modify the function to ignore the first character (see below) and you can try it, but that's not to say that it won't have undesired side effects. You will likely encounter other problems too. Another web site gives the example where it fails to match "Leighton" and "Layton." So, it has lots of flaws.

If you want to keep playing with soundex, this variation will ignore the first character:

Left(Substitute ( "." & Middle(Upper(TextField);2;1000);

["E";"A"];["I";"A"];["O";"A"];["U";"A"];["Y";"A"];["H";"A"];["W";"A"];

["F";"B"];["P";"B"];["V";"B"];

["G";"C"];["J";"C"];["K";"C"];["Q";"C"];["S";"C"];["X";"C"];["Z";"C"];

["T";"D"];["N";"M"];

["AAA";"A"];["AA";"A"];["AA";"A"];

["BBB";"B"];["BB";"B"];["BB";"B"];["B";"1"];

["CCC";"C"];["CC";"C"];["CC";"C"];["C";"2"];

["DDD";"D"];["DD";"D"];["DD";"D"];["D";"3"];

["LLL";"L"];["LL";"L"];["LL";"L"];["L";"4"];

["MMM";"M"];["MM";"M"];["MM";"M"];["M";"5"];

["RRR";"R"];["RR";"R"];["RR";"R"];["R";"6"];

[".A";".0"];["A";""];[".";""]

)&"0000";4)

... but it won't work for words that start with two consonants like St, Ch, Th, Tr, etc.

Posted

K3lso, I understand what you're saying. Here's the situation:

Phonetic matching is more or less easy, as you saw. As Bob correctly points out, it is not fool-proof. There are numerous variations and improvements on the scheme, but it is difficult to say which one is best for you. I don't think anyone has done statistics specifically on trademarks. Most of these were fine-tuned for people's names. (As an aside - I wouldn't try to alter the algorithm "on a hunch". It really needs stats to analyze the impact.)

Now the other type of matching is a different beast altogether. It is extremely complex, and nearly impossible to implement in Filemaker. Certainly not very practical. IMHO, the method you're after is the one called Levenshtein Distance (LD). I even found a site that has the FoxPro implementation of that.

Roughly speaking, LD is is the number of deletions, insertions, or substitutions required to transform one string into another. Let's consider what it means.

The user inputs a new string. In order to compute the LD, you have to compare every single character of the new string with every single character of every one of the existing strings. Filemaker sadly lacks in real iterative calculations, but in Dev7 you could perhaps get by with a custom function. It would probably take me a full week to write such a function (a smarter person might do it in less). But the bottom line is, even if it worked, it would most likely be sssooo ssssllllloooowwwwww as to make it practically useless.

Posted

Is a lot easier to build that kind of search script for foxpro?

I am afraid that is so. Filemaker is really lame when it comes to arrays and iterative calcs - two essentials for computing LD. Come to think of it, I don't see how even a custom function would deal with this: each iteration needs the results of three previous iterations. The only way to do this, I think, is to script every step of process, and that means slowing it down even more.

OTOH, since I now recall that your FoxPro function returns similarity in %, it might be that I have found it - and it is not LD.

I really can't spend the time right now, but if anyone cares to tackle this, here is the FoxPro code:

http://rumkin.com/reference/algorithms/fuzzy_strings/oliver.prg.txt

Should I have two search scripts? a phonetic one and another based on LD?

Tough to say - it's a matter of cost/benefit. (Should I have a Rolls-Royce? Of course I should.) If we're talking ideally, then you should hire a team of semanticians and computer theorists and make a research project out of it. I am not kidding - we're bordering on AI here. Consider this: for almost any purpose, changing one character in a string to another should produce the same result - no matter which character. But for your purpose, I believe, SFIKO is closer to SEIKO than SQIKO. Moreover, ROLECKS should be closer to ROLEX than ROLER, even though character-wise ROLECKS is a greater change.

Posted

Sounds like a whole week in front of the computer. hmmm.. I think I should have studied this instead of going to law school.. hahaha...

I

This topic is 6855 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.