February 17, 200520 yr Can you define "55% of similar match"? Or at least give a more detailed description - preferably with an example?
February 17, 200520 yr Author I want to perform a search that gives a partial match. Exemple: I type in the search field the word "KANGAROO". If it all goes well the search should give these results that are already in my records: "KANGO", "ANGAR", "KAROO" and all words that are similar to "KANGAROO". I mentioned the percentage match because I have a database in foxpro and when you perform a search, in the search field you can select the % of similarity between the word you type and the words in your record. I hope this can help you.
February 17, 200520 yr You could try something like this to simplify the text in both the trademark field, and the search field: Substitute(OriginalText; ["E";"A"];["I";"A"]; ["O";"A"];["U";"A"]; ["B";"V"];["W";"V"]; ["D";"T"];["G";"K"]; ["Z";"S"];["CH";"J"]; ["C";"K"]) Then search the simplified trademark field for the simplified search text.
February 17, 200520 yr Assuming that your trademark field is called "Trademark" then: Make a new calculated field "SimplifiedTrademark" with the formula: Substitute(Trademark; ["E";"A"];["I";"A"]; ["O";"A"];["U";"A"]; ["B";"V"];["W";"V"]; ["D";"T"];["G";"K"]; ["Z";"S"];["CH";"J"]; ["C";"K"]) Then make a global field "SearchText" where the user will enter the search text. And then make another calculated field "SimplifiedSearchText" with the formula: Substitute(SearchText; ["E";"A"];["I";"A"]; ["O";"A"];["U";"A"]; ["B";"V"];["W";"V"]; ["D";"T"];["G";"K"]; ["Z";"S"];["CH";"J"]; ["C";"K"]) Then you can make a simple find script Enter Find Mode Set Field [simplifiedTrademark, SimplifiedSearchText] Perform Find [] So, the user enters the new proposed trademark into the SearchText field and runs the script. Similar text, if any, will be returned in the found set. What the formula is doing is replacing all vowels with "A" and reducing the number of different consonants with a smaller set of similar sounding ones. So there is a better chance of matching words that are not exactly the same.
February 17, 200520 yr The problem is not (yet) how, but what. I still don't know what "similar" means, and certainly not how to measure similarity. IMHO, this is a much more complex subject than what would seem at first glance (do a Google on "partial string matching" to see what I mean). If you know the algorithm that your FOXPRO database uses, then perhaps we can help you translate that to Filemakerese. I suspect it will be mighty slow, but who knows. Bob, I don't think a phonetic match is what is required here. In any case, take a look at Soundex and Metaphone for a bit more sophisticated method.
February 17, 200520 yr Coincidentally, while you were posting, I was looking at the Soundex algorithm and I made a Filemaker implementation of it. Here it is: Left(Left(Upper(TextField);1)&Middle(Substitute ( "." & Upper(TextField); ["E";"A"];["I";"A"];["O";"A"];["U";"A"];["Y";"A"];["H";"A"];["W";"A"]; ["F";"B"];["P";"B"];["V";"B"]; ["G";"C"];["J";"C"];["K";"C"];["Q";"C"];["S";"C"];["X";"C"];["Z";"C"]; ["T";"D"];["N";"M"]; ["AAA";"A"];["AA";"A"];["AA";"A"]; ["BBB";"B"];["BB";"B"];["BB";"B"];["B";"1"]; ["CCC";"C"];["CC";"C"];["CC";"C"];["C";"2"]; ["DDD";"D"];["DD";"D"];["DD";"D"];["D";"3"]; ["LLL";"L"];["LL";"L"];["LL";"L"];["L";"4"]; ["MMM";"M"];["MM";"M"];["MM";"M"];["M";"5"]; ["RRR";"R"];["RR";"R"];["RR";"R"];["R";"6"]; [".A";".0"];["A";""];[".";""] );2;1000)&"000000";4) This is far more practical than the formula I posted above. My assumption was that K3lso was looking for a phonetic match since I can't think of any other practical search method.
February 17, 200520 yr I can think of several other methods, and I know there are many more that I can't think of, but others have. Don't know how practical any of them would be in FMP. For an example ot two: http://www.codeproject.com/string/dmetaphone6.asp http://www.merriampark.com/ld.htm
February 17, 200520 yr Author I cant make it. Im newbie in this and I dont understand very well. Should I make a relationship between my TRADEMARK field and SIMPLIFIED TRADEMARK field? and "SEARCH" field with "SIMPLIFIED SEARCH"? The thing is: I have in one window my TRADEMARK(A) field, where are stored all my NEW trademark records; and in another window my TRADEMARK(: field, where are stored all my OLD trademark records. I want to create (in another window) a single field where when I enter a new proposed trademark and when I press a button called "SEARCH", it performs a phonetic search, so that similar text (from the TRADEMARK(A) record and the TRADEMARK(: record at the same time) are shown in a new window. Could it be possible? If so, I will appreciate very much a detailed explanation. Thnx in advance.
February 17, 200520 yr You can make a relationship from the simplified search field to the simplified trademark field, and then you can either place a portal from that relationship on your layout to view the similar records, or use a GTRR [show] (go to related record, show only related) in a script to bring up a found set of similar records.
February 17, 200520 yr The calculated field won't work in Find Mode. So you'll have to set the global with its calculated result before entering Find Mode.
February 18, 200520 yr Here's a sample file that uses a relationship and a portal to get the search results Soundex.fp7.zip
February 18, 200520 yr Soundex may work much better for this application than for what it was originally intended. As a fuzzy phonetic name search algorithm, it returns many false positive matches, which for this application is likely a good thing. On the other hand, because the first character must match exactly, it won't return a word like cat when you search for kat. This can be easily fixed by changing the formula to the following: Left(Substitute ( "." & Upper(TextField); ["E";"A"];["I";"A"];["O";"A"];["U";"A"];["Y";"A"];["H";"A"];["W";"A"]; ["F";"B"];["P";"B"];["V";"B"]; ["G";"C"];["J";"C"];["K";"C"];["Q";"C"];["S";"C"];["X";"C"];["Z";"C"]; ["T";"D"];["N";"M"]; ["AAA";"A"];["AA";"A"];["AA";"A"]; ["BBB";"B"];["BB";"B"];["BB";"B"];["B";"1"]; ["CCC";"C"];["CC";"C"];["CC";"C"];["C";"2"]; ["DDD";"D"];["DD";"D"];["DD";"D"];["D";"3"]; ["LLL";"L"];["LL";"L"];["LL";"L"];["L";"4"]; ["MMM";"M"];["MM";"M"];["MM";"M"];["M";"5"]; ["RRR";"R"];["RR";"R"];["RR";"R"];["R";"6"]; [".A";".0"];["A";""];[".";""] )&"0000";4)
February 18, 200520 yr Author Bob, how do u perform the search in the "SearchText" field in you Soundex??? If I enter a word what next?
February 18, 200520 yr Author sorry, my mistake. the formula works. When I search for "KAT", it returns "CAT".. But when I search for the word "STAROX", it won
February 18, 200520 yr Author Someone told me to use some kind of permutation script that would generate all possibilities of matches. By now Im so confused. P.S. I already have that searchscript (PHPscript) in a Foxpro database. Is there a way so I can somehow see its algorithms? Or a way I can use the same PHPscript in Filemaker?
February 18, 200520 yr It seems we are going in circles. "STAROX"and "ATAROX" are not a phonetic match. But you said you were looking for a phonetic match. Someone told me to use some kind of permutation script That would be my instinct, too. The problem is: what kind of permutations? The attached file is good for playing word-games. Do you really want PARTS and STRAP to be considered similar, just because they use the same letters? I already have that searchscript (PHPscript) in a Foxpro database. Is there a way so I can somehow see its algorithms? I don't know: is there a way so we can somehow see the script?
February 19, 200520 yr Author Im looking for a phonetic match AND similar written words. For me is useless if the search returns "CAT" from "KAT" but not the word "RAT". They all are similar "CAT" and "KAT" is a phonetic result; "CAT" and "RAT" not but is almost the same. That
February 19, 200520 yr Have you used the Foxpro script? And if so, does it work the way you want? <<Edited: Oops, I went back and re-read your earlier post. The question is already answered>> I wonder if the Foxpro algorithm is a q-gram type of match?
February 19, 200520 yr I believe that if CAT and RAT are a match, then we are NOT after a phonetic match. This non-phonetic match should also match CAT with KAT, but the fact that CAT and KAT also happen to be a phonetic match is pure coincidence. A search of common characters per se is IMHO quite useless, as I have already demonstrated with PARTS and STRAP (100% match). Bob: q-grams and "edit-distance" algorithms were my two prime suspects from the very start.
February 19, 200520 yr Author I know its kinda confused, but the thing is that I work with this and for me is really important to search if a new trademark is similar in the prononciation (phonetic match, e.g.: DOG with DAWG) or if there is a coincidence between common characters (written match, e.g.: STAROX with ATAROX). The soundex algorithm you posted it
February 19, 200520 yr Bob: q-grams and "edit-distance" algorithms were my two prime suspects from the very start. Yes, but I can be a bit slow sometimes Honestly, I wasn't really trying to suggest that Soundex was the ultimate solution, but it's probably the easiest one to implement in Filemaker. K3lso, you can modify the function to ignore the first character (see below) and you can try it, but that's not to say that it won't have undesired side effects. You will likely encounter other problems too. Another web site gives the example where it fails to match "Leighton" and "Layton." So, it has lots of flaws. If you want to keep playing with soundex, this variation will ignore the first character: Left(Substitute ( "." & Middle(Upper(TextField);2;1000); ["E";"A"];["I";"A"];["O";"A"];["U";"A"];["Y";"A"];["H";"A"];["W";"A"]; ["F";"B"];["P";"B"];["V";"B"]; ["G";"C"];["J";"C"];["K";"C"];["Q";"C"];["S";"C"];["X";"C"];["Z";"C"]; ["T";"D"];["N";"M"]; ["AAA";"A"];["AA";"A"];["AA";"A"]; ["BBB";"B"];["BB";"B"];["BB";"B"];["B";"1"]; ["CCC";"C"];["CC";"C"];["CC";"C"];["C";"2"]; ["DDD";"D"];["DD";"D"];["DD";"D"];["D";"3"]; ["LLL";"L"];["LL";"L"];["LL";"L"];["L";"4"]; ["MMM";"M"];["MM";"M"];["MM";"M"];["M";"5"]; ["RRR";"R"];["RR";"R"];["RR";"R"];["R";"6"]; [".A";".0"];["A";""];[".";""] )&"0000";4) ... but it won't work for words that start with two consonants like St, Ch, Th, Tr, etc.
February 19, 200520 yr K3lso, I understand what you're saying. Here's the situation: Phonetic matching is more or less easy, as you saw. As Bob correctly points out, it is not fool-proof. There are numerous variations and improvements on the scheme, but it is difficult to say which one is best for you. I don't think anyone has done statistics specifically on trademarks. Most of these were fine-tuned for people's names. (As an aside - I wouldn't try to alter the algorithm "on a hunch". It really needs stats to analyze the impact.) Now the other type of matching is a different beast altogether. It is extremely complex, and nearly impossible to implement in Filemaker. Certainly not very practical. IMHO, the method you're after is the one called Levenshtein Distance (LD). I even found a site that has the FoxPro implementation of that. Roughly speaking, LD is is the number of deletions, insertions, or substitutions required to transform one string into another. Let's consider what it means. The user inputs a new string. In order to compute the LD, you have to compare every single character of the new string with every single character of every one of the existing strings. Filemaker sadly lacks in real iterative calculations, but in Dev7 you could perhaps get by with a custom function. It would probably take me a full week to write such a function (a smarter person might do it in less). But the bottom line is, even if it worked, it would most likely be sssooo ssssllllloooowwwwww as to make it practically useless.
February 19, 200520 yr Is a lot easier to build that kind of search script for foxpro? I am afraid that is so. Filemaker is really lame when it comes to arrays and iterative calcs - two essentials for computing LD. Come to think of it, I don't see how even a custom function would deal with this: each iteration needs the results of three previous iterations. The only way to do this, I think, is to script every step of process, and that means slowing it down even more. OTOH, since I now recall that your FoxPro function returns similarity in %, it might be that I have found it - and it is not LD. I really can't spend the time right now, but if anyone cares to tackle this, here is the FoxPro code: http://rumkin.com/reference/algorithms/fuzzy_strings/oliver.prg.txt Should I have two search scripts? a phonetic one and another based on LD? Tough to say - it's a matter of cost/benefit. (Should I have a Rolls-Royce? Of course I should.) If we're talking ideally, then you should hire a team of semanticians and computer theorists and make a research project out of it. I am not kidding - we're bordering on AI here. Consider this: for almost any purpose, changing one character in a string to another should produce the same result - no matter which character. But for your purpose, I believe, SFIKO is closer to SEIKO than SQIKO. Moreover, ROLECKS should be closer to ROLEX than ROLER, even though character-wise ROLECKS is a greater change.
February 20, 200520 yr Author Sounds like a whole week in front of the computer. hmmm.. I think I should have studied this instead of going to law school.. hahaha... I
February 20, 200520 yr Author THank you comment. That algorithm u find may be helpful. Anybody wanna try with that in FM?
February 25, 200520 yr It can be translated, but it won't be fast. Because it calculates a match value between the search text and every trademark in the database, it must perform the calculation for every record, every time you enter a new search text value. So, if you have 100,000 records in the file, and you enter a new search text value, It must perform 100,000 calculations, and then index them so it can find the best matches. That's the main difference between these algorithms and algorithms like soundex. Soundex type algorithms calculate an index value dependent only on the single trademark text. So, it only has to be calculated once, and can then be indexed just once. Hence it is fast.
February 25, 200520 yr I've been quietly fascinated by this discussion, first by Bob's introduction of SoundEx which I encountered ages ago, and then Comment provided links to a couple of interesting sites. I was intrigued by the Levenshtein Distance, although I thought at the time it was a wee off-topic. Now I'm not so sure. Perhaps the solution to this problem lies in a combination of Levenshtein Distance AND SoundEx. Between the two, you may be able to develop a score that will satisfy 79% of lawyers. That said, there will always be somebody who thinks Starox = Atrox = Starbucks = flummox, so there's truly no such thing as a perfect solution. Further, I'll add that I just came off a project that required studying etymology and how words evolved, and I can clearly see it illustrated in the given explications. For example, the letters K, G, and Q evolved from the letter C. The letter J is related to G, but it is also a cognate of I and Y. You guys have gone above and beyond and you're probably exhausted, but I applaud the hard work and let you know it was appreciated beyond just the contributors in the left column.
February 25, 200520 yr Author "there will always be somebody who thinks Starox = Atrox = Starbucks = flummox" "Perhaps the solution to this problem lies in a combination of Levenshtein Distance AND SoundEx" Exactly. What Im doing now is to separate the syllables (e.g. the word STAROX: into STAR and OX. Therefore, I perform a common search like this: *STAR* and *OX* and it
February 25, 200520 yr Is it possible to modify the algorith to perform a phonetic search no matter where the "SURE" is (beggining, middle or end)?? No, not with soundex. That's where these other algorithms come into play. If no one else wants to take up the challenge, I will have a look at the oliver code this weekend and see what I can do with it.
February 25, 200520 yr My first problem is I can't read the FoxPro code too well. Maybe if someone translated it to English, I could get to the second problem...
March 1, 200520 yr No, but maybe some of the solutions discussed here could be relevant to your problem.
March 1, 200520 yr I had a brief look at the code for the Similar_Text function. That's as far as I got because I suddenly realized that today is Feb 28, and I had a pile of month end and year end book work to get done. So, that took care of my week end. However, from my brief look, I noticed that the Similar_Text function is recursive, so that throws yet another twist into the mix. I did some more reading up on the function, and a bit more thinking about it. Similar_Text is the most computationally inefficient of any of the algorithms described in the resources I searched. Even if it could be made to run fast (which I'm sure it can't), I don't think Similar_Text is very well suited for what you are doing. It looks to me like it was designed to detect plagiarism, rather than detecting similar sounding or similar looking words. But, that has given me an idea about a variation on the metaphone function that may be more suitable and would be lightning fast. Give me a couple of days though. I'm busier than a lint picker in a blue serge suit factory.
March 3, 200520 yr Author While surfing I found out about Panorama database. It says: "Even though the indexes are large they actually don't contain all of the information in the database (most of the index space is taken up with hints to make searching faster). Since FileMaker is searching the index, not the database, this means that many useful search queries are impossible. Since Panorama doesn't use indexes it can perform any search you can think of, including PHONETIC SEARCHES (sounds like "alan"), PARTIAL MATCHES, comparisons between fields (Price is more than twice the P/E ratio), searching for fields that contain only letters, only numbers, or some other combination, searching all fields at once, even live keystroke-by-keystroke searches (like iTunes)." It seems its an old application, but.. what do u think about the search function?
March 3, 200520 yr Panorama is a memory resident database. So, it is limited by the amount of RAM in your computer. I used it years ago, and liked it very much. I'm surprised to hear that it's still around. You don't hear much about it nowadays. It was one of the very first Macintosh applications to come out in 1984 (under the original name OverVue), and it was way ahead of its time. I believe it was the very first application to use clairvoyance. As I recall, it had pretty good search capabilities, but it was a long time ago. Meanwhile, I came up with a variation on Soundex/Metaphone last night, which might be useful to you. Here is the formula: Let ([ Input="." & Upper(TrimAll(TextField;0;0) )& "."; Norm1=Substitute(Input; [".X";".S"];[".KN";".N"];[".PN";".N"];[".GN";".N"]; [".WR";".R"];[".WH";".W"]; ["MB.";"M."];["GH";""]; ["SCH";"SK"]; ["TIA";"SHA"];["TIO";"SHO"];["CH";"SH"]; ["CE";"SI"];["CI";"SI"];["CY";"SI"];["CK";"K"]; ["DGE";"JE"];["DGI";"JI"];["DGY";"JI"]; ["GE";"JE"];["GI";"JI"];["GY";"JI"]; ["PH";"F"];["Q";"K"];["V";"F"];["X";"KS"] ); Norm2=Upper(Substitute(Norm1; ["E";"A"];["I";"A"];["O";"A"];["U";"A"]; ["WA";"wA"];["WY";"wY"];["YA";"yA"];["Y";""];["W";""])); Norm3=Substitute(Norm2; ["G";"K"];["C";"K"]; ["TH";"S"];["J";"S"];["Z";"S"]; ["F";"B"];["P";"B"];["V";"B"]; ["H";"A"];["N";"M"];["T";"D"]; ["AAA";"A"];["AA";"A"];["AA";"A"]; ["BBB";"B"];["BB";"B"];["BB";"B"]; ["DDD";"D"];["DD";"D"];["DD";"D"]; ["KKK";"K"];["KK";"K"];["KK";"K"]; ["LLL";"L"];["LL";"L"];["LL";"L"]; ["MMM";"M"];["MM";"M"];["MM";"M"]; ["RRR";"R"];["RR";"R"];["RR";"R"]; [".A";".0"];["A";""];[".";""] ); KeyLen=Length(Norm3) ] ; Norm3& Case(KeyLen>4;
March 4, 200520 yr The search and replace text in the substitute functions will have to be changed to reflect the pronunciation rules for that language. Which language will you be using?
March 4, 200520 yr Author spanish and english. So I need to... for example.. change the letters [".WR";".R"];[".WH";".W"]; am I right?
Create an account or sign in to comment