barkingsheltie Posted June 20, 2007 Posted June 20, 2007 When there is not an exact match for a find, is it possible to do a 'fuzzy' search, say something like a Levenshtein edit distance threshold on alphanumeric strings? I have an application/need, a 'cross-ref' list which has connector industry part #'s that are cross-ref to our products. Often a call/request will not have an exact match for the part number, e.g., M39029-13 and the correct part number is M39029/13 (its not just misplaced '/', '-', '', etc, but also situations like, MS24256A requested, but actual is MS24256B. These examples had the differences tacked onto the end, but it could be the prefix as well, customer gives 240-4578B, and the correct is 24-4578B. No found match, would like to find similar matches using something like the Levenshtein method, or other. I assume I need to write a custom function to do this? shelley
comment Posted June 20, 2007 Posted June 20, 2007 You cannot search based on Levenshtein distance, because Levenshtein distance measures the similarity between two strings - and one of these string is your search criteria. You would need to compute the distance of every record from the given string every time a new search string is requested. However, there are other methods. For example, your first two cases would be covered by converting the search string into a number, and searching in a field of similarly converted part number. The third one is more difficult, but since the part numbers appear to be short, you could perhaps devise some mask system to get a reasonable amount of matches. Here are two related threads: http://fmforums.com/forum/showtopic.php?tid/171912 http://fmforums.com/forum/showtopic.php?tid/172518/
barkingsheltie Posted June 21, 2007 Author Posted June 21, 2007 Thanks for the tips, including the links. I will take a closer look and see if these will work for me. Converting to a number (I assume GetAsNumber(data) is the function you are thinking of). This will at least produce some matches - given that often its the spaces, and formatting characters, /,-, etc). At the risk of asking for too much indulgence - the little I know of php and/or c, would be to copy the contents to a temp array, while using a function to strip out all the non number, alpha characters, and then something like the Levenshtein method, or a lame concoction of my own construction (a character by character comparison within a loop). I have been thinking of moving this database over to mysql and php for this very reason. Since it would appear that the functions and methods are more conducive. It may just be my lack of how filemaker can do things.
comment Posted June 21, 2007 Posted June 21, 2007 You could use GetAsNumber() or Filter(), the difference being that the latter will preserve leading zeros (if result is Text). I am afraid I don't know enough about SQL/PHP (or even Filemaker) to make a comparison. I believe Filemaker's basic paradigm is different, which means that if you approach it with a SQL frame of mind, your solutions will be very inefficient.
barkingsheltie Posted June 21, 2007 Author Posted June 21, 2007 Thanks again. Your suggestions have put me on track to at least try these out in filemaker. Its not that hard to test things out for the php/mysql method as well, and then hopefully the *best* solution will be somewhat clear.
comment Posted June 21, 2007 Posted June 21, 2007 Indeed. But you should compare other aspects as well. Filemaker's strongest point, I think, is ease of use (which also implies quick and low-cost setup and maintenance). Data-crunching, on the other hand....
Recommended Posts
This topic is 6422 days old. Please don't post here. Open a new topic instead.
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now