Jump to content

Levenshtein distance between 2 words - "madonna", "modanna" = 2 ?


This topic is 5561 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Hello everyone

I'd like to implement a function in Filemaker Advanced 10 would calculate the Levenshtein distance between 2 words ( the Levenshtein distance is a metric for measuring the amount of difference between two sequences).

For example, the distance between "Madonna" and "Modanna" is 2.

There is a good article on wikipedia:

http://en.wikipedia.org/wiki/Levenshtein_distance

Unfortunately, I didn't find this feature in BrianDunning and others :-(

If anyone has a solution, I'm interested :-)

Link to comment
Share on other sites

Hi Benka,

I'm surprised you didn't follow up on your other post. It seems similar (if not the same concept). Having said that, you would need a dictionary attached. And how would you know the result should be Madonna? How would you specify which words should be compared because if it compared every word then there could be hundreds of results changing each word to another.

We need to know what exactly you want to accomplish and, more importantly in this instance, why. :smile2:

LaRetta

Link to comment
Share on other sites

Hi :beertime:

I'd like to create a function working like this :

LevenshteinDistance ( "madonna";"modanna") = 2

LevenshteinDistance ( "GUMBO";"GAMBOL") = 2

LevenshteinDistance ( "ZXRZYXX";"ZXRZZZXZZ") = 4

like this :

http://www.merriampark.com/ld.htm

Edited by Guest
Link to comment
Share on other sites

Okay, but I'm stuck on a few rules:

If it were reversed, ie, "GAMBOL" ; "GUMBO" ... would it still equal 2 because there was no L on Gumbo?

"ZXRZYXX";"ZXRZZZXZZ") = 4

Please explain (in exact sequence) how this can equal 4. I come up with 5.

UPDATE: Oh, the X by position matches the second X. Forget the second question then.

Edited by Guest
Link to comment
Share on other sites

Custom Function: CompareStringErr

Parameters:

NewString

OldString

Let ( [

$posErr = $posErr + 1 ;

newChar = Middle (newString ;$posErr ; 1 ) ;

oldChar = Middle (oldString ; $posErr ; 1 ) ;

match = If ( newChar ≠ oldChar ; TextColor (newChar ; RGB (255 ; 0 ; 0 ) ) )

] ;

If ( $posErr ≤ Length ( newString ) ; match & CompareStringErr( newString ; oldString ) )

)

Then calculation (number):

Let ( $posErr = 0 ; Length ( CompareStringErr ( newString ; oldString ) ) )

Given time, I could fine-tune it further ...

UPDATE: I totally expect (and hope) that a few Masters here show us how it REALLY should look!

Edited by Guest
Added update & removed incorrect sentence
Link to comment
Share on other sites

It appears (to me) that you are simply comparing letter to letter right down the line. If so, a recursive custom function can handle it. But I'm unsure which string takes priority. For instance, you'd need to adjust the calculation portion of my CF (to account if reversing GAMBOL and GUMBO to:

Let ( [

add = Case ( Length ( oldString ) > Length ( newString ) ; Length ( oldString ) - Length ( newString ) ) ;

$posErr = 0 ] ;

Length ( CompareStringErr ( newString ;oldString) ) + add

)

Link to comment
Share on other sites

Here's my attempt. FileMaker isn't particularly well suited to this sort of thing, but anything is possible with enough trickery.

This is implemented via a set of custom functions to port the code on wikipedia to FM. I use global variable repetitions and dynamic variable names to make up the 2D matrix.

It seems speedy enough on small words, but I wouldn't suggest trying it on pages on text.... :

Levenshtein.fp7.zip

Link to comment
Share on other sites

This topic is 5561 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.