This topic is 4658 days old. Please don't post here. Open a new topic instead.

Posted

Hi there,

I am trying to clean up a database of journal articles. I am particularly interested in identifying potential duplicates by comparing the titles of the articles.

I am thinking that a calculation could compare two text strings and return a "%similar" value, so that the following two strings would be flagged as potential duplicates:

"Deep-inspiration breath-hold PET/CT of the thorax"

"Deep-inspiration breath-hold PET-CT of the thorax"

As would these:

"∆p in the thorax precipitates asynchromatic sarcoma"

"delta-p in the thorax precipitates asynchromatic sarcoma"

However, this pair would get a low "%similar" rating:

"Alpha beta in the left quadrant"

"Geriatric sarcoma unhinged"

I've looked around and have not found a solution for this in the FMForums, nor have I found a custom function, though I suspect this problem has been addressed by someone before.

Does anyone know of a method I might be able to employ?

Thanks in advance for any information you can provide.

:)

Posted

One way would be to substitute out the punctuation characters, like this:


Substitute ( Title ;
    [ "/" ; " " ] ;
    [ "-" ; " " ]
)

Then search the result for duplicates using the ! (find duplicate values) operator.

Lee
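
For readers who want to try the idea outside FileMaker first, here is a minimal Python sketch of the same approach: replace "/" and "-" with spaces, as the Substitute call above does, then group titles that become identical. The function names `normalize` and `find_duplicates` are made up for illustration, and the lower-casing and whitespace collapsing are extra assumptions that the post does not mention.

```python
from collections import defaultdict


def normalize(title: str) -> str:
    """Mirror the Substitute() call: turn "/" and "-" into spaces."""
    cleaned = title.replace("/", " ").replace("-", " ")
    # Assumption beyond the post: collapse repeated spaces and ignore case.
    return " ".join(cleaned.split()).lower()


def find_duplicates(titles):
    """Return groups of titles that share the same normalized form."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize(title)].append(title)
    # Keep only the groups with more than one member (the duplicates).
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    sample = [
        "Deep-inspiration breath-hold PET/CT of the thorax",
        "Deep-inspiration breath-hold PET-CT of the thorax",
        "Alpha beta in the left quadrant",
    ]
    for group in find_duplicates(sample):
        print(group)
```

This only catches titles that match exactly once the punctuation is removed, so a pair such as "∆p" versus "delta-p" would still be missed.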

Posted

Does anyone know of a method I might be able to employ?

Fuzzy string comparisons can be handled with regular expressions, and there are a few plugin options for doing it. The Wikipedia article has some places to start; ScriptMaster, the bBox plugin, and many others would give you the ability to leverage regex. I personally have not gone to this degree with FileMaker, but I have done some similar things using Ruby.

Posted

You may want to explore this custom function and the associated wiki articles: http://www.briandunning.com/cf/965

Posted

Thank you all. The Levenshtein calculation on Brian Dunning's page seems to be along the lines of what I had in mind. LOL, I would never have searched for "Levenshtein" :)

Thanks for all the useful feedback. I will see what I can come up with and may post it back here for review.
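
As a rough illustration of how a Levenshtein distance could be turned into the "%similar" value asked for at the top of the thread, here is a small Python sketch. It is not the custom function from Brian Dunning's page; the names `levenshtein` and `percent_similar` are hypothetical, and dividing by the longer string's length is just one plausible way to scale the distance to a percentage.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character inserts, deletes, and substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def percent_similar(a: str, b: str) -> float:
    """Scale the distance to 0-100, where 100 means identical.
    Assumption: normalize by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


if __name__ == "__main__":
    print(percent_similar(
        "Deep-inspiration breath-hold PET/CT of the thorax",
        "Deep-inspiration breath-hold PET-CT of the thorax",
    ))
    print(percent_similar(
        "Alpha beta in the left quadrant",
        "Geriatric sarcoma unhinged",
    ))
```

The PET/CT pair differs by a single character, so it scores close to 100, while the unrelated pair scores far lower. Where to set the cutoff for flagging a potential duplicate is a judgment call that would need testing against real titles.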

Posted

My first thought was the Levenshtein calculation too. But I've done this manually before (and prefer the manual process in most situations). Using "Insert from Index" in Find mode, then "Insert from Index" and "Replace All" on the found set, you can get through them pretty quickly (as long as you only have a couple hundred strings to merge), plus you have human intelligence doing the matching.

Just think twice each time before you click the "Replace All" button.
