FMForums.com

Fuzzy string comparison to find duplicates based on text strings that could vary slightly?

Hi there,

I am trying to clean up a database of journal articles. I am particularly interested in identifying potential duplicates based on comparing the title of the article.

I am thinking that a calculation could compare two text strings and return a "%similar" value, so that the following two strings would be flagged as a potential duplicate.

"Deep-inspiration breath-hold PET/CT of the thorax"

"Deep-inspiration breath-hold PET-CT of the thorax"

As would these:

"∆p in the thorax precipitates asynchromatic sarcoma"

"delta-p in the thorax precipitates asynchromatic sarcoma"

However, these would get a low %similar rating:

"Alpha beta in the left quadrant"

"Geriatric sarcoma unhinged"

I've looked around and have not found a solution for this in the FMForums, nor have I found a custom function, though I suspect this problem has been addressed by someone before.

Does anyone know of a method I might be able to employ?

Thanks in advance for any information you can provide.

:)

One way would be to substitute out the varying characters, like this:

Substitute ( Title ;
[ "/" ; " " ] ;
[ "-" ; " " ]
)

and then search the result using the find-duplicates operator ( ! ).
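For anyone who wants to try the same idea outside FileMaker, here is a minimal Python sketch of it: normalize each title, then group the titles whose normalized forms match exactly. The function names `normalize` and `find_duplicates` are made up for illustration, and the lowercasing and whitespace collapsing are assumptions added on top of the Substitute step above.

```python
from collections import defaultdict


def normalize(title: str) -> str:
    """Replace "/" and "-" with spaces, as in the Substitute call above.

    Lowercasing and collapsing runs of whitespace are extra assumptions,
    not part of the original calculation.
    """
    for ch in "/-":
        title = title.replace(ch, " ")
    return " ".join(title.lower().split())


def find_duplicates(titles):
    """Return lists of original titles that share one normalized form."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize(title)].append(title)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    sample = [
        "Deep-inspiration breath-hold PET/CT of the thorax",
        "Deep-inspiration breath-hold PET-CT of the thorax",
        "Alpha beta in the left quadrant",
    ]
    for group in find_duplicates(sample):
        print(group)
```

This only catches titles that become identical after normalization, so a pair like "∆p" versus "delta-p" would still slip through unless that substitution is added as well.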

Lee

It seems that in your case "similarity" would increase in proportion to the number of words common to both titles?
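If shared words really are the measure, one rough way to score it is the Jaccard index: the number of distinct words the two titles have in common, divided by the number of distinct words across both. The Python sketch below is only an illustration of that idea; the names `words` and `word_similarity` are hypothetical, and treating any run of letters or digits as a word is an assumption.

```python
import re


def words(title: str) -> set:
    """Split a title into a set of lowercase words (runs of letters/digits)."""
    return set(re.findall(r"\w+", title.lower()))


def word_similarity(a: str, b: str) -> float:
    """Percentage of distinct words shared by both titles (Jaccard index)."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 100.0  # two empty titles are treated as identical
    return 100.0 * len(wa & wb) / len(wa | wb)


if __name__ == "__main__":
    print(word_similarity(
        "Deep-inspiration breath-hold PET/CT of the thorax",
        "Deep-inspiration breath-hold PET-CT of the thorax",
    ))
    print(word_similarity(
        "Alpha beta in the left quadrant",
        "Geriatric sarcoma unhinged",
    ))
```

Because punctuation is ignored when splitting, the PET/CT and PET-CT titles end up with the same word set, while the two unrelated titles share no words at all. Word order is ignored, which may or may not be what you want for article titles.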

Does anyone know of a method I might be able to employ?

Fuzzy string comparisons can be handled with regular expressions, and there are a few different plugin options for doing it. The Wikipedia article has some places to start; ScriptMaster, the bBox plugin and many others would give you the ability to leverage regex. I personally have not gone to this degree with FileMaker, but have done some similar things using Ruby.

You may need to explore this and the associated wiki articles: http://www.briandunning.com/cf/965

  • Author

Thank you all. The Levenshtein calculation on Brian Dunning's page seems to be along the lines of what I had in mind. LOL, I would never have searched for "Levenshtein" :)
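For reference, here is roughly how a Levenshtein distance can be turned into a "%similar" value. This is a Python sketch of the general technique, not the custom function from Brian Dunning's page; the name `percent_similar` and the choice to scale by the length of the longer string are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes or substitutions
    needed to turn a into b (standard two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def percent_similar(a: str, b: str) -> float:
    """Scale the edit distance by the longer length to get a 0-100 score."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


if __name__ == "__main__":
    print(percent_similar(
        "Deep-inspiration breath-hold PET/CT of the thorax",
        "Deep-inspiration breath-hold PET-CT of the thorax",
    ))
```

The PET/CT and PET-CT titles differ by a single substituted character out of 49, so they score about 98. Comparing every title against every other title grows quadratically with the number of records, so on a large table it may be worth comparing only within groups that share, say, a publication year or first word.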

Thanks for all the useful feedback.. I will see what I can come up with and may post it back here for review.

  • Newbies

My first thought was the Levenshtein calculation too. But I've done this manually before (and prefer the manual process in most situations). Using "Insert from Index" in find mode, and "Insert from Index" and "Replace All" on the found set, you can get through them pretty quickly (as long as you only have a couple hundred strings to merge), plus you have human intelligence doing the matching.

Just think twice each time before you click the "Replace All" button.
