Non-English characters: tricky removal


This topic is 5580 days old. Please don't post here. Open a new topic instead.

I have a field that contains non-English characters in some of the records, e.g.:

Västervik

Café

Straße

Månzfa

Sérgio

Zürich

Fürstenfeld

What I need is a calculation that finds every record containing a non-English character, so that I can manually replace, for example, the non-English "é" with "e".

My original idea was to make a list of all the non-English characters and do a pattern match, but there are too many of them and no real way to know them all.

Does anyone know how to do a pattern match, or some other way, to find all the non-English characters in this field? Thanks for your help.


Try a calculation field (result is Number) =


YourField ≠ Filter ( Lower ( YourField ) ; " abcdefghijklmnopqrstuvwxyz0123456789" ) 

Search for 1 in this field.

---

NOTE: You may need to add some punctuation marks to the list of allowable characters.
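For anyone who wants to check the logic outside FileMaker, here is a rough Python sketch of the same whitelist idea. It is only an illustration, not a FileMaker calculation: the names `ALLOWED` and `has_disallowed` are made up for this example, and the comparison is done on the lower-cased value, which is an assumption about how the original comparison treats case.

```python
# Characters treated as acceptable, mirroring the list in the calculation above.
ALLOWED = set(" abcdefghijklmnopqrstuvwxyz0123456789")


def has_disallowed(value):
    """Return 1 if value contains any character outside ALLOWED, else 0."""
    lowered = value.lower()
    # Keep only the allowed characters, like Filter() does.
    filtered = "".join(ch for ch in lowered if ch in ALLOWED)
    # If filtering dropped anything, the two strings differ.
    return int(filtered != lowered)


print(has_disallowed("Café"))          # accented character is flagged
print(has_disallowed("Stockholm 42"))  # only letters, digits and spaces
print(has_disallowed("St. Gallen"))    # the full stop is flagged too
```

The last call shows why the note about punctuation matters: until "." is added to the allowed list, perfectly ordinary values are flagged as well.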


No, I don't think there's any easy way to find them using find mode.

What I would suggest is to add a temporary calculated column that removes all acceptable English characters, punctuation, and digits. You could then enter Find mode and find all records with a non-empty "unacceptable chars" column, and either add the characters to the acceptable list or get rid of them.

The calculation would be a large Substitute statement starting out like:

     Substitute( fieldToCheck; ["A";""]; ["a";""]; ["B";""]; etc...

that removes all the characters that are acceptable.
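As a rough illustration of what such an "unacceptable chars" column would hold, here is the same idea sketched in Python rather than as a FileMaker calculation. The names `ACCEPTABLE` and `unacceptable_chars` are hypothetical, and the choice of ASCII letters, digits, punctuation, and the space as the acceptable set is an assumption.

```python
import string

# Assumed acceptable set: ASCII letters, digits, punctuation and the space.
ACCEPTABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")


def unacceptable_chars(value):
    """Return only the characters of value that are not in ACCEPTABLE."""
    return "".join(ch for ch in value if ch not in ACCEPTABLE)


print(unacceptable_chars("Fürstenfeld"))  # leaves just the accented letter
print(unacceptable_chars("Cafe, 12"))     # nothing left, so the result is empty
```

A non-empty result both marks the record and shows which characters still need a decision.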

Edit:

Yikes! Comment's calculation makes my scheme look silly. :)


  • 7 months later...

Here's the code I created today, after reading this thread, to replace foreign characters (extended ASCII) with their logical counterparts in the English character set (standard ASCII).

Substitute ( mtitle;
    ["¹" ; "1"];
    ["²" ; "2"];
    ["³" ; "3"];
    ["Á" ; "A"];
    ["á" ; "a"];
    ["À" ; "A"];
    ["à" ; "a"];
    ["Â" ; "A"];
    ["â" ; "a"];
    ["Ä" ; "A"];
    ["ä" ; "a"];
    ["Ã" ; "A"];
    ["ã" ; "a"];
    ["Å" ; "A"];
    ["å" ; "a"];
    ["Æ" ; "AE"];
    ["æ" ; "ae"];
    ["Ç" ; "C"];
    ["ç" ; "c"];
    ["Ð" ; "D"];
    ["ð" ; "d"];
    ["É" ; "E"];
    ["é" ; "e"];
    ["È" ; "E"];
    ["è" ; "e"];
    ["Ê" ; "E"];
    ["ê" ; "e"];
    ["Ë" ; "E"];
    ["ë" ; "e"];
    ["Í" ; "I"];
    ["í" ; "i"];
    ["Ì" ; "I"];
    ["ì" ; "i"];
    ["Î" ; "I"];
    ["î" ; "i"];
    ["Ï" ; "I"];
    ["ï" ; "i"];
    ["Ñ" ; "N"];
    ["ñ" ; "n"];
    ["º" ; "o"];
    ["Ó" ; "O"];
    ["ó" ; "o"];
    ["Ò" ; "O"];
    ["ò" ; "o"];
    ["Ô" ; "O"];
    ["ô" ; "o"];
    ["Ö" ; "O"];
    ["ö" ; "o"];
    ["Õ" ; "O"];
    ["õ" ; "o"];
    ["Ø" ; "O"];
    ["ø" ; "o"];
    ["ß" ; "B"];
    ["Þ" ; "b"];
    ["þ" ; "b"];
    ["Ú" ; "U"];
    ["ú" ; "u"];
    ["Ù" ; "U"];
    ["ù" ; "u"];
    ["Û" ; "U"];
    ["û" ; "u"];
    ["Ü" ; "U"];
    ["ü" ; "u"];
    ["Ý" ; "Y"];
    ["ý" ; "y"];
    ["ÿ" ; "y"];
    ["™" ; ""]
)

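Note that the last item in this list simply removes any "™" symbols; others may prefer to substitute the letters "TM" instead. Some of the other substitutions are arbitrary, such as replacing ß with a capital "B" and two others (Þ and þ) with a lower-case "b". Strictly speaking, ß is the German sharp s (Eszett) rather than the Greek beta and is commonly transliterated as "ss", while Þ and þ are the letter thorn, commonly written "Th" and "th".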
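For comparison, outside FileMaker the same clean-up can be done without listing every accented letter, by letting Unicode decomposition split each letter from its accent. The Python sketch below is only an illustration of that alternative, not a translation of the Substitute calculation: `FALLBACK` and `to_ascii` are hypothetical names, and the fallback spellings ("ss", "Th", and so on) are my own assumptions and differ from the mappings chosen above.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit replacement.
# These spellings are illustrative assumptions, not the mappings used above.
FALLBACK = {
    "ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Ð": "D", "ð": "d", "Þ": "Th", "þ": "th",
}


def to_ascii(text):
    """Strip accents via NFKD decomposition, using FALLBACK where needed."""
    out = []
    for ch in text:
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
            continue
        # NFKD splits "é" into "e" followed by a combining accent mark.
        decomposed = unicodedata.normalize("NFKD", ch)
        # Drop the combining marks and keep the base character(s).
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


for name in ["Västervik", "Café", "Straße", "Sérgio", "Zürich", "Fürstenfeld"]:
    print(name, "->", to_ascii(name))
```

Characters that neither decompose nor appear in `FALLBACK` pass through unchanged, so a whitelist check is still useful afterwards to catch whatever is left.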

