Check for Latin + Non-Latin mixed text

February 2, 2011

Hi, I've been looking for a way to check (via a calculation) whether a text string contains any non-Latin characters.

Specifically, I'm interested in checking whether a text string contains only characters between code 902 (hex 386) and code 974 (hex 3CE) of the Unicode set (the Greek alphabet), or others too.
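As a quick check of the numbers in the question: 902 and 974 are hex 386 and 3CE, i.e. U+0386 (Ά) to U+03CE (ώ). Below is a minimal Python sketch (not FileMaker code) of the "only characters in this range" test. `ord()` plays the role of a per-character code lookup, and the name `only_in_greek_range` is hypothetical.

```python
# The question's range, written in hex: 902 == 0x386, 974 == 0x3CE.
GREEK_LOW = 0x386   # U+0386 GREEK CAPITAL LETTER ALPHA WITH TONOS
GREEK_HIGH = 0x3CE  # U+03CE GREEK SMALL LETTER OMEGA WITH TONOS


def only_in_greek_range(text: str) -> bool:
    """Hypothetical helper: True if every character's code point is in 902..974.

    all() is True for an empty string, so "" counts as "only Greek" here.
    """
    return all(GREEK_LOW <= ord(ch) <= GREEK_HIGH for ch in text)


print(GREEK_LOW, GREEK_HIGH)          # 902 974
print(only_in_greek_range("αβγ"))     # True
print(only_in_greek_range("αβγabc"))  # False
```

Digits fail this test as well, because they lie outside the range.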

The string is not supposed to be long (under 100 characters), so I don't think recursion limits will be an issue.

I only want a "switch" set to 1 when mixed characters are typed (Greek + others).

Can anyone help with building a function for this purpose? (I'm really bad at this.)

February 2, 2011

I am not sure the problem is well defined. You can easily check if the text contains ONLY certain characters, e.g. if =

Exact ( text ; Filter ( text ; "αβγ...Ω" ) )

returns true, then text contains no other characters. However, if the test returns false, the "other characters" could be anything that isn't listed explicitly in the filterText parameter; in the above example, that would include digits, punctuation marks, currency symbols, etc.
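To make the Filter()/Exact() idea concrete outside FileMaker, here is a rough Python model. `fm_filter` only approximates FileMaker's Filter() (it keeps the characters of the first argument that appear in the second, in order). The full Greek alphabet used for `GREEK` is an assumed stand-in for the elided "αβγ...Ω" list.

```python
def fm_filter(text: str, filter_text: str) -> str:
    """Approximation of FileMaker's Filter(): keep only the characters of
    `text` that also occur in `filter_text`, preserving their order."""
    allowed = set(filter_text)
    return "".join(ch for ch in text if ch in allowed)


# Assumed stand-in for "αβγ...Ω": Greek capitals U+0391..U+03A9 (skipping the
# unassigned U+03A2) and small letters U+03B1..U+03C9 (including final sigma).
GREEK = "".join(chr(c) for c in range(0x391, 0x3AA) if c != 0x3A2)
GREEK += "".join(chr(c) for c in range(0x3B1, 0x3CA))


def contains_only(text: str, filter_text: str) -> bool:
    # Mirrors Exact ( text ; Filter ( text ; filterText ) ):
    # true when filtering removed nothing. Exact() is case-sensitive, like ==.
    return text == fm_filter(text, filter_text)


print(contains_only("αβγ", GREEK))     # True
print(contains_only("αβγ123", GREEK))  # False: the digits count as "other"
```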

February 2, 2011

Author


Well, the problem is that it may contain "abc", "αβγ", or "abcdαβγδ". I want to check whether it contains mixed Latin and non-Latin characters (i.e. TRUE for the third case only).

Hypothetically, I could use two expressions like yours side by side (one for Latin and one for non-Latin characters), but hard-coding the complete Unicode set sounds error-prone, doesn't it?

Therefore, I thought a recursive function that checks whether each character falls within a specified code range would do the job.
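The recursive approach described here can be sketched as follows. This is Python rather than a FileMaker custom function, and `all_in_range` is a hypothetical name. It uses the usual recursive pattern: test the first character, then recurse on the rest, with the empty string as the base case.

```python
def all_in_range(text: str, low: int, high: int) -> bool:
    """Return True if every character's code point lies in [low, high].

    Recursive, one character per call: test the first character, then
    recurse on the remainder. Recursion depth equals len(text), which is
    fine for the short (<100 character) strings mentioned in the question.
    """
    if text == "":
        return True                       # base case: nothing left to test
    if not (low <= ord(text[0]) <= high):
        return False                      # first character is out of range
    return all_in_range(text[1:], low, high)


print(all_in_range("\u039c\u03a3", 902, 974))   # "ΜΣ" -> True
print(all_in_range("\u039c\u03a3a", 902, 974))  # "ΜΣa" -> False
```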

But then I can't write it myself.

February 2, 2011

Such a custom function would face the same issue: in which category does "αβγ123" fall? In any case, I don't see a functional difference between defining characters by range or by listing them. Performance-wise, I'd guess the Filter() function will be faster.
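The claim that a range and an explicit list are interchangeable can be checked directly: build the list from the range, then compare the two tests on a few inputs. This is a Python sketch; `by_range`, `by_listing` and the sample strings are illustrative. One caveat: the 902..974 range also contains U+0387 (GREEK ANO TELEIA, a punctuation mark) and three unassigned code points (U+038B, U+038D, U+03A2), so it is not exactly "Greek letters only".

```python
LOW, HIGH = 902, 974

# The explicit "listing" generated from the range: 73 code points.
LISTED = "".join(chr(c) for c in range(LOW, HIGH + 1))


def by_range(text: str) -> bool:
    return all(LOW <= ord(ch) <= HIGH for ch in text)


def by_listing(text: str) -> bool:
    # Same idea as Exact ( text ; Filter ( text ; LISTED ) )
    return all(ch in LISTED for ch in text)


samples = ["αβγ", "αβγ123", "abc", "\u0386\u03ce", ""]
for s in samples:
    assert by_range(s) == by_listing(s)
print(len(LISTED), "characters; both tests agree on every sample")
```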

February 2, 2011

Author

... in which category does "αβγ123" fall?

That would be TRUE (mixed characters),

whereas:

"abc" or "αβγ" is FALSE (not mixed)

February 2, 2011

I still don't get the logic here. Is "abc123" mixed? Perhaps you should explain the purpose of this exercise.

February 2, 2011


Why not use the Filter function in a second field to catch the unacceptable text, and have it show a warning or something?
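Lee's suggestion, a second field holding the characters that are not acceptable plus a warning, could look roughly like this in Python. The allowed set (the question's Greek range plus the digits 0-9) is an assumption, as are the names `unacceptable` and `warning`. In FileMaker the second field would presumably be a calculation field.

```python
# Assumed allowed set: the Greek range from the question plus ASCII digits.
ALLOWED = {chr(c) for c in range(902, 975)} | set("0123456789")


def unacceptable(text: str) -> str:
    """Contents of the hypothetical second field: every character of
    `text` that is NOT in ALLOWED, in its original order."""
    return "".join(ch for ch in text if ch not in ALLOWED)


def warning(text: str) -> str:
    """Empty when the entry is clean, otherwise a short warning message."""
    bad = unacceptable(text)
    return f"Unexpected characters: {bad}" if bad else ""


print(repr(warning("\u039c\u03a3234")))  # Greek ΜΣ234 -> ''
print(repr(warning("M\u03a3234")))       # Latin M + Greek Σ -> 'Unexpected characters: M'
```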

HTH

Lee

February 3, 2011

Author

I still don't get the logic here. Is "abc123" mixed? Perhaps you should explain the purpose of this exercise.

No, numbers don't matter, so your example would be mixed = FALSE.


The function will be used when storing inventory IDs made up of initials and numbers (MP234, ΜΣ234a). Because Greek characters are used, mistakenly typing in Latin one of the letters common to both alphabets (e.g. A, B, M, N, O) could easily cause sorting/searching problems. I imagined I could deal with this effectively by displaying a warning symbol, or even by failing validation of the entry. Once again, numbers don't matter; what matters is whether the text contains both Greek and Latin characters.
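The look-alike problem is easy to demonstrate. A Latin "MP234" and a Greek "ΜΡ234" (capital Mu and Rho) render the same but are different strings, so an exact comparison treats them as different values. Below is a short Python illustration. Python's default `sorted()` compares raw code points, whereas FileMaker sorts according to the field's language settings, so the exact order there may differ.

```python
latin_id = "MP234"            # Latin M (U+004D) and P (U+0050)
greek_id = "\u039c\u03a1234"  # Greek capital Mu (U+039C) and Rho (U+03A1)

print(latin_id == greek_id)                       # False
print([f"U+{ord(c):04X}" for c in latin_id[:2]])  # ['U+004D', 'U+0050']
print([f"U+{ord(c):04X}" for c in greek_id[:2]])  # ['U+039C', 'U+03A1']

# By code point, every Latin capital sorts before every Greek capital,
# so the Greek look-alike ends up after "MZ100" instead of next to "MP234".
print(sorted([greek_id, "MZ100", latin_id]) == [latin_id, "MZ100", greek_id])  # True
```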

Mixed characters:

only Greek = FALSE

only Latin = FALSE

Greek and Latin = TRUE

* numbers don't matter
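The rules listed above translate directly into a small truth table. The Python sketch below is one possible reference implementation. It assumes that "Greek" and "Latin" can be decided from each letter's Unicode character name (which starts with GREEK or LATIN). `script_of` and `is_mixed` are hypothetical names, and this is a different technique from the Filter()-based calculation discussed in the thread.

```python
import unicodedata


def script_of(ch: str):
    """Classify a single character as 'greek', 'latin', or None.

    Uses the Unicode character name, e.g. 'GREEK SMALL LETTER ALPHA' or
    'LATIN CAPITAL LETTER M'. Digits, spaces and punctuation return None.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith("GREEK"):
        return "greek"
    if name.startswith("LATIN"):
        return "latin"
    return None


def is_mixed(text: str) -> bool:
    """TRUE only when the text contains both Greek and Latin letters."""
    scripts = {script_of(ch) for ch in text}
    return "greek" in scripts and "latin" in scripts


cases = {
    "αβγ": False,       # only Greek
    "abc": False,       # only Latin
    "abcdαβγδ": True,   # Greek and Latin
    "αβγ123": False,    # numbers don't matter
    "abc123": False,
}
for text, expected in cases.items():
    assert is_mixed(text) == expected, text
print("all cases match the table")
```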

February 3, 2011

numbers don't matter

If numbers don't matter, then why is "αβγ123" mixed, but "abc123" is not?

potential mistyping of common letters in Latin (e.g. A, B, M, N, O etc.) could easily cause sorting/searching problems.

Isn't it more likely for someone to mistype "ΑΒ" (Greek) instead of "AB" (Latin) rather than switching the character set in the middle of typing?

February 3, 2011

Can the Code() function be used to identify the high-ASCII characters in the string? (Granted, it may need a recursive custom function...) I assume here that you are looking for non-standard characters that are normally accessible only with the Option key on a Mac or the Alt key on a PC.

February 3, 2011

identify the high-ascii characters

My dear Vaughan, you're living in the past: in the age of Unicode, Greek characters have their own block, and high ASCII no longer serves as the common vehicle for all "other" alphabets.
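This is easy to confirm. Greek letters live in the Unicode block U+0370 to U+03FF (Greek and Coptic), well above 255, so a test for "high-ASCII" codes 128 to 255 would not catch them. Below is a quick Python check; `in_greek_block` is a hypothetical helper.

```python
import unicodedata


def in_greek_block(ch: str) -> bool:
    """True if the code point is in the Greek and Coptic block U+0370..U+03FF."""
    return 0x370 <= ord(ch) <= 0x3FF


for ch in "\u03b1\u03a9\u03ac":  # α, Ω, ά
    print(ch, ord(ch), in_greek_block(ch), unicodedata.name(ch))
# α 945 True GREEK SMALL LETTER ALPHA
# Ω 937 True GREEK CAPITAL LETTER OMEGA
# ά 940 True GREEK SMALL LETTER ALPHA WITH TONOS
```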

February 3, 2011

Author

1. If numbers don't matter, then why is "αβγ123" mixed, but "abc123" is not?

Oops, you got me there...

2. Isn't it more likely for someone to mistype "ΑΒ" (Greek) instead of "AB" (Latin) rather than switching the character set in the middle of typing?

1. "αβγ123" and "abc123" should not be considered mixed (numbers don't matter).

2. Data entry involves Latin characters in other fields.

Can the Code() function be used to identify the high-ASCII characters in the string? (Granted, it may need a recursive custom function...) I assume here that you are looking for non-standard characters that are normally accessible only with the Option key on a Mac or the Alt key on a PC.

If you mean characters like ά, έ, ί, ή, ϊ, ΐ etc., on a Greek keyboard layout these are typed almost directly (e.g. ά = ; + α), so no Alt or Option key is used.
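The accented letters listed here are all precomposed characters inside the 902..974 range from the original question, so a range-based test already accepts them. The Python check below prints their code points. The `unicodedata.normalize("NFC", text)` step is an extra precaution. It assumes the text might sometimes arrive in decomposed form (base letter plus combining accent U+0301), which would otherwise fall outside the range.

```python
import unicodedata

ACCENTED = "\u03ac\u03ad\u03af\u03ae\u03ca\u0390"  # ά έ ί ή ϊ ΐ

for ch in ACCENTED:
    print(ch, ord(ch), 902 <= ord(ch) <= 974)
# ά 940 True
# έ 941 True
# ί 943 True
# ή 942 True
# ϊ 970 True
# ΐ 912 True

# Decomposed alpha + combining acute (U+0301 = 769) is outside the range
# until it is normalized to the precomposed U+03AC.
decomposed = "\u03b1\u0301"
print([ord(c) for c in decomposed])                                # [945, 769]
print([ord(c) for c in unicodedata.normalize("NFC", decomposed)])  # [940]
```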

February 3, 2011

Data entry involves latin characters in other fields

Yes, but I still think it's more likely for the user to forget to switch the keyboard when entering the field - thus producing a "not mixed" entry which is still wrong, rather than switch the keyboard in the middle of typing and producing a "mixed" entry.

Anyway, I believe a "mixed" entry is true when =

not IsEmpty ( Filter ( Lower ( text ) ; "αβγ...ω" ) )

and

not IsEmpty ( Filter ( Lower ( text ) ; "abc...z" ) )
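The two conditions above can be modelled in Python and checked against the cases discussed in the thread. `fm_filter` approximates FileMaker's Filter() and `str.lower()` stands in for Lower(). The two alphabets are assumed stand-ins for the elided "αβγ...ω" and "abc...z" lists. The Greek list is also assumed to include the accented small letters, so that an entry like "Άbc" is still recognised as containing Greek.

```python
import string


def fm_filter(text: str, filter_text: str) -> str:
    """Approximation of FileMaker's Filter(): keep characters of `text`
    that occur in `filter_text`, in order."""
    allowed = set(filter_text)
    return "".join(ch for ch in text if ch in allowed)


# Assumed stand-ins for the elided lists:
# U+03AC..U+03CE covers α..ω, final sigma and the accented small letters;
# U+0390 (ΐ) is added separately.
GREEK_LOWER = "".join(chr(c) for c in range(0x3AC, 0x3CF)) + "\u0390"
LATIN_LOWER = string.ascii_lowercase


def is_mixed(text: str) -> bool:
    lowered = text.lower()                             # Lower ( text )
    has_greek = fm_filter(lowered, GREEK_LOWER) != ""  # not IsEmpty ( Filter ( ... ; Greek ) )
    has_latin = fm_filter(lowered, LATIN_LOWER) != ""  # not IsEmpty ( Filter ( ... ; Latin ) )
    return has_greek and has_latin                     # ... and ...


for s in ["abc", "αβγ", "abcdαβγδ", "αβγ123", "abc123", "M\u03a3234"]:
    print(s, is_mixed(s))
```

An ID typed entirely with the wrong keyboard (all Latin where Greek was intended) is not mixed, so this test does not catch that case.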

February 3, 2011

Author


At first glance, it looks OK for the job. If I run into any problems I'll let you know. Thanks a lot.