
Check for Latin + Non-Latin mixed text


panchristo

This topic is 4386 days old. Please don't post here. Open a new topic instead.


Hi, I've been looking for a way to check (via a calculation) whether a text string contains any non-Latin characters.

Specifically, I want to check whether the string contains only characters between Code 902 (hex 386) and Code 974 (hex 3CE) of the Unicode set (the Greek letters), or others as well.

The string is not expected to be long (fewer than 100 characters), so I don't think recursion limits will be an issue.

I only want a "switch" set to 1 when mixed characters (Greek plus others) are typed.

Can anyone help me build a function for this purpose? (I'm really bad at this.)

Link to comment
Share on other sites

I am not sure the problem is well defined. You can easily check if the text contains ONLY certain characters, e.g. if =

Exact ( text ; Filter ( text ; "αβγ...Ω" ) )

returns true, then text contains no other characters. However, if the test returns false, the "other characters" could be anything that isn't listed explicitly in the filterText parameter - in the above example, that would include digits, punctuation marks, currency symbols etc.
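For anyone who wants to try this logic outside FileMaker, the "contains only the listed characters" test can be sketched in Python. This is an illustration, not FileMaker code: `filter_text` and `contains_only` are hypothetical helpers that mimic what Filter() and Exact() do in the expression above, and `SAMPLE_GREEK` is a deliberately short sample set, not the full alphabet.

```python
def filter_text(text: str, allowed: str) -> str:
    """Keep only the characters of text that also appear in allowed."""
    allowed_set = set(allowed)
    return "".join(ch for ch in text if ch in allowed_set)


def contains_only(text: str, allowed: str) -> bool:
    """True when filtering removes nothing, i.e. every character is allowed."""
    return text == filter_text(text, allowed)


# Assumption: a few Greek letters only, to keep the illustration short.
SAMPLE_GREEK = "αβγδ"

print(contains_only("αβγ", SAMPLE_GREEK))     # every character is in the set
print(contains_only("αβγ123", SAMPLE_GREEK))  # digits count as "other characters"
```

The second call shows the caveat from the post: a false result only says that something outside the list is present, not what it is.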

Link to comment
Share on other sites

Well, the problem is that it may contain "abc", "αβγ", or "abcdαβγδ". I want to check whether it contains mixed Latin and non-Latin characters (i.e. TRUE for case #3 only).

Hypothetically, I could use two expressions like yours side by side (one for Latin and one for non-Latin characters) and check both, but hard-coding the complete Unicode set sounds error-prone, doesn't it?

Therefore, I thought a recursive function that checks whether each character falls within a specified code range would do the job.

But I can't write it myself.
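The per-character range check described here can be sketched as follows. This is a Python illustration of the idea, not a FileMaker custom function: it loops instead of recursing, the helper names are hypothetical, and the bounds are simply the 902 to 974 range quoted in the question (that range also covers a few non-letter code points, which this sketch does not filter out).

```python
# Bounds taken from the question: Code 902 (hex 386) to Code 974 (hex 3CE).
GREEK_LOW, GREEK_HIGH = 0x386, 0x3CE


def has_code_in_range(text: str, low: int, high: int) -> bool:
    """True if at least one character's code point lies in [low, high]."""
    return any(low <= ord(ch) <= high for ch in text)


def all_codes_in_range(text: str, low: int, high: int) -> bool:
    """True if every character's code point lies in [low, high]."""
    return all(low <= ord(ch) <= high for ch in text)


for sample in ("abc", "αβγ", "abcdαβγδ"):
    print(
        sample,
        has_code_in_range(sample, GREEK_LOW, GREEK_HIGH),
        all_codes_in_range(sample, GREEK_LOW, GREEK_HIGH),
    )
```

A string is a candidate for "mixed" when the first check is true and the second is false, although that alone would also flag digits and punctuation.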

Link to comment
Share on other sites

Such a custom function would face the same issue: which category does "αβγ123" fall into? In any case, I don't see a functional difference between defining characters by range and listing them. Performance-wise, I'd guess the Filter() function will be faster.

Link to comment
Share on other sites

That would be TRUE (mixed characters),

whereas:

"abc" or "αβγ" is FALSE (not mixed)

Why not apply the Filter function in a second field to catch the unacceptable text, and have it show a warning or something?

HTH

Lee

Link to comment
Share on other sites

I still don't get the logic here. Is "abc123" mixed? Perhaps you should explain the purpose of this exercise.

No, numbers would not matter. Hence your example would be mixed=FALSE

The function will be used when storing inventory IDs made up of initials and numbers (MP234, ΜΣ234a). Because Greek characters are used, mistyping a letter that looks the same in Latin (e.g. A, B, M, N, O etc.) could easily cause sorting and searching problems. By displaying a warning symbol, or even by failing validation of the entry, I figured I could deal with this effectively. Once again, numbers don't matter; all that matters is whether the text contains both Greek and Latin characters.

Mixed characters:

only Greek = FALSE

only Latin = FALSE

Greek and Latin = TRUE

* numbers don't matter

Link to comment
Share on other sites

numbers don't matter

If numbers don't matter, then why is "αβγ123" mixed, but "abc123" is not?

potential mistyping of common letters in latin(e.g. A,B,M,N,O etc) could easily cause sorting/searching problems.

Isn't it more likely for someone to mistype "ΑΒ" (Greek) instead of "AB" (Latin) rather than switching the character set in the middle of typing?
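To make the lookalike problem concrete, here is a small Python sketch (an illustration only, not FileMaker code) showing that Latin "AB" and Greek "ΑΒ" are different strings even though they may render identically. The Greek letters are written as escapes so the script is unambiguous.

```python
import unicodedata

latin_ab = "AB"
greek_ab = "\u0391\u0392"  # Greek capital Alpha and Beta

# The two strings can look the same on screen but do not compare equal.
print(latin_ab == greek_ab)

# Print each character's code point and official Unicode name.
for ch in latin_ab + greek_ab:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

An entry typed entirely in the wrong script contains only one script, so a "mixed" test would not flag it.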

Link to comment
Share on other sites

Can the Code() function be used to identify the high-ascii characters in the string? (Granted it may need a recursive custom function...) I assume here that you are looking for non-standard characters that are normally accessible only with the Option key on a Mac or the Alt key on a PC.

Link to comment
Share on other sites

1. If numbers don't matter, then why is "αβγ123" mixed, but "abc123" is not?

Oops, you got me there. Neither "αβγ123" nor "abc123" should be considered mixed (numbers don't matter).

2. Isn't it more likely for someone to mistype "ΑΒ" (Greek) instead of "AB" (Latin) rather than switching the character set in the middle of typing?

Data entry involves Latin characters in other fields.

Can the Code() function be used to identify the high-ascii characters in the string? (Granted it may need a recursive custom function...) I assume here that you are looking for non-standard characters that are normally accessible only with the Option key on a Mac or the Alt key on a PC.

If you mean characters like ά, έ, ί, ή, ϊ, ΐ etc.: on a Greek keyboard layout these are typed almost directly (e.g. ά = ; + α), so no Alt or Option key is involved.

Link to comment
Share on other sites

Data entry involves latin characters in other fields

Yes, but I still think it's more likely for the user to forget to switch the keyboard when entering the field - thus producing a "not mixed" entry which is still wrong, rather than switch the keyboard in the middle of typing and producing a "mixed" entry.

Anyway, I believe a "mixed" entry is true when =

not IsEmpty ( Filter ( Lower ( text ) ; "αβγ...ω" ) )

and

not IsEmpty ( Filter ( Lower ( text ) ; "abc...z" ) )
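The two-part test above can be mirrored in Python to check it against the sample IDs from this thread. This is a sketch under assumptions, not FileMaker code: `is_mixed` is a hypothetical name, and because the character lists in the post are elided, `GREEK_LOWER` here is assumed to be the 24 unaccented lowercase letters plus final sigma. Accented letters such as ά would need to be added to be detected.

```python
import string

# Assumption: unaccented lowercase Greek letters plus final sigma.
GREEK_LOWER = "αβγδεζηθικλμνξοπρσςτυφχψω"
LATIN_LOWER = string.ascii_lowercase


def is_mixed(text: str) -> bool:
    """True only when the text has at least one Greek and one Latin letter."""
    lowered = text.lower()
    has_greek = any(ch in GREEK_LOWER for ch in lowered)
    has_latin = any(ch in LATIN_LOWER for ch in lowered)
    return has_greek and has_latin


# "\u039c\u03a3234a" is Greek Mu and Sigma, the digits 234, then a Latin "a".
for sample in ("MP234", "\u039c\u03a3234a", "abc123"):
    print(sample, is_mixed(sample))
```

Digits and punctuation appear in neither list, so they never affect the result, which matches the "numbers don't matter" requirement.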

Link to comment
Share on other sites

At first glance, it looks like it will do the job. If I run into any problems I'll let you know. Thanks a lot.

Link to comment
Share on other sites
