panchristo Posted February 2, 2011 Hi, I've been looking for a way to check (via a calculation) whether a text string contains any non-Latin characters. Specifically, I'm interested in checking whether a text string contains only characters between Code 902 (hex 386) and Code 974 (hex 3CE) of the Unicode set (Greek), or others too. The string is not expected to be long (<100 characters), so I don't think recursion limits will be an issue. I only want a "switch" set to 1 when mixed characters are typed (Greek + others). Can anyone help with building a function for this purpose? (I'm really bad at this.)
comment Posted February 2, 2011 I am not sure the problem is well defined. You can easily check if the text contains ONLY certain characters, e.g. if
= Exact ( text ; Filter ( text ; "αβγ...Ω" ) )
returns true, then text contains no other characters. However, if the test returns false, the "other characters" could be anything that isn't listed explicitly in the filterText parameter - in the above example, that would include digits, punctuation marks, currency symbols etc.
panchristo (Author) Posted February 2, 2011 Well, the problem is that the text may contain "abc", "αβγ" or "abcdαβγδ". I want to check whether or not it contains mixed Latin and non-Latin characters (i.e. TRUE for case #3 only). Hypothetically, I could use two expressions like yours side by side (one for Latin and one for non-Latin characters), but hard-coding the complete Unicode set sounds error-prone, doesn't it? Therefore, I thought a recursive function that checks separately whether each character falls within a specified code range would do the job. But I can't write it myself.
comment Posted February 2, 2011 Such a custom function would face the same issue: in which category does "αβγ123" fall? In any case, I don't see a functional difference between defining characters by range or by listing them. Performance-wise, I'd guess the Filter() function will be faster.
panchristo (Author) Posted February 2, 2011
Quote (comment): ... in which category does "αβγ123" fall?
That would be TRUE (mixed characters), whereas "abc" or "αβγ" is FALSE (not mixed).
comment Posted February 2, 2011 I still don't get the logic here. Is "abc123" mixed? Perhaps you should explain the purpose of this exercise.
Lee Smith Posted February 2, 2011
Quote (panchristo): That would be TRUE (mixed characters), whereas: "abc" or "αβγ" is FALSE (not mixed)
Why not use the Filter function to put the unacceptable text into a second field, and have it show a warning or something? HTH, Lee
panchristo (Author) Posted February 3, 2011
Quote (comment): I still don't get the logic here. Is "abc123" mixed? Perhaps you should explain the purpose of this exercise.
No, numbers would not matter, so your example would be mixed = FALSE.
Quote (Lee Smith): Why not use the Filter function to put the unacceptable text into a second field, and have it show a warning or something?
The function will be used when storing inventory IDs made up of initials and numbers (MP234, ΜΣ234a). Because Greek characters are in use, mistyping one of the letters that also exist in Latin (e.g. A, B, M, N, O) could easily cause sorting and searching problems. By displaying a warning symbol, or even refusing to validate the entry, I thought I could deal with this effectively. Once again, numbers don't matter; what matters is only whether the text contains both Greek and Latin characters.
Mixed characters:
only Greek = FALSE
only Latin = FALSE
Greek and Latin = TRUE
* numbers don't matter
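The rule summarized above (Greek only = FALSE, Latin only = FALSE, both = TRUE, digits ignored) can be sketched as a per-character code-point test. This is only an illustration in Python, not a FileMaker custom function. The names `is_greek`, `is_latin` and `is_mixed` are hypothetical, the Greek range is the 902-974 range quoted in the first post, and treating only unaccented A-Z / a-z as "Latin" is an assumption of this sketch.

```python
GREEK_FIRST = 0x386  # code 902, lower bound quoted in the first post
GREEK_LAST = 0x3CE   # code 974, upper bound quoted in the first post


def is_greek(ch: str) -> bool:
    """True if the character's code point lies in the quoted Greek range."""
    return GREEK_FIRST <= ord(ch) <= GREEK_LAST


def is_latin(ch: str) -> bool:
    """True for unaccented A-Z / a-z only (an assumption of this sketch)."""
    return ("A" <= ch <= "Z") or ("a" <= ch <= "z")


def is_mixed(text: str) -> bool:
    """True only when text holds at least one Greek AND one Latin letter.

    Digits, punctuation and anything else match neither test,
    so they never affect the result.
    """
    has_greek = any(is_greek(ch) for ch in text)
    has_latin = any(is_latin(ch) for ch in text)
    return has_greek and has_latin
```

With this sketch, a string of Latin letters plus digits and a string of Greek letters plus digits both come out as not mixed, while any string holding letters from both ranges is flagged.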
comment Posted February 3, 2011
Quote (panchristo): numbers don't matter
If numbers don't matter, then why is "αβγ123" mixed, but "abc123" is not?
Quote (panchristo): potential mistyping of common letters in latin (e.g. A,B,M,N,O etc) could easily cause sorting/searching problems.
Isn't it more likely for someone to mistype "ΑΒ" (Greek) instead of "AB" (Latin) rather than switching the character set in the middle of typing?
Vaughan Posted February 3, 2011 Can the Code() function be used to identify the high-ascii characters in the string? (Granted it may need a recursive custom function...) I assume here that you are looking for non-standard characters that are normally accessible only with the Option key on a Mac or the Alt key on a PC.
comment Posted February 3, 2011
Quote (Vaughan): identify the high-ascii characters
My dear Vaughan, you're living in the past: in the age of Unicode, Greek characters have their own block - and high-ascii no longer serves as the common vehicle for all "other" alphabets.
panchristo (Author) Posted February 3, 2011
Quote (comment): 1. If numbers don't matter, then why is "αβγ123" mixed, but "abc123" is not? 2. Isn't it more likely for someone to mistype "ΑΒ" (Greek) instead of "AB" (Latin) rather than switching the character set in the middle of typing?
1. Oops, you got me there. Neither "αβγ123" nor "abc123" should be considered mixed (numbers don't matter). 2. Data entry involves Latin characters in other fields.
Quote (Vaughan): Can the Code() function be used to identify the high-ascii characters in the string? (Granted it may need a recursive custom function...) I assume here that you are looking for non-standard characters that are normally accessible only with the Option key on a Mac or the Alt key on a PC.
If you are referring to characters like ά, έ, ί, ή, ϊ, ΐ etc., on a Greek keyboard layout these are typed almost directly (e.g. ά = ; + α), so no Alt or Option key is used.
comment Posted February 3, 2011
Quote (panchristo): Data entry involves latin characters in other fields
Yes, but I still think it's more likely for the user to forget to switch the keyboard when entering the field - thus producing a "not mixed" entry which is still wrong - rather than switch the keyboard in the middle of typing and produce a "mixed" entry. Anyway, I believe a "mixed" entry is true when
= not IsEmpty ( Filter ( Lower ( text ) ; "αβγ...ω" ) ) and not IsEmpty ( Filter ( Lower ( text ) ; "abc...z" ) )
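For readers who want to try the logic of this test outside FileMaker, here is a rough Python equivalent. It is a sketch under assumptions, not the calculation itself: `filter_chars` is a hypothetical stand-in for Filter(), `is_mixed_by_filter` is a made-up name, and because the post abbreviates both alphabets they are spelled out here as α through ω (including the final sigma ς, but no accented letters) and a through z.

```python
import string

# Assumption: the post abbreviates both alphabets, so they are spelled out here.
# Code points 0x3B1..0x3C9 cover α..ω, including the final sigma ς.
GREEK_LOWER = "".join(chr(cp) for cp in range(0x3B1, 0x3CA))
LATIN_LOWER = string.ascii_lowercase  # a..z


def filter_chars(text: str, allowed: str) -> str:
    """Keep only the characters of text that also occur in allowed."""
    return "".join(ch for ch in text if ch in allowed)


def is_mixed_by_filter(text: str) -> bool:
    """True when the lowercased text has both Greek and Latin letters."""
    lowered = text.lower()
    has_greek = filter_chars(lowered, GREEK_LOWER) != ""
    has_latin = filter_chars(lowered, LATIN_LOWER) != ""
    return has_greek and has_latin
```

As noted in the post, an entry typed entirely in the wrong alphabet (Greek "ΑΒ" where Latin "AB" was meant) still counts as "not mixed", so this test would not catch it.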
panchristo (Author) Posted February 3, 2011 At first glance, it looks like it will do the job. If I run into any problems, I'll let you know. Thanks a lot.