pixi Posted September 4, 2019 Posted September 4, 2019 Does anyone have an idea how to strip everything invisible and just get the visible text in a field back? I tried various functions and still don't get why it's not possible to zap those gremlins natively or at all. And, NO! A CF containing every char to be removed is no solution. At least until EVERY SINGLE invisible char is included. Is there a list of those somewhere? @comment has posted a CF and although this solves the problem as such, it also strips out other chars.
comment Posted September 4, 2019 Posted September 4, 2019 You can either list the characters to remove, or the characters to keep. I don't think there is a third option - at least not when using Filemaker's calculation engine. If you use regex via a plugin, you can use character classes.
pixi Posted September 4, 2019 Author Posted September 4, 2019 Meanwhile I found this function: GetAsURLDecoded https://www.briandunning.com/cf/2095 When I run the GetAsURLEncoded and then the GetAsURLDecoded it gives the cleared text back. I just have to test it further for other chars. What is your opinion on that?
comment Posted September 4, 2019 Posted September 4, 2019 8 minutes ago, pixi said: What is your opinion on that? I don't know what characters you're dealing with - either on the allowed side, or on the prohibited side.
pixi Posted September 4, 2019 Author Posted September 4, 2019 The invisible ones… :o) I don't know much enough about chars to answer your question properly. I just want the - visible - text in the fields to be stored/used and not all the other - invisible - ones too. Amongst others this throws me errors when using this in e.g. Execute SQL to INSERT INTO a SQL-table. How do you define "allowed" and "prohibited"?
comment Posted September 4, 2019 Posted September 4, 2019 3 minutes ago, pixi said: How do you define "allowed" and "prohibited"? You define what's allowed by your usage. If you're using English, then you only need the lower ASCII set. If you're using XML, then you allow characters in the range specified by the XML standard. And so on. I am not sure where your invisible characters are coming from and what exactly they are. You could analyze them using the Code() function. They probably belong to a small group of control characters. As I mentioned earlier, a truly generic solution is possible only through regex. Unicode has a category of invisible control characters and unused code points, and regex should be able to use it (subject to testing with your specific implementation).
comment Posted September 4, 2019 Posted September 4, 2019 (edited) 23 hours ago, pixi said: When I run the GetAsURLEncoded and then the GetAsURLDecoded it gives the cleared text back. I made a small test of my own, starting with a text that contains a vertical tab character, Char (11). The GetAsURLEncoded() function encodes the character as " %0B". And HexDecode ( "0B" ) returns the original character. So again, I am curious exactly which characters you're dealing with. Edited September 5, 2019 by comment
pixi Posted September 4, 2019 Author Posted September 4, 2019 25 minutes ago, comment said: So again, I am curious exactly which characters you're dealing with. It started with the reversed question mark (the red one above). I don't exactly know where the hidden chars come from. The data is entered by the customers staff, either via copy & paste or typed. In this very case it is just First and Last names and Email addresses. The names are not English only, so lower ASCII doesn't work. Using "Code" gives me 0 (see screenshot below). I copied the char from a TextWrangler doc and pasted it 5 times between exclamation marks.
comment Posted September 4, 2019 Posted September 4, 2019 (edited) The reversed question mark is just a placeholder for an invisible character. It could be any character. The Code() function returning 0 is a much better indication. The null character is a well-known pest in text pasted from external sources. Unfortunately, getting rid of it is not entirely trivial, because in Filemaker Char ( 0 ) produces an empty string. Possible workarounds: paste the character into a global field; generate the character by: HexDecode ( "00" ); generate the character by: ExecuteSQL ( "SELECT DISTINCT CHR(0) FROM SomeTable" ; "" ; "" ); generate the character by: Base64Decode ( "AA==" ). Once you have the character, it's easy to substitute it out. -- P.S. I think I have discovered why your attempt to use GetAsURLEncoded() seemed to work: it's because the null characters are at the end of the string. If you place them in the middle, you will lose the text that follows them. Edited September 4, 2019 by comment 1
pixi Posted September 5, 2019 Author Posted September 5, 2019 So, if I get you right I just have to Substitute ( txt; HexDecode ( "00" ); "" ) ? That seems to be quite easy…a bit too easy maybe? The global fields retains the char, no matter if it is opened locally or from a server. Is there a way to discover/a list of all "invisible" chars? A CF to strip them would be very handy… An other "weird" thing I discovered is that if you have a calc in the Data viewer with the reversed question mark, close FMPA and re-open it, the calc is stripped after the first reversed question mark. So, if FM discovers these signs in the calc engine why the heck can't it strip those out?
pixi Posted September 5, 2019 Author Posted September 5, 2019 Could it be possible that stripping out all C0-Control codes from Dec 01 to Dec 31 and HexDecode ( "00" ) solves the problem? https://en.wikipedia.org/wiki/C0_and_C1_control_codes
comment Posted September 5, 2019 Posted September 5, 2019 We seem to be turning around the same point: what invisible characters are you dealing with? If it's only the null character, than yes, it is quite easy. If there can be others, then you'd need a list of them. In theory, it could be any control character or unused code point - so I don't think it's practical to construct such list in theory.
pixi Posted September 5, 2019 Author Posted September 5, 2019 Currently I only have problems with the null character. That is solved, thanks to your help! To avoid further hassle in the future/in other files I build a list using the control characters mentioned above. I'm aware that this won't solve the whole thing, but at least the most common ones. Thank you!
Recommended Posts
This topic is 1917 days old. Please don't post here. Open a new topic instead.
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now