Jump to content

Zap gremlins in text.


This topic is 1689 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Does anyone have an idea how to strip everything invisible and just get the visible text in a field back? I tried various functions and still don't get why it's not possible to zap those gremlins natively or at all. And, NO! A CF containing every char to be removed is no solution. At least until EVERY SINGLE invisible char is included. Is there a list of those somewhere?

@comment has posted a CF and although this solves the problem as such, it also strips out other chars.

Example.png

Link to comment
Share on other sites

Meanwhile I found this function: GetAsURLDecoded https://www.briandunning.com/cf/2095 
When I run the GetAsURLEncoded and then the GetAsURLDecoded it gives the cleared text back. I just have to test it further for other chars.

What is your opinion on that?

example2.png

Link to comment
Share on other sites

The invisible ones… :o)
I don't know much enough about chars to answer your question properly. I just want the - visible - text in the fields to be stored/used and not all the other - invisible - ones too.

Amongst others this throws me errors when using this in e.g. Execute SQL to INSERT INTO a SQL-table.

How do you define "allowed" and "prohibited"?

Link to comment
Share on other sites

3 minutes ago, pixi said:

How do you define "allowed" and "prohibited"?

You define what's allowed by your usage. If you're using English, then you only need the lower ASCII set. If you're using XML, then you allow characters in the range specified by the XML standard. And so on. 

I am not sure where your invisible characters are coming from and what exactly they are. You could analyze them using the Code() function. They probably belong to a small group of control characters.  

As I mentioned earlier, a truly generic solution is possible only through regex. Unicode has a category of invisible control characters and unused code points, and regex should be able to use it (subject to testing with your specific implementation).

 

Link to comment
Share on other sites

23 hours ago, pixi said:

When I run the GetAsURLEncoded and then the GetAsURLDecoded it gives the cleared text back.

I made a small test of my own, starting with a text that contains a vertical tab character, Char (11). The GetAsURLEncoded() function encodes the character as " %0B". And HexDecode ( "0B" ) returns the original character. So again, I am curious exactly which characters you're dealing with.

 

 

Edited by comment
Link to comment
Share on other sites

25 minutes ago, comment said:

 So again, I am curious exactly which characters you're dealing with.

It started with the reversed question mark (the red one above). I don't exactly know where the hidden chars come from. The data is entered by the customers staff, either via copy & paste or typed.

In this very case it is just First and Last names and Email addresses. The names are not English only, so lower ASCII doesn't work.

Using "Code" gives me 0 (see screenshot below). I copied the char from a TextWrangler doc and pasted it 5 times between exclamation marks.

 

 

example3.png

Link to comment
Share on other sites

The reversed question mark is just a placeholder for an invisible character. It could be any character. The Code() function returning 0 is a much better indication. The null character is a well-known pest in text pasted from external sources. Unfortunately, getting rid of it is not entirely trivial, because in Filemaker Char ( 0 ) produces an empty string. Possible workarounds:

  1. paste the character into a global field;
  2. generate the character by: HexDecode ( "00" );
  3. generate the character by: ExecuteSQL ( "SELECT DISTINCT CHR(0) FROM SomeTable" ; "" ; "" );
  4. generate the character by: Base64Decode ( "AA==" ).

Once you have the character, it's easy to substitute it out.

--
P.S. I think I have discovered why your attempt to use GetAsURLEncoded() seemed to work: it's because the null characters are at the end of the string. If you place them in the middle, you will lose the text that follows them.

 

Edited by comment
  • Like 1
Link to comment
Share on other sites

So, if I get you right I just have to Substitute ( txt; HexDecode ( "00" ); "" ) ?
That seems to be quite easy…a bit too easy maybe?
The global fields retains the char, no matter if it is opened locally or from a server.

Is there a way to discover/a list of all "invisible" chars? A CF to strip them would be very handy…

An other "weird" thing I discovered is that if you have a calc in the Data viewer with the reversed question mark, close FMPA and re-open it, the calc is stripped after the first reversed question mark.
So, if FM discovers these signs in the calc engine why the heck can't it strip those out?

Link to comment
Share on other sites

We seem to be turning around the same point: what invisible characters are you dealing with? If it's only the null character, than yes, it is quite easy. If there can be others, then you'd need a list of them. In theory, it could be any control character or unused code point - so I don't think it's practical to construct such list in theory.

Link to comment
Share on other sites

Currently I only have problems with the null character. That is solved, thanks to your help!

To avoid further hassle in the future/in other files I build a list using the control characters mentioned above. I'm aware that this won't solve the whole thing, but at least the most common ones.

Thank you!

Link to comment
Share on other sites

This topic is 1689 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.