Jump to content

0x0 - illegal xml character


Joost Miltenburg
 Share

This topic is 1761 days old. Please don't post here. Open a new topic instead.

Recommended Posts

LS,

Sometimes user copy/paste data from other systems, I guess. Anyway the data in the FM field contains a 0x0 character.

Using this character in a xml/xslt exports crashes the export.

Anyone have an idea on how to get rid of this character ? ( either fm-script or xslt will do ! )

Thans in advance !

Joost

Link to comment
Share on other sites

14 hours ago, Joost Miltenburg said:

Sometimes user copy/paste data from other systems, I guess. Anyway the data in the FM field contains a 0x0 character.

Using this character in a xml/xslt exports crashes the export.
Anyone have an idea on how to get rid of this character ? ( either fm-script or xslt will do ! )

Well this is a tricky situation, the best solution is probably to make a custom menu item and replace paste with paste unformatted. As this garbage you refer to is not writeable by regular hands, unless by mistake. This is stuff that comes from pasting formatted stuff into a FileMaker field, the most embarassing thing for FileMaker is that they are not able to output this garbage properly formatted; and hence breaks their own XSLT implementation. However I do recommend you do a manual export and try to open the manual export in Xmplify and see if you can find the underlying cause in the file.

I can tell you the manual approach( both approaches are for MacOS X ) to 1) avoiding the problem by deleting the illegal characters; when you do the export and the error requester shows up copy the path to the file with the error

time tr -d '\000-\010\013-\037\177-\237' < __push CMD+V__ > ~/export.fmpxmlresult.clean.xml
This will typically look like
time tr -d '\000-\010\013-\037\177-\237' < /tmp/user/501/FMTEMP/183761287687192638971263897612789321 > ~/export.fmpxmlresult.clean.xml
or similar, the file ~/export.fmpxmlresult.clean.xml will be a valid XML file without the control characters inside.

If you would like to reapply the stylesheet of your choice

xsltproc /path/to/yourstylesheet.xslt ~/export.fmpxmlresult.clean.xml

 

 

If you would like to 2) find the root cause and solve the problem I'd say the best approach is to copy / paste the path from the error dialog

cp __push CMD+V__ ~/export.fmpxmlresult.xml
Open the file ~/export.fmpxmlresult.clean.xml in Xmplify and go through error by error, note where you find the illegal characters and correct the data in Filemaker.

 

I'm also impressed by the admin putting my reputation to -1; kudos to you for undermining your own authority.

 

Edited by ggt667
Link to comment
Share on other sites

22 minutes ago, ggt667 said:

the best solution is probably to make a custom menu item and replace paste with paste unformatted.

This has absolutely nothing to do with the problem. A null character is a character on its own -  it is not formatting of other characters. If the copied text contains it, it will be pasted either way.

 

 

Link to comment
Share on other sites

12 minutes ago, comment said:

This has absolutely nothing to do with the problem. A null character is a character on its own -  it is not formatting of other characters. If the copied text contains it, it will be pasted either way.

Tell me, I'm all ears.

 

 

I'm sorry, but I do not have the source code of FileMaker, yet when pasting unformatted text only one 0x0 should be allowed, namely the termination of what is pasted.

I totally agree that 0x0 is a character on its own; not sure how you think or what makes the difference, 0x0 is an illegal character of XML unless escaped/entitized correctly; as &0; it's a legal character XML, while as 0x0 as raw binary it's the character that says this is the end of the XML documet.

Edited by ggt667
Link to comment
Share on other sites

2 minutes ago, ggt667 said:

when pasting unformatted text only one 0x0 should be allowed,

This too is irrelevant to the issue. The null character has no special meaning in Filemaker. It is treated the same way as any other character. The only exception is that the Char() function is incapable of producing it. For some reason, Char(0) was designed to produce an empty string instead of the expected null character. This is why you need a field to hold the null character in order to substitute it out.

  • Like 1
Link to comment
Share on other sites

Remind me what is unformatted text to you? A string where some control characters are stripped? But not all? And why is that character exported as binary and not entity? All chars listed below should be as entity or encoded in XML, what is relevant is that FileMaker exports to XML without encoding UTF-8 properly.

 

The problem is not 0x0, the problem is that FileMaker exports 0x0 as 0x0 opposed to &0; or other valid entity.

\000-\010\013-\037\177-\237

For the user exporting XML using XSLT should be a non-issue. The 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, and so on during export is FileMaker shooting itself in the foot.

Edited by ggt667
  • Like 1
Link to comment
Share on other sites

Please don't waste my time with pointless discussions. You said that the solution to the problem of users pasting text containing the null character is to paste it unformatted. I said that is incorrect - and that is all I care to say. The point was to save the OP - and anyone else reading this - the frustration of trying it and not understanding why it doesn't work. As for your other suggestions, I believe anyone can see for themselves how useful they are, without needing my advice.

Link to comment
Share on other sites

Character 0x0 is invalid as unicode regardless; as it has a meaning: end of string.

I made a note on this on my blog a couple of years back: http://wethecomputerabusersamongst.blogspot.com/2014/06/filemaker-and-those-crazy-users-pasting.html

Note it's a proof of concept and not a solution: The best solution would be for FileMaker inc to release a patch for latest FM11 and FM14 that outputs 0x0 encoded correctly when exporting data as XML.

Edited by ggt667
Link to comment
Share on other sites

13 minutes ago, ggt667 said:

The best solution would be for FileMaker inc to release a patch that outputs 0x0 encoded correctly when exporting data as XML.

Yes, that would be nice. But wishful thinking is not helping anyone to solve the problem they encounter now.

 

---
Just for the record: the null character is a valid Unicode character, and it has no meaning - except in such languages/file formats that have chosen to give it a meaning. Filemaker is not one of these.

Link to comment
Share on other sites

There is a function for removing formatting from text - TextFormatRemove ( text )

May be better to use this function instead of an undo to remove formatting, because if there is no formatting to the text, an undo will remove what they pasted.

 

Mike

Link to comment
Share on other sites

7 minutes ago, Mike Duncan said:

May be better to use this function instead of an undo to remove formatting,

1. No one suggested using Undo.

2. Neither Undo nor TextFormatRemove( ) are relevant to the issue being discussed here.

--
Please have the courtesy to read what others have posted before adding your own.

Link to comment
Share on other sites

All approaches mentioned so far demands reprocessing data already stored; in which in itself is totally crazy, as it means a lack of trust in the tool, or rather product.

And should be totally unncessary when using a product you actually paid for, these things should just work. If this was open source it would have been patched.

Edited by ggt667
Link to comment
Share on other sites

2 hours ago, comment said:

1. No one suggested using Undo.

2. Neither Undo nor TextFormatRemove( ) are relevant to the issue being discussed here.

--
Please have the courtesy to read what others have posted before adding your own.

Comment, the blog post referenced above is where this was mentioned as a fix. I accept your rebuke and sorry for the confusion.

Link to comment
Share on other sites

On 15 March, 2016 at 6:41 PM, Mike Duncan said:

Comment, the blog post referenced above is where this was mentioned as a fix. I accept your rebuke and sorry for the confusion.

There is no mention of that as anything but a POC( proof of concept ) And a way to avoid the fault of FileMaker's XML export.

 

Do anyone on here know the scope of the Filemaker SDK? Can I write my own export file type as a plugin?

Edited by ggt667
Link to comment
Share on other sites

OK !

Thanks for all the fiery insight ! I am also reporting this to FMI as a bug. 

I have found that I can do a set field[ right( value ; 1) and just get the illegal character ( if it is at the end, of course ). So a substitute is possible.

When you copy/paste the value in a program like BBEdit ( Mac ) you can actually see the character looks like a question mark upside down.
It is copy/pasteable to a FM field.

As the record is checked before locking it, I will add a script that will loop thorough the fields that are exported and substitute the value. I guess that will take a second or so to do.

So, the remaining question is : Will this character corrupt the database in any way ? Where should I store it ? In a field or a script ?

TIA

Edited by Joost Miltenburg
Found link for feedback in FM itself
Link to comment
Share on other sites

6 hours ago, Joost Miltenburg said:

When you copy/paste the value in a program like BBEdit ( Mac ) you can actually see the character looks like a question mark upside down.
It is copy/pasteable to a FM field.

If you copy a grid of data from a spreadsheet you may or may not get 0x3, 0x4, and/or 0x7

Link to comment
Share on other sites

6 hours ago, Joost Miltenburg said:

Will this character corrupt the database in any way ? Where should I store it ? In a field or a script ?

I would not paste a control character into a calculation formula. Since it cannot be produced by a function (unless you use a plugin), then a global field would be the right place to store it. Although this has the disadvantage of being lost when the file is cloned.

You might also want to take a look at:
http://www.briandunning.com/cf/1291

 

Link to comment
Share on other sites

Here is a list of the control characters: http://ascii-table.com/control-chars.php

0x9 and 0xB are the field separators for spread sheets.

How many records are in this table?

I have about 400 fields and about 1.9 mill records in one of my most used tables and I have most likely experienced more than 3/4 the characters mentioned in the link while exporting from FileMaker using XML export. Different apps produce different garbage.

On 17 March, 2016 at 10:02 AM, Joost Miltenburg said:

Will this character corrupt the database in any way ? Where should I store it ? In a field or a script ?

No, it should not corrupt the db in any way, but it's a hack to put several strings in one string, much like the FileMaker hack of making a line break in a global field, to key in several keys.

Edited by ggt667
Link to comment
Share on other sites

  • 10 months later...
On 17 March, 2016 at 0:24 PM, Joost Miltenburg said:

For you who are using the Troi-File Plugin :

TrFile_AsciiValueToText ("-Unused" ; "0x00")

This will get you the dreaded character, according to Troi !

0x00 alone as an isolated issue was never really an issue for me, FileMaker exports many characters such as 0x0X unescaped. What I do is to do an export without the stylesheet, then I clean it manually after, that however is an extra step in which a proprietary application such as FileMaker should have ironed out; and preforably escaped correctly. A retroactive patch would be appropriate for this case.

Edited by ggt667
Link to comment
Share on other sites

I'm way late to the thread here, but another way to make a (perfectly valid!) null character in FileMaker is Base64Decode ( "AA==" ). While this is a perfectly valid character both in UTF-8 and in Unicode more generally, it is disappointing that it doesn't get escaped in export formats like XML where it is not.

Edited by jbante
Link to comment
Share on other sites

  • 2 weeks later...
On 14 March, 2016 at 8:21 PM, Wim Decorte said:

can you identify / substitute it by using the Char() function?

Char ( 1 ) & Char ( 2 ) & Char ( 3 ) & Char ( 4 ) & Char ( 5 ) & Char ( 6 ) & Char ( 7 ) & Char ( 8 ) & Char ( 11 ) & Char ( 12 ) &
Char ( 14 ) & Char ( 15 ) & Char ( 16 ) & Char ( 17 ) & Char ( 18 ) & Char ( 19 ) & Char ( 20 ) & Char ( 21 ) & Char ( 22 ) & Char ( 23 ) &
Char ( 24 ) & Char ( 25 ) & Char ( 26 ) & Char ( 27 ) & Char ( 28 ) & Char ( 29 ) & Char ( 30 ) & Char ( 31 )

That will cover most of them, it's not a problem that the characters are there, it's just that when doing an export with FileMaker there will be an error as these characters are not correctly escaped when exporting to XML.

Tab, LF and CR as

Char ( 9 ) & Char ( 10 ) & Char ( 13 )

would also work fine if escaped

Screen Shot 2017-02-05 at 13.18.49.png

The application used to open the output XML is Xmplify, but tidy, xmlstartler and xsltproc all agree this is improper XML.


This is the closest I have come to a solution( in this case the numbers are kept as a debugging tool to see if it's possible to find the source ):

( Too bad it locks and returns errors when users are in that field )

Substitute(
  Substitute(
    Substitute(
      Substitute(
        Substitute(
          Substitute(
            Substitute(
              Substitute(
                Substitute(
                  Substitute(
                    Substitute(
                      Substitute(
                        Substitute(
                          Substitute(
                            Substitute(
                              Substitute(
                                Substitute(
                                  Substitute(
//                                    Substitute(
                                      Substitute(
                                        Substitute(
//                                         Substitute(
//                                            Substitute(
                                              Substitute(
                                                Substitute(
                                                  Substitute(
                                                    Substitute(
                                                      Substitute(
                                                        Substitute(
                                                          Substitute(
                                                            Substitute ( fieldname; Char ( 1 ); "&#1;" );
                                                          Char( 2 ) ; "&#2;" );
                                                        Char( 3 ) ; "&#3;" );
                                                      Char( 4 ) ; "&#4;" );
                                                    Char( 5 ) ; "&#5;" );
                                                  Char( 6 ) ; "&#6;" );
                                                Char( 7 ) ; "&#7;" );
                                              Char( 8 ) ; "&#8;" );
//                                            Char( 9 ) ; "&#9;" );
//                                          Char( 10 ) ; "&#10;" );
                                        Char( 11 ) ; "&#11;" );
                                      Char( 12 ) ; "&#12;" );
//                                    Char( 13 ) ; "&#13;" );
                                  Char( 14 ) ; "&#14;" );
                                Char( 15 ) ; "&#15;" );
                              Char( 16 ) ; "&#16;" );
                            Char( 17 ) ; "&#17;" );
                          Char( 18 ) ; "&#18;" );
                        Char( 19 ) ; "&#19;" );
                      Char( 20 ) ; "&#20;" );
                    Char( 21 ) ; "&#21;" );
                  Char( 22 ) ; "&#22;" );
                Char( 23 ) ; "&#23;" );
              Char( 24 ) ; "&#24;" );
            Char( 25 ) ; "&#25;" );
          Char( 26 ) ; "&#26;" );
        Char( 27 ) ; "&#27;" );
      Char( 28 ) ; "&#28;" );
    Char( 29 ) ; "&#29;" );
  Char( 30 ) ; "&#30;" );
Char( 31 ) ; "&#31;" )
Edited by ggt667
Link to comment
Share on other sites

Substitute ( fieldname;
[ Char( 1 ) ; "&#1;" ] ; 
[ Char( 2 ) ; "&#2;" ] ; 
[ Char( 3 ) ; "&#3;" ] ;

etc. ... )

  • Like 1
Link to comment
Share on other sites

There are a couple of issues with 0x00 it can not be inserted with Char( 0 ),

Substitute ( fieldname; Char( 0 ) ; " _There was a 0x00 here_ " )

it can not be substituted by Char( 0 ) as in the example directly above, also tried pasting the 0x00 that looks like a line break in FileMaker into the script as follows

Substitute ( fieldname; "
" ; " _There was a 0x00 here_ " )

 

when copy pasted it sometimes become Char( 13 ) – 0x00 is untouchable in many ways.

 

Screen Shot 2017-02-10 at 10.18.27.png

0x00 is an invalid search criteria when copied from a field in browse mode and pasted into the same field in find mode

 

In essence Char( 0 ) means end of string, it is fully capable on making its way into a FileMaker field, actually it probably is what terminates all FileMaker fields. It exports fine to FileMaker files.

#include <stdio.h>

int main () {

   char greeting[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
   printf("Greeting message: %s\n", greeting );
   return 0;
}

After copy / pasting using the keyboard into the same field Char( 0 ) turns into Char( 13 )

[ Solved ] When Checking the box "Apply current layout's data formatting to exported data" for formatting the issue of 0x00 is no longer present for XML exports.

Untitled.png

Edited by ggt667
  • Like 1
Link to comment
Share on other sites

A solution to the original problem! Huzzah! And I really like a more thorough enumeration of the difficulties getting FileMaker to work with the null character. However, null-terminated strings are generally avoided whenever possible these days. Despite the other awkward handling of the null character, FileMaker is probably not using null-terminated strings under the hood, either in the calculation engine or in field storage:

Let ( [
    _null = Base64Decode ( "AA==" ) ;
    _string = "a" & _null & "a"
] ;
    Position ( _string ; _null ; 1 ; 1 )    // = 2
)

 

Screen Shot 2017-02-10 at 08.23.49.png

Edited by jbante
  • Like 1
Link to comment
Share on other sites

On 2/10/2017 at 5:41 PM, comment said:

This thread seems to be stuck in a loop.

Care to elaborate?

On 2/10/2017 at 5:25 PM, jbante said:

And I really like a more thorough enumeration of the difficulties getting FileMaker to work with the null character. However, null-terminated strings are generally avoided whenever possible these days.

A string in OMG IDL is mapped to char * in C++. Both bounded and unbounded strings are mapped to char *. CORBA strings in C++ are NULL-terminated and can be used wherever a char * type is used.

omniORB is a robust high performance CORBA ORB for C++ and Python.

Edited by ggt667
Link to comment
Share on other sites

  • 5 months later...
1 hour ago, Joost Miltenburg said:

Using the check box for formatting does not fix the problem.

Tell us something we don't know...

 

1 hour ago, Joost Miltenburg said:

I add the character and that's another one down.

I am not sure what you mean by that.

 

Link to comment
Share on other sites

This topic is 1761 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.