JerrySalem Posted March 16, 2006
Hey, I have a system that exports data to be used by another system, and we have decided to use XML. Some of the exported fields contain illegal characters from user input, specifically Ctrl-K, Ctrl-T and Ctrl-Y. These characters are not allowed in XML, so the resulting file can't be imported into FMP (or other systems). Really, try it. So I added script steps to the fields where my users can enter free text, so that these characters are substituted with blanks. Now my question: after doing all this painful scripting, I realized I might be able to do the same thing by exporting with a style sheet. Do any of you gurus out there have experience writing a style sheet that simply exports the data as is, but removes all instances of these three characters? Any help would be appreciated!
Jerry
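To see why these keystrokes break the file: assuming FileMaker stores the usual ASCII control codes for them, Ctrl-K, Ctrl-T and Ctrl-Y are U+000B, U+0014 and U+0019. The XML 1.0 `Char` production only admits tab, line feed and carriage return from the C0 control range, so any conforming parser has to reject them. A minimal Python sketch of the "really, try it" check, using the standard-library expat-based `xml.etree.ElementTree` parser (the `field` element and the `parses` helper are illustrative placeholders):

```python
import xml.etree.ElementTree as ET

# Assumed mapping: Ctrl-<letter> is the ASCII control code at the letter's
# position in the alphabet (K = 11, T = 20, Y = 25).
CONTROL_CHARS = {"Ctrl-K": "\x0b", "Ctrl-T": "\x14", "Ctrl-Y": "\x19"}


def parses(text: str) -> bool:
    """Return True if <field>text</field> is a well-formed XML 1.0 document."""
    try:
        ET.fromstring("<field>" + text + "</field>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Tab and line feed are among the few control codes XML 1.0 allows.
    print("tab/newline:", parses("a\tb\nc"))
    for name, ch in CONTROL_CHARS.items():
        print(name, parses("abc" + ch + "def"))
```

Tab and newline parse fine; each of the three control characters makes the parser report the document as not well-formed.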
LaRetta Posted March 16, 2006
I can't help you, Jerry, but I must say this is amazing. Font Locking doesn't stop it either. The only place I can stop it is in my own solution, where we use SecureFM to REMOVE the menu bar entirely. Ctrl-W will also close the window (file?), but not with the menu bar removed. I will remember this oddity. Thank you!
One thing that works, but would be tedious: an Auto-Enter (Replace) calculation on all text fields with:
Substitute(text ; "" ; "" )
And now I'm going on a hunt through my data!
LaRetta
Edited March 16, 2006 by Guest
JerrySalem Author Posted March 16, 2006
LaRetta, using the Substitute function is exactly how I solved my problem. However, I consider it a kludge. My premise is that this would be better solved with a single style sheet that removes these characters from the export as it is created, rather than field by field as in my script. But I can't figure out how to write the style sheet to do this!
Jerry
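If the goal is one pass over the whole export instead of field-by-field scripting, one option that sidesteps XSLT (which, as Martin explains in the following posts, needs the XML parsed first) is a plain text filter run on the exported file before it is imported elsewhere. This is a swapped-in technique, not a style sheet. The Python sketch below assumes the export is UTF-8 and removes every character the XML 1.0 `Char` production forbids, which includes Ctrl-K, Ctrl-T and Ctrl-Y; `strip_illegal_xml_chars` and `clean_export` are hypothetical helper names, not part of any FileMaker tool.

```python
import re

# Anything outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This covers Ctrl-K (U+000B), Ctrl-T (U+0014) and Ctrl-Y (U+0019).
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str, replacement: str = "") -> str:
    """Replace every character XML 1.0 forbids with `replacement`."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


def clean_export(src_path: str, dst_path: str, replacement: str = "") -> None:
    """Write a cleaned copy of an exported XML file (assumed to be UTF-8).

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, encoding="utf-8", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(strip_illegal_xml_chars(text, replacement))
```

For example, `clean_export("export.xml", "export_clean.xml", replacement=" ")` would write a copy with blanks in place of the bad characters. Because the filter works on raw text, no parser ever has to read those characters.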
Martin Brändle Posted March 16, 2006
The XSLT stylesheet will not work, because the XML parser (i.e. Xerces) that acts before the transformation will complain that the control characters are not XML compatible. The only solution is to remove them in the database. If the database is complex, finding and replacing the characters in every field might be a tedious task. However, here is what you can do: do an XML export (this works). Then, with an external XSLT processor (e.g. Xalan or Saxon) and the following stylesheet, try to transform the exported XML:
<?xml version="1.0" encoding="UTF-8"?>
The XSLT processor will stop at the first wrong character and report an error message with the line number where the error occurred. With that, you can find the error position in the XML file and, finally, in the database. Repeat this until all the wrong characters are removed.
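Instead of rerunning the transform once per bad character, the offending positions can also be listed in a single scan. This is a substitute for the XSLT-processor approach above, not the same method: a Python sketch that checks each line of the exported file against the XML 1.0 `Char` production and reports 1-based line and column numbers, similar to what a parser's error message gives. `find_illegal_chars` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Characters outside the XML 1.0 Char production (includes Ctrl-K/T/Y).
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_chars(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, code point) for every illegal character.

    Line and column numbers are 1-based, like most parser error messages.
    The text is split on newline characters only: str.splitlines() would
    also treat U+000B (Ctrl-K) as a line break and hide it.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in _ILLEGAL_XML_CHARS.finditer(line):
            code = "U+{:04X}".format(ord(match.group()))
            hits.append((lineno, match.start() + 1, code))
    return hits
```

Passing in the contents of the exported file returns every position at once, which can then be traced back to the matching records and fields in the database.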
LaRetta Posted March 16, 2006
It strikes me as strange that Perform Find/Replace will not find the box character, but Substitute() will (tested with Replace Field Contents), unless I'm doing something wrong. Date, time and timestamp fields are no problem; the character only seems to get into number and text fields. But still... it bothers me.
JerrySalem Author Posted March 16, 2006
"The XSLT stylesheet will not work, because the XML parser (i.e. Xerces) that acts before the transformation will complain that the control characters are not XML compatible."
I must respectfully disagree with this statement. I am able to export lots of data with illegal characters. It isn't until I try to import them into some other program that I get these error messages. But I will play around with the style sheet you posted. TIA
Jerry
Martin Brändle Posted March 16, 2006
As I said, exporting works as long as you don't assign a stylesheet. Otherwise the XML is parsed and transformed by Xerces and Xalan, and then you get into trouble. When you import, the XML must of course be parsed to generate the records, and that's when the error occurs. That's why I suggested using an external XSLT processor to find the wrong characters.
Edited March 16, 2006 by Guest
Martin Brändle Posted March 17, 2006
Let me try to explain it in other words: for XML export, FMI probably uses internal FM Pro subroutines that walk through the fields and field types you have specified and generate valid, well-formed XML tags for the fields that contain text. However, these subroutines do not analyze the content of the fields; they just copy it between the XML tags, which is why the "wrong" characters are exported too. If you specify an XSLT stylesheet for an additional transformation during export, the generated XML is handed over to the Xerces XML parser and an XSLT processor, which requires a parsed XML structure (some form of Java or C++ object). There, the transformation stops at the first character that is not XML-compliant. If you don't specify a stylesheet, you just get the XML generated by the internal subroutines. The situation is different for import: there the XML first has to be parsed and transferred to some object-type structure, probably in C++. This again is done with Xerces, which fails at the first wrong character it encounters.
Edited March 17, 2006 by Guest
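The asymmetry described here (writing XML succeeds, parsing it fails) can be reproduced outside FileMaker. This Python sketch is only an analogy and says nothing about FileMaker's internals: ElementTree's serializer stands in for the exporter that copies field content between tags, and its expat-based parser stands in for Xerces. `export_record`, `can_import`, and the `ROW`/`DATA` element names are illustrative placeholders.

```python
import xml.etree.ElementTree as ET


def export_record(field_text: str) -> str:
    """Serialize one field the way an exporter might: text copied between tags.

    ElementTree escapes &, < and > but does not reject characters that
    XML 1.0 forbids, so control codes pass straight through.
    """
    row = ET.Element("ROW")
    ET.SubElement(row, "DATA").text = field_text
    return ET.tostring(row, encoding="unicode")


def can_import(xml_text: str) -> bool:
    """Parse the document, as an import or an XSLT transform must do first."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    exported = export_record("typed with Ctrl-K:\x0b")
    print("control char written:", "\x0b" in exported)  # the export "works"
    print("parses:", can_import(exported))              # the import fails
```

Ordinary markup characters such as `&` and `<` are escaped and round-trip fine; only the forbidden control characters slip through the writer and then stop the parser.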