Remove Unwanted Characters

JerrySalem · March 16, 2006

Hey

I have a system that is exporting data to be used by another system. We have decided to use XML.

Some of the fields that are exported contain illegal characters (from user input). Specifically Control-K, Control-T and Control-Y.

The characters these key combinations insert are illegal in XML, and the resulting file can't be imported into FMP (or other systems). Really, try it.
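To make the problem concrete, here is a minimal Python sketch. It assumes the three key combinations insert the usual ASCII control codes (the letter's position in the alphabet), which the thread does not confirm byte for byte. The check itself follows the character range allowed by XML 1.0; the names `CONTROL_KEYS` and `is_xml_char` are made up for this illustration.

```python
import re

# Assumption: Ctrl+<letter> inserts the control code whose value is the
# letter's position in the alphabet (K=11, T=20, Y=25).
CONTROL_KEYS = {letter: chr(ord(letter) - ord("A") + 1) for letter in "KTY"}

# Characters XML 1.0 allows: tab, LF, CR, and U+0020 upward,
# excluding surrogates and U+FFFE/U+FFFF.
XML10_LEGAL = re.compile(
    "[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml_char(ch: str) -> bool:
    """Return True if the single character ch may appear in an XML 1.0 document."""
    return XML10_LEGAL.fullmatch(ch) is not None


for letter, ch in CONTROL_KEYS.items():
    print(f"Ctrl-{letter} = U+{ord(ch):04X}, legal in XML 1.0: {is_xml_char(ch)}")
```

Under that assumption, all three characters fall in the range below U+0020 that XML 1.0 excludes, which would explain why importers reject the file.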

So I added script steps on the fields where my users can enter free text, so that these characters are replaced with blanks.

Now my question: after I did all this painful scripting, I realized I might be able to do the same thing by exporting with a style sheet.

Do any of you gurus out there have experience writing a style sheet that would simply export the data as is, but remove all instances of the above 3 characters?

Any help would be appreciated!

Jerry

LaRetta · March 16, 2006

I can't help you, Jerry, but I must say this is amazing. Font Locking doesn't stop it either. The only way I can stop it is in my solution in which we use SecureFM to REMOVE the menu bar entirely. CTRL-W also will close the window (file?) but it doesn't with menu bar removed. I will remember this oddity. Thank you!

One thing that works but would be tedious: Auto-Enter (Replace) on all text fields with:

Substitute(text ; "" ; "" )

And now I'm going on a hunt through my data!

LaRetta

Edited March 16, 2006 by Guest

JerrySalem · March 16, 2006

LaRetta,

Using the Substitute command is exactly how I solved my problem. However, I consider this a kludge.

My premise is that it would be better solved by using a single style sheet to remove these characters from the export as it is created (not on a field-by-field basis, as with my script).

But I can't figure out how to write the style sheet to do this!
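This is not a style sheet, but the same "one pass over the whole export" idea can be sketched as a post-processing step outside FileMaker. The Python below is only an illustration: the function names, the default UTF-8 encoding, and the choice to delete rather than blank out the characters are all assumptions, not anything FileMaker provides.

```python
import re

# Everything XML 1.0 does NOT allow: control codes other than tab/LF/CR,
# surrogates, and U+FFFE/U+FFFF.
ILLEGAL_XML10 = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str, replacement: str = "") -> str:
    """Return text with every XML 1.0-illegal character replaced."""
    return ILLEGAL_XML10.sub(replacement, text)


def clean_export(src_path: str, dst_path: str, encoding: str = "utf-8") -> int:
    """Copy an exported file to dst_path without illegal characters.

    Returns the number of characters that were replaced.
    """
    # newline="" keeps the file's original line endings untouched.
    with open(src_path, encoding=encoding, newline="") as src:
        raw = src.read()
    cleaned, count = ILLEGAL_XML10.subn("", raw)
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.write(cleaned)
    return count
```

Because the file is treated as plain text and never parsed, this works even though the export is not well-formed XML yet. Pass `replacement=" "` to `strip_illegal_xml_chars` if a blank is preferred over deletion.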

Jerry

Martin Brändle · March 16, 2006

The XSLT stylesheet will not work, because the XML parser (i.e. Xerces) that acts before the transformation will complain that the control characters are not XML compatible.

The only solution is to remove them in the database.

If the database is complex, finding and replacing the characters in every field might be a tedious task. However, what you can do is the following:

Do an XML export (this works).

With the help of an external XSLT processor (e.g. Xalan or Saxon) and the following stylesheet try to transform the exported XML:

<?xml version="1.0" encoding="UTF-8"?>

The XSLT processor will stop at the first illegal character and report an error message with the line number where the error occurred. With that you can try to find the error position in the XML file and finally in the database.

You have to repeat this until all illegal characters are removed.
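As an alternative to rerunning an XSLT processor after every fix, the offending positions can also be listed in a single pass with a small script. The Python sketch below swaps the XSLT processor for a plain text scan; `find_illegal_chars` and the sample data are hypothetical and only illustrate the idea.

```python
import re
from typing import Iterator, Tuple

# Characters outside the range XML 1.0 allows.
ILLEGAL_XML10 = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_chars(text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (line, column, code point) for each illegal character, 1-based."""
    # split("\n") rather than splitlines(): splitlines() would also break
    # lines at some of the very control characters being searched for.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in ILLEGAL_XML10.finditer(line):
            yield line_no, match.start() + 1, f"U+{ord(match.group()):04X}"


# Made-up sample standing in for an exported file's contents.
sample = "<ROW>\n<COL><DATA>bad\x0bvalue</DATA></COL>\n</ROW>"
for line_no, col, code in find_illegal_chars(sample):
    print(f"line {line_no}, column {col}: {code}")
```

The line numbers refer to the exported file, so the surrounding text on each reported line still has to be used to track down the record and field in the database.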

LaRetta · March 16, 2006

It strikes me as strange that Perform Find/Replace will not find the box character - but Substitute() will (tested with Replace Field Contents); unless I'm doing something wrong. Well, dates, time and timestamps are no problem - it appears to only plant in number and text fields. But still ... it bothers me.

JerrySalem · March 16, 2006

"The XSLT stylesheet will not work, because the XML parser (i.e. Xerces) that acts before the transformation will complain that the control characters are not XML compatible."

I must respectfully disagree with this statement. I am able to export lots of data with illegal characters. It isn't until I try to import them into some other program that I get these error messages.

But I will play around with the style sheet you posted.

TIA

Jerry

Martin Brändle · March 16, 2006

As said, exporting works as long as you don't assign a stylesheet. Otherwise the XML will be parsed and transformed by Xerces and Xalan, and then you get into trouble.

When you import, then of course the XML must be parsed to generate the records; that's when the error occurs.

That's why I suggested using an external XSLT processor to find the wrong characters.

Edited March 16, 2006 by Guest

Martin Brändle · March 17, 2006

Let me try to explain in other words:

For exporting XML, FMI probably uses FM Pro-internal subroutines that analyze the fields and field types that you have specified and generate valid and well-formed XML for those fields that contain text. However, these subroutines do not analyze the content of the fields, but just copy it between the XML tags, which is why the "wrong" characters are exported as well.

If you specify an XSLT stylesheet for additional transformation during the export, the generated XML will be handed over to the Xerces XML parser and some XSLT processor that requires a parsed XML structure, which will be some form of Java or C++ object. There the transformation will stop at the first non-XML-compliant character.

If you do not specify a stylesheet, you just get the XML generated by the internal subroutines.

The situation is different for import: there the XML first has to be parsed and transferred to some object-type structure, probably in C++. This will again be done with Xerces, which will fail at the first wrong character it encounters.
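The "fails at the first wrong character" behaviour is easy to reproduce with any conforming XML parser. The Python sketch below uses the standard library's expat-based parser rather than Xerces, so it only illustrates the general behaviour, not FileMaker's internals; `first_parse_error` and the sample string are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Made-up sample with two illegal characters (Ctrl-K and Ctrl-T codes).
sample = "<ROW><DATA>first\x0bsecond\x14</DATA></ROW>"


def first_parse_error(xml_text: str) -> Optional[Tuple[int, int]]:
    """Return the (line, column) where the parser gave up, or None if it parsed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Only the first problem is reported; parsing does not continue.
        return err.position
    return None


print(first_parse_error(sample))
print(first_parse_error("<ROW><DATA>clean</DATA></ROW>"))
```

The parser reports one position and stops, so the second illegal character in the sample only surfaces after the first has been fixed.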

Edited March 17, 2006 by Guest
