
Remove Unwanted Characters


JerrySalem

This topic is 6615 days old. Please don't post here. Open a new topic instead.


Hey

I have a system that is exporting data to be used by another system. We have decided to use XML.

Some of the exported fields contain illegal characters (from user input), specifically Ctrl-K, Ctrl-T and Ctrl-Y.

The characters these key combinations insert are illegal in XML, and the resulting XML can't be imported into FMP (or other systems). Really, try it.

So I added script steps on the fields where my users can enter free text, so that these characters are substituted with blanks.

Now my question: after I did all this painful scripting, I realized I might be able to do the same thing by exporting with a style sheet.

Do any of you gurus out there have experience in writing a style sheet that would simply export the data as is, but remove all instances of the above 3 characters?

Any help would be appreciated!

Jerry

Link to comment
Share on other sites

I can't help you, Jerry, but I must say this is amazing. Font Locking doesn't stop it either. The only way I can stop it is in my solution in which we use SecureFM to REMOVE the menu bar entirely. CTRL-W also will close the window (file?) but it doesn't with menu bar removed. I will remember this oddity. Thank you! :wink2:

One thing that works but would be tedious: Auto-Enter (Replace) on all text fields with:

Substitute ( text ; "[pasted control character]" ; "" )

And now I'm going on a hunt through my data!

LaRetta

Link to comment
Share on other sites

LaRetta,

Using the Substitute function is exactly how I solved my problem. However, I consider this a kludge.

My premise is that it would be better solved by using a single style sheet to remove these characters from the export as it is created, not on a field-by-field basis as with my script.

But I can't figure out how to write the style sheet to do this!
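For comparison, a single-pass cleanup of the exported file can be done outside FileMaker. The following is only a minimal sketch in Python, not a style sheet: the function names `strip_illegal_xml_chars` and `clean_file` are made up for illustration, and it assumes the export is a UTF-8 text file. It removes the C0 control characters that XML 1.0 forbids (which includes Ctrl-K, Ctrl-T and Ctrl-Y, i.e. U+000B, U+0014 and U+0019) and leaves tab, line feed and carriage return alone.

```python
import re

# Characters XML 1.0 does not allow in a document: all C0 controls
# except tab (0x09), line feed (0x0A) and carriage return (0x0D),
# plus the non-characters U+FFFE and U+FFFF.
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def strip_illegal_xml_chars(text, replacement=""):
    """Return text with every XML-illegal character replaced."""
    return ILLEGAL_XML_CHARS.sub(replacement, text)


def clean_file(src_path, dst_path, encoding="utf-8"):
    """Copy an exported file, dropping XML-illegal characters.

    The file is read as plain text, not parsed as XML, so the illegal
    characters cannot make the read fail. newline="" keeps the
    original line endings untouched.
    """
    with open(src_path, encoding=encoding, newline="") as src:
        data = src.read()
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.write(strip_illegal_xml_chars(data))
```

Because the cleanup works on the raw text, it does not depend on which fields the characters came from.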

Jerry

Link to comment
Share on other sites

The XSLT stylesheet will not work, because the XML parser (i.e. Xerces) that acts before the transformation will complain that the control characters are not XML compatible.

The only solution is to remove them in the database.

If the database is complex, finding and replacing the characters in every field might be a tedious task. However, what you can do is the following:

Do an XML export (this works).

With the help of an external XSLT processor (e.g. Xalan or Saxon) and the following stylesheet try to transform the exported XML:

<?xml version="1.0" encoding="UTF-8"?>

The XSLT processor will stop at the first illegal character and throw an error message with the line number where the error occurred. With that you can try to find the error position in the XML file and finally in the database.

You have to repeat this until all illegal characters are removed.
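As an alternative to rerunning an XSLT processor once per error, the exported text can be scanned for every offending character in one pass. This is a rough sketch in Python under the assumption that the export has already been read into a string; `find_illegal_xml_chars` is a hypothetical helper name, not part of any tool mentioned here.

```python
import re

# Same character class that XML 1.0 rejects: C0 controls other than
# tab, line feed and carriage return, plus U+FFFE and U+FFFF.
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def find_illegal_xml_chars(text):
    """Yield (line, column, code point) for each illegal character.

    Line and column are 1-based. The text is split on "\n" only,
    because str.splitlines() would also break lines at some of the
    control characters being searched for (e.g. 0x0B and 0x0C).
    """
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in ILLEGAL_XML_CHARS.finditer(line):
            yield line_no, match.start() + 1, ord(match.group())


if __name__ == "__main__":
    sample = "<row>\n<col>ab\x0bcd</col>\n</row>"
    for line_no, col, code in find_illegal_xml_chars(sample):
        print(f"line {line_no}, column {col}: U+{code:04X}")
```

The reported positions refer to the exported file; mapping them back to a record and field in the database still has to be done by hand.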

Link to comment
Share on other sites

It strikes me as strange that Perform Find/Replace will not find the box character - but Substitute() will (tested with Replace Field Contents); unless I'm doing something wrong. Well, dates, time and timestamps are no problem - it appears to only plant in number and text fields. But still ... it bothers me.

Link to comment
Share on other sites

"The XSLT stylesheet will not work, because the XML parser (i.e. Xerces) that acts before the transformation will complain that the control characters are not XML compatible."

I must respectfully disagree with this statement. I am able to export lots of data with illegal characters. It isn't until I try to import them into some other program that I get these error messages.

But I will play around with the style sheet you posted.

TIA

Jerry

Link to comment
Share on other sites

As said, exporting works as long as you don't assign a stylesheet. Otherwise the XML will be parsed and transformed by Xerces and Xalan, and then you get into trouble.

When you import, the XML must of course be parsed to generate the records; that's when the error occurs.

That's why I suggested using an external XSLT processor to find the wrong characters.

Link to comment
Share on other sites

Let me try to explain in other words:

For exporting XML, FMI probably uses FM Pro-internal subroutines that analyze the fields and field types you have specified and generate valid, well-formed XML for those fields that contain text. However, these subroutines do not analyze the content of the fields; they just copy it between the XML tags, which is why the "wrong" characters are exported too.

If you specify an XSLT stylesheet for additional transformation during the export, the generated XML is handed over to the Xerces XML parser and to an XSLT processor, which requires a parsed XML structure in the form of some Java or C++ object. There, the transformation will stop at the first character that is not XML-compliant.

If you do not specify a stylesheet, you just get the XML generated by the internal subroutines.

The situation is different for import: there the XML first has to be parsed and transferred to some object-type structure, probably in C++. This is again done with Xerces, which will fail at the first wrong character it encounters.
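The difference between writing text between tags and actually parsing it can be illustrated outside FileMaker. The sketch below uses Python's standard-library parser (expat, not Xerces), so it is only an analogy for the behaviour described above; `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Building the string by plain concatenation never looks at the
# content, so a Ctrl-K character (U+000B) ends up between the tags
# without any complaint.
field_value = "abc\x0bdef"
exported = "<row><col>" + field_value + "</col></row>"

# Only when a parser reads the result is the character rejected.
print(is_well_formed("<row><col>abcdef</col></row>"))
print(is_well_formed(exported))
```

Any step that needs a parsed document, whether a transformation or an import, runs into the same rejection.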
