El_Pablo Posted January 6, 2009 Hi, does anyone know of a good freeware tool that could convert XHTML tags and content to an XML file? Thanks
comment Posted January 6, 2009 Not sure what you are asking. Valid XHTML is already XML.
El_Pablo Posted January 6, 2009 Author I know, but I want to extract some data from an XHTML source so I can import it into FileMaker.
comment Posted January 6, 2009 Do you know how to import XML into FileMaker using an XSLT stylesheet?
El_Pablo Posted January 6, 2009 Author Can we do that? If so, do you know of a good tutorial?
comment Posted January 6, 2009 Yes, we can, and no, I don't. There are a few examples in the Extras folder next to your application, and more on FMI's site. And of course there are a number of threads here, in either this section or the Export section.
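A rough sketch of the target side, assuming Python 3 and only its standard library: FileMaker's XML import reads the FMPXMLRESULT grammar, and an XSLT stylesheet applied during import is the usual way to turn other XML into it. The hypothetical function to_fmpxmlresult below writes rows directly in that grammar so its shape is visible. The empty PRODUCT and DATABASE attribute values and the all-TEXT field types are assumptions; check FileMaker's XML documentation for what your version actually requires.

```python
import xml.etree.ElementTree as ET

FMP_NS = "http://www.filemaker.com/fmpxmlresult"


def to_fmpxmlresult(field_names, rows):
    """Return an FMPXMLRESULT document (str) with one TEXT field per name.

    Hypothetical helper for illustration; each row is a list of strings in
    the same order as field_names.
    """
    # Tags are left unqualified and the default namespace is written as a
    # plain xmlns attribute on the root, so every child element inherits it.
    root = ET.Element("FMPXMLRESULT", xmlns=FMP_NS)
    ET.SubElement(root, "ERRORCODE").text = "0"
    # Assumption: empty product/database metadata is acceptable for import.
    ET.SubElement(root, "PRODUCT", BUILD="", NAME="", VERSION="")
    ET.SubElement(root, "DATABASE", DATEFORMAT="", LAYOUT="", NAME="",
                  RECORDS=str(len(rows)), TIMEFORMAT="")
    metadata = ET.SubElement(root, "METADATA")
    for name in field_names:
        ET.SubElement(metadata, "FIELD", EMPTYOK="YES", MAXREPEAT="1",
                      NAME=name, TYPE="TEXT")
    resultset = ET.SubElement(root, "RESULTSET", FOUND=str(len(rows)))
    for row in rows:
        row_el = ET.SubElement(resultset, "ROW", MODID="", RECORDID="")
        for value in row:
            # One COL/DATA pair per field value, in field order.
            col = ET.SubElement(row_el, "COL")
            ET.SubElement(col, "DATA").text = value
    return ET.tostring(root, encoding="unicode")
```

Saved to a file, output of this shape should be importable without a stylesheet; the XSLT route builds the same structure from the XHTML instead.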
El_Pablo Posted January 6, 2009 Author I attached an example source file. I want to extract the content of the rows whose tr tag has the attribute class="txt_general". I just want to know where to start. In this example, the primary key is 5176748; you can use that ID to locate the right tag. srcData.zip
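If the file can be made well-formed (the thread mentions two non-standard tags that had to be fixed by hand), a minimal Python 3 sketch using the standard xml.etree.ElementTree module can pull out the cell text of every tr row with class="txt_general". The function names are hypothetical, and the sketch assumes the source contains no HTML named entities such as &nbsp;, which ElementTree does not resolve by default and will typically reject.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix so <tr> matches with or without the XHTML namespace."""
    return tag.rsplit("}", 1)[-1]


def extract_rows(xhtml_text, row_class="txt_general"):
    """Return the text of each td/th cell in every matching <tr>, as a list of lists."""
    root = ET.fromstring(xhtml_text)
    rows = []
    for el in root.iter():
        # class may hold several space-separated names; match any of them.
        classes = (el.get("class") or "").split()
        if local_name(el.tag) == "tr" and row_class in classes:
            cells = [
                # Join all nested text, then collapse runs of whitespace.
                " ".join("".join(cell.itertext()).split())
                for cell in el
                if local_name(cell.tag) in ("td", "th")
            ]
            rows.append(cells)
    return rows
```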
comment Posted January 7, 2009 You haven't answered my question. Do you have any experience with XSLT? I am asking because I too need to know where to start... :P
comment Posted January 7, 2009 I don't really know what you expect. I took a look at your document: although I was able to extract data from it using an external XSLT processor, FileMaker refuses to import it directly for some reason. I suspect it may be because the encoding is declared incorrectly.
El_Pablo Posted January 7, 2009 Author Which external XSLT processor are you using?
comment Posted January 7, 2009 It's an OS X application called TestXSLT, and it offers a choice of 4 processors: Sablotron, libxslt, Saxon and Xalan-J. Mind you, it doesn't do anything by itself; I had to write a stylesheet for it to run against your file.
comment Posted January 7, 2009 Can't hurt, but I believe you're going to have problems with your source. It's not strict XHTML (it doesn't even have an XML declaration), and as I said, I suspect it has other issues as well. If you need to do this often, you should look for a better alternative.
El_Pablo Posted January 7, 2009 Author I had to modify the source because two tags weren't standard. Maybe I'll write something in C# using regular expressions to extract the data.
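Regular expressions are fragile on HTML (nested tables, or a class list like class="txt_general odd", will defeat these patterns), but if the markup is consistently generated, the idea looks roughly like this. It is written in Python 3 rather than C#, and the patterns and the function name extract_rows_regex are illustrative assumptions that have not been checked against the actual source file.

```python
import html
import re

# A <tr ...> whose class attribute is exactly "txt_general", up to its closing </tr>.
# Non-greedy and DOTALL so a row may span several lines.
ROW_RE = re.compile(r'<tr\b[^>]*\bclass="txt_general"[^>]*>(.*?)</tr>',
                    re.IGNORECASE | re.DOTALL)
# The inner markup of each <td> or <th> cell.
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]>", re.IGNORECASE | re.DOTALL)
# Any remaining tag inside a cell, e.g. <b> or <a href=...>.
TAG_RE = re.compile(r"<[^>]+>")


def extract_rows_regex(source):
    """Return cell texts for every matching row, without requiring well-formed XML."""
    rows = []
    for row_match in ROW_RE.finditer(source):
        cells = []
        for cell_html in CELL_RE.findall(row_match.group(1)):
            # Drop inner tags, then decode entities such as &eacute;.
            text = html.unescape(TAG_RE.sub("", cell_html))
            cells.append(" ".join(text.split()))
        rows.append(cells)
    return rows
```

Unlike a parser, this tolerates the non-standard tags, which is the main reason to consider it; the price is that any change in the page's markup can silently break the match.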
comment Posted January 7, 2009 What exactly are you trying to accomplish here? Is this a process you need to repeat regularly?
El_Pablo Posted January 7, 2009 Author Yes, I need to extract some data from a search result, and I will need to do this on a regular basis. There might be as many as 200 rows to extract from the page. I thought it would be easier using an XML parser with XSLT.
comment Posted January 7, 2009 If you have any control over the server side, that would be the best place to make changes. Otherwise, you might be able to script the extraction in a web viewer, I think (you never said exactly what you need to extract, and I think there's only one record in your sample). If external pre-processing is an option, I believe you could run the file against an XSLT stylesheet from the command line, or use a third-party app (I don't know what's available for Windows).
El_Pablo Posted January 7, 2009 Author I attached a file with multiple rows. I want to extract the data in the rows, e.g. address, city, price, ... This is an original source; there are two errors that make the file non-XML-compliant. By the way, it's in French. srcExample.txt
comment Posted January 7, 2009 Please zip your file before attaching, so we can eliminate a possible cause of the encoding mismatch. The file says "charset=iso-8859-1", but it is really UTF-8.
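Assuming the bytes really are UTF-8, a short Python 3 pre-processing step can make the declarations agree with the content before import: it rewrites the charset=iso-8859-1 value and adds the XML declaration the source lacks. The function name normalize_declared_encoding is hypothetical, and the plain text replacement assumes the charset string appears only in the meta tag.

```python
import re


def normalize_declared_encoding(raw_bytes):
    """Decode bytes that are really UTF-8 and make the declarations say so."""
    # utf-8-sig drops a leading byte-order mark if present; decoding raises
    # UnicodeDecodeError if the bytes are not valid UTF-8 after all.
    text = raw_bytes.decode("utf-8-sig").lstrip()
    # Point the meta charset at the real encoding (case-insensitive).
    text = re.sub(r"charset=iso-8859-1", "charset=utf-8", text, flags=re.IGNORECASE)
    # Prepend an XML declaration if the document has none.
    if not text.startswith("<?xml"):
        text = '<?xml version="1.0" encoding="UTF-8"?>\n' + text
    return text.encode("utf-8")
```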
El_Pablo Posted January 7, 2009 Author I know, but I have no control over the server side. Okay, I'll zip the file next time.