
Strip second line from XML prior to processing



Posted (edited)

Hi,

Using Insert from URL, the result can be:
1. Downloaded as text into a text field (in XML format). 

2. Downloaded as a text file into a container field, also in XML format, with a generic file name (efetch.fcgi).

The tricky part is that the second line contains the DOCTYPE. When FileMaker reads this line, it spends up to twenty seconds there before the script progresses. As there are many XSLTs, FileMaker reads the DOCTYPE that many times, stretching the entire process to well over 3 minutes.

If I manually remove the second line, the entire processing time goes down to the expected 1-2 seconds in total.
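As an aside for readers outside FileMaker: the stall is consistent with the XML processor trying to resolve the external DTD referenced by the DOCTYPE, so removing that line is a plain text operation. A minimal Python sketch of the idea (the sample document here is hypothetical, not the actual efetch output):

```python
# Sketch: drop the DOCTYPE line from an XML string before handing it to
# the parser, so no external DTD lookup is attempted. Matching on the
# line's content is safer than blindly deleting line 2.
def strip_doctype_line(xml_text: str) -> str:
    lines = xml_text.splitlines()
    return "\n".join(l for l in lines if not l.lstrip().startswith("<!DOCTYPE"))

sample = ('<?xml version="1.0" ?>\n'
          '<!DOCTYPE PubmedArticleSet SYSTEM "pubmed_240101.dtd">\n'
          '<PubmedArticleSet/>')
print(strip_doctype_line(sample))
```

The same content-based matching could be expressed in a FileMaker calculation, which avoids assuming the DOCTYPE is always exactly on line 2.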

In 1., I've commented out the second line with a calculation, but I haven't been able to get the script to recognize the calculated text as XML and use that modified XML to process with the XSLTs.

In 2., I don't really know if I can modify the contents of efetch.fcgi from within its container field.

What would be the best way to do this? Would both cases require the modified XML to be downloaded and reuploaded?

Best regards,

Daniel

Screenshot 2024-05-09 at 12.39.14 PM.png

Edited by Quito
Added screenshot; added clarification
Posted (edited)
1 hour ago, Quito said:

When FileMaker reads this line, it spends up to twenty seconds there, and then progresses with the script.

Do you mean during the import? After you have exported/written the field's contents to somewhere on your hard disk?
 

---
Added:

Haven't we done this before?
https://fmforums.com/topic/110220-dialog-window-xmlxsl-information-is-not-enough-to-proceed-with-importexport/?do=findComment&comment=492213

 

 

 

Edited by comment
Posted (edited)

Hi, Comment,

Thank you. Yes, during the import. The XML processing works if the XML file is manually sent to a container field. This triggers the XML/XSLT processing script correctly: the XML is sent to the Desktop and reimported using the XSLTs.

My problem occurs if the XML is stored as calculated text in a field OR as an XML file within its container. Then the script fails. Can't the XML processing + XSLT occur directly against the stored files, without the need for the export-reimport step?

-----

Yes, we have discussed this before, and I have updated the scripts accordingly, taking into account your insight as much as possible. Thank you again!

My position is that both the XML and the XSLT are stored in their corresponding container fields, and that exporting the XML just to reimport it seems unnecessary. It does work, yet stripping the second line would make it perform faster. Also, there can be thousands of separate XMLs in the processing queue (one of my tests involves a batch with 4800 separate XML records; another involves a single XML with tens of thousands of records that can be over 2 GB in size). If every time an XML needs to be processed the Desktop gets a copy, then pretty soon the Desktop will get madly cluttered with XML files that then need to be deleted. As I do not know the user's Desktop path, I don't think I can delete the XML files via script after processing occurs. It also seems pretty dangerous to me to have scripts running against the Desktop, unless the user allows it. Thus the idea of handling everything from within the tool.

 

Now, I'm thinking that perhaps storing the XML from the text field into a temporary variable might do the trick. Or just importing the record directly from the XML server using an HTTP request, skipping the Insert from URL step altogether. Yet the problem regarding the second line will persist, and the import of a large file will take months.

All the very best,

Daniel

Edited by Quito
Clarification
Posted (edited)
34 minutes ago, Quito said:

exporting the XML just to reimport it seems unnecessary.

Let me reiterate something I wrote in the other thread:

You cannot import a file from a container field. The file must reside on your hard disk (unless you're importing it directly from a URL).

I suspect you are confusing yourself by having the file inserted into a container field as reference only. In such case the file exists only on your hard disk and the container field stores only the path to it.

If you want to strip the DOCTYPE declaration from the XML before you import it, then your process should follow these steps:

  1. Insert the file using the Insert from URL[] script step into a variable;
  2. Remove the DOCTYPE declaration;
  3. Write the result to a file in the temporary folder;
  4. Import the file.

There are several options for performing steps #2 and #3, which I won't go into now.

 

34 minutes ago, Quito said:

If every time an XML needs to be processed the Desktop gets a copy, then pretty soon the Desktop will get madly cluttered with XML files that then need to be deleted. As I do not know the user's Desktop path, I don't think I can manually delete the XML files after processing occurs.

This is solved easily by using the temporary folder instead. Note that if you wanted, you could just overwrite the same file every time. But it is not necessary.

 

Edited by comment
Posted

OK, so POE.ai provided the following script, based on your reply:
 

# Define script variables

Set Variable [ $url ; "https://example.com/file.txt" ]
Set Variable [ $tempFolder ; Get ( TemporaryPath ) ]
Set Variable [ $tempFilePath ; $tempFolder & "temp.txt" ]

# Insert file from URL into a variable

Insert from URL [ Select ; $url ; $tempFilePath ]

# Remove second line from the text

Set Variable [ $text ; Substitute ( $text ; ¶ & GetValue ( $text ; 2 ) & ¶ ; ¶ ) ]

# Write modified text to a temporary file

Set Variable [ $fileHandle ; Open for Write ( $tempFilePath ) ]
If [ $fileHandle ≠ "" ]
Set Variable [ $writeResult ; Write to File ( $fileHandle ; $text ) ]
Close File [ $fileHandle ]
End If

# Import the temporary file

Import Records [ With dialog: Off ; "$tempFilePath" ]

-------------

I don't expect it to work as is, but do you notice anything overtly wrong in any step in particular?

  • 3 weeks later...
Posted

OK, so it's taken me 20 days to progress to a promising, yet non-working script:

[screenshot of the import script]

Inside the first line is:

Choose ( Abs ( Get ( SystemPlatform ) ) - 1 ;
  /* MAC OS X */ Get ( TemporaryPath ) & "Pubmed.xml" ;
  /* WINDOWS */ "filewin:" & Get ( TemporaryPath ) & "Pubmed.xml"
)

cXML_source contains a Substitute that removes the second line from the XML (the DOCTYPE with the DTD).

Import fails with a [719] Error in transforming XML using XSL (from Xalan).

Posted
50 minutes ago, Quito said:

cXML_source contains a Substitute that removes the second line from the XML (the DOCTYPE with the DTD).

We don't see this part, so we don't know if it does what you claim or causes a problem. Nor do we see the XSLT. 
 

I would suspect that one or both files contain an XML declaration like:

<?xml version="1.0" encoding="UTF-8"?>

but since you are doing Export Field Contents for both, these files end up being encoded as UTF-16. But that's just a guess.


 

Posted (edited)

Hi, Comment,

The contents of the cXML_source are:

Substitute ( XML_source ; [ "<!DOCTYPE PubmedArticleSet PUBLIC \"-//NLM//DTD PubMedArticle, 1st January 2024//EN\" \"https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_240101.dtd\">" ; "<!-- <!DOCTYPE PubmedArticleSet PUBLIC \"-//NLM//DTD PubMedArticle, 1st January 2024//EN\" \"https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_240101.dtd\"> -->" ] )
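The same substitution, sketched outside FileMaker: the known, version-specific DOCTYPE string is wrapped in an XML comment so the parser skips it. Note that this matches only the 2024 DTD string; a line-based strip would survive a DTD version change.

```python
# Sketch of the Substitute() above: wrap one specific DOCTYPE declaration
# in an XML comment so the parser ignores it. The string is abbreviated
# here for readability; the real one names the full NLM DTD URL.
DOCTYPE = ('<!DOCTYPE PubmedArticleSet PUBLIC '
           '"-//NLM//DTD PubMedArticle, 1st January 2024//EN" '
           '"https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_240101.dtd">')

def comment_out_doctype(xml_text: str) -> str:
    # Only this exact string is replaced; any other DTD year passes through.
    return xml_text.replace(DOCTYPE, "<!-- " + DOCTYPE + " -->")
```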

If I import the XML using the XSLT from the File/Import Records/XML Data Source, it performs flawlessly. So, it has to be something wrong with the script.

The XML contains the following declaration:

<?xml version="1.0" ?>

Your assumption is correct. The XSLT states the "utf-8" encoding, twice. AFAIK, "utf-16" is necessary for Asian languages, but otherwise I don't understand the implications, nor how to correct the script. Maybe I have to force the use of "utf-8" during the download?
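A quick way to test that guess outside FileMaker is to inspect the first bytes of the exported files: UTF-16 output normally begins with a byte-order mark, while a UTF-8 XML file typically begins directly with `<?xml`. A small sketch:

```python
# Sketch: detect a UTF-16 byte-order mark at the start of a file's bytes.
# A file whose declaration says UTF-8 but which starts with FF FE (or
# FE FF) was actually written as UTF-16, which can trip up an XSLT engine.
def looks_utf16(first_bytes: bytes) -> bool:
    return first_bytes[:2] in (b"\xff\xfe", b"\xfe\xff")

print(looks_utf16(b"\xff\xfe<\x00?\x00"))  # → True (UTF-16 LE BOM)
print(looks_utf16(b"<?xml version"))       # → False (plain UTF-8/ASCII)
```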

Edited by Quito
Posted (edited)

Solved it: 

In GetContainerAttribute, I was writing a specific name for "filename" in some portions of the script. I noticed it when the script went through with "filename" yet failed with the specific name.

Thanks, Comment. After at least 7 years, the PubMed en español project is finally ready for use on macOS. Will be testing it shortly on Windows and on the server.

Best regards,

Daniel

Edited by Quito
Posted

I think you could make this significantly simpler by using variables instead of fields.

 

35 minutes ago, Quito said:

Will be testing it shortly in Windows and on the server.

It won't work in a server-side script because you are using Export Field Contents. You should be writing to a data file instead. This was also already mentioned in the previous thread.

 

Posted

I think you could make this significantly simpler by using variables instead of fields.

Please elaborate further.

It won't work in a server-side script because you are using Export Field Contents. You should be writing to a data file instead. This was also already mentioned in the previous thread.

So, I tested it on Windows, made a few adjustments, and finally it's working on both macOS and Windows 11.

Will work on the Write to Data File script now, and I'll open another topic, if necessary.

Although the software has always been intended to be used on a server, I had to see it working locally first.

Does Write to Data File work for both local and server use?

All the very best,

Daniel

Posted
3 hours ago, Quito said:

Please elaborate further.

I thought I already did in my 4 points above. To expand further, it would probably look something like this (pseudocode, untested):

# Download and pre-process the XML
Insert from URL [ $XML; "https://your.source.com/xml" ]
Set Variable [ $XML; RightValues ( $XML ; ValueCount ( $XML ) - 2 ) ] 

# Write to file
Set Variable [ $filePath_XML; Get (TemporaryPath) & "source.xml" ]
Create Data File [ $filePath_XML ]
Open Data File [ $filePath_XML ; Target: $dataFile_XML ]
Write to Data File [ File ID: $dataFile_XML ; Data source: $XML ] 
Close Data File [ File ID: $dataFile_XML ]

for the XML part.

For the XSLT, I would use the Insert Text[] step to store the XSLT in the script itself as a $XSLT variable. Then write it to a file using the same method as the XML:

Set Variable [ $filePath_XSLT; Get (TemporaryPath) & "stylesheet.xml" ]
Create Data File [ $filePath_XSLT ]
Open Data File [ $filePath_XSLT ; Target: $dataFile_XSLT ]
Write to Data File [ File ID: $dataFile_XSLT ; Data source: $XSLT ] 
Close Data File [ File ID: $dataFile_XSLT ]

Now you can do:

Import Records [ $filePath_XML; $filePath_XSLT ] 

 

3 hours ago, Quito said:

Does Write to Data File work for both local and server use?

Yes.

FYI, every script step help page shows a compatibility table like this:

[compatibility table screenshot]

In addition, you can see which script steps are server-compatible by selecting "Server" from the compatibility menu in the top right corner of the Script Workspace.

 

 

 

 

  • 3 weeks later...
Posted

Hi, @comment,

Write to Data File has replaced what was scripted previously. I am getting a 300 error (because the file is open?). I've checked around the Forums but cannot find a way to fix it. Is something missing in the script? I'm adding a screenshot.

All the very best,

Daniel

Screenshot 2024-06-14 at 8.55.19 PM.png

Posted

I don't know, because I am not able to reproduce the problem.

See what happens if you pause the script for a second before trying to open the data file.

And you certainly must close the data file after writing to it, before you attempt to use it in the import.

 

  • 4 weeks later...
Posted

Hi, Comment,

Following your advice, after making the script a bit more legible, and adding Close Data File, the script progressed successfully:

-----------

Insert from URL [ Select; With dialog: Off; Target: $Pubmedxml ; "https://eutils.ncbi.nlm.nih.gov/...
Set Variable [ $Pubmedxml; Value: RightValues ( $Pubmedxml ; ValueCount ( $Pubmedxml ) -2 ) ]
Set Variable [ $filePath_XML; Value: Get (DesktopPath) & "pubmed.xml" ]
Get File Exists [ "$filePath_XML" ; Target: $fileExists ]
If [ not $fileExists ]
Create Data File [ "$filePath_XML" ; Create folders: Off ]
End If
Open Data File [ "$filePath_XML" ; Target: $dataFile_XML ]
Show Custom Dialog [ "File ID" ; "File ID for " & $filePath_XML & ": " & $dataFile_XML ]
Write to Data File [ File ID: $dataFile_XML ; Data source: $Pubmedxml; Write as: UTF-8 ]
Close Data File [ File ID: $dataFile_XML ]

---------

In time, the DesktopPath will change to TemporaryPath.
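For reference, the preprocessing the script above performs can be sketched outside FileMaker: RightValues with ValueCount minus 2 keeps everything after the first two lines (dropping both the XML declaration and the DOCTYPE, which is fine, since the declaration is optional), and the result is written as UTF-8 to a known path. The file name and sample text here are only placeholders:

```python
import os
import tempfile

# Sketch of the script above: drop the first two lines of the downloaded
# XML (declaration + DOCTYPE), then write the remainder as UTF-8 into the
# temporary folder, returning the path for the subsequent import step.
def preprocess_and_write(xml_text: str, name: str = "pubmed.xml") -> str:
    body = "\n".join(xml_text.splitlines()[2:])        # skip first 2 lines
    path = os.path.join(tempfile.gettempdir(), name)   # temporary folder
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(body)
    return path
```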

Now I'm getting a 719 error (Error in transforming XML using XSL) when parsing the second stylesheet, but that's for another topic.

Thank you sooo much and,

All the very best,

Daniel
