Jump to content
Quito

xsl to import multiple attributes within a Pubmed subset

Recommended Posts

Hi FMForums Community,

After exporting PubMed records in XML format (where the default file name is pubmed_result.xml), the first two lines within the file must be removed in order to import pubmed_result.xml into FileMaker (14 Adv, MacOS Sierra) using a custom .xsl file.

Within the ArticleIdList subset there is the ArticleId, with multiple IdType. I am including the relevant XML and XSL portions.

----------------

XML

<PubmedArticleSet>

  <PubmedArticle>

    <PubmedData>

        <ArticleIdList>
            <ArticleId IdType="pubmed">27167443</ArticleId>
            <ArticleId IdType="pii">S1806-37132016000200162</ArticleId>
            <ArticleId IdType="doi">10.1590/S1806-37562016000000114</ArticleId>
            <ArticleId IdType="pmc">PMC4853075</ArticleId>
        </ArticleIdList>

    </PubmedData>
  </PubmedArticle>

</PubmedArticleSet>

------------------

XSL

                        <COL>
                            <DATA>
                        <xsl:for-each select="PubmedData/ArticleIdList">
                            <ArticleId>
                               <xsl:for-each select="*">
                                  <xsl:value-of select="." />
                                  <xsl:if test="position()!=last()">
                                  <xsl:value-of select="'&#xD;'"/>
                                  </xsl:if>
                               </xsl:for-each>
                            </ArticleId>
                        </xsl:for-each>
                            </DATA>
                        </COL>
-------------------

The result I'm getting after importing is:

27167443
S1806-37132016000200162
10.1590/S1806-37562016000000114
PMC4853075

But I would like to include the ArticleId IdType attribute to make it look like this:

pubmed: 27167443
pii: S1806-37132016000200162
doi: 10.1590/S1806-37562016000000114
pmc: PMC4853075

In another topic within fmforums.com there was a suggestion to use @ to capture multiple attributes within a subset.  Something like:

<xsl:value-of select="ArticleId/@IdType" />

What should the xsl contain in order to make pubmed: pii: doi: and pmc: visible?

Many thanks in advance.

Kind regards,

Daniel

Edited by Quito
Clarification

Share this post


Link to post
Share on other sites

I don't think you are approaching this correctly.

If you know in advance which ID types are included (or can be included) in an article, then create a field for each type and populate it explicitly with the value from the ArticleId that has the corresponding IdType - not by using xsl:for-each. If you don't know all the possible ID types, then you need to import them as individual records, not fields, into a related table, where each record would have fields for type, value, and some common value tying all IDs of the same article together.

The way you are going about this has two major flaws: first, you will end up with poorly structured data; a field should not be holding both the type and the value together. Also, the number of fields a stylesheet can import is fixed in advance. The fields are defined in the stylesheet's METADATA section - which is sadly missing from your post - and in Filemaker's field import order. You cannot add fields on the fly - that's why it makes very little sense to try and grab all the IDs using xsl:for-each.

 

--
P.S. I did not understand this part:

Quote

the first two lines within the file must be removed in order to import pubmed_result.xml into FileMaker

If the source file is a well-formed XML document, you should be able to import it as is - without having to modify it in any way.

 

Edited by comment

Share this post


Link to post
Share on other sites
9 hours ago, comment said:

I don't think you are approaching this correctly.

If you know in advance which ID types are included (or can be included) in an article, then create a field for each type and populate it explicitly with the value from the ArticleId that has the corresponding IdType - not by using xsl:for-each. If you don't know all the possible ID types, then you need to import them as individual records, not fields, into a related table, where each record would have fields for type, value, and some common value tying all IDs of the same article together.

The way you are going about this has two major flaws: first, you will end up with poorly structured data; a field should not be holding both the type and the value together. Also, the number of fields a stylesheet can import is fixed in advance. The fields are defined in the METADATA section, which is sadly missing from your post. You cannot add fields on the fly - that's why it makes very little sense to try and grab all the IDs using xsl:for-each.

 

--
P.S. I did not understand this part:

If the source file is a well-formed XML document, you should be able to import it as is - without having to modify it in any way.

 

Hi,

Thanks for replying promptly.  Answers to your comments follow below:

"The fields are defined in the METADATA section, which is sadly missing from your post."

This is the Metadata:

            <METADATA>
                <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="ArticleID" TYPE="TEXT"/>
            </METADATA>

"If you know in advance which ID types are included (or can be included) in an article..."

A record can include a minimum of 1 (*) and a maximum of 4 IDTypes.

The possible IDTypes are:

pubmed *
pii
doi
pmc
"...create a field for each type and populate it explicitly with the value from the ArticleId that has the corresponding IdType"

That's exactly what I'm wondering how to accomplish.

"a field should not be holding both the type and the value together"

Agreed.  It would make sense to have the IDType and it's content populate different fields.  How do I achieve this?

"P.S. I did not understand this part:

If the source file is a well-formed XML document, you should be able to import it as is - without having to modify it in any way."

These are the first two lines from the xml file:

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2016//EN" "https://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_160101.dtd">

These lines are rejected by FileMaker during XML/XSL import/processing.

I hope this information is useful to correctly sort out the custom XSL.

Best regards,

Daniel

Share this post


Link to post
Share on other sites
2 hours ago, Quito said:

"...create a field for each type and populate it explicitly with the value from the ArticleId that has the corresponding IdType"

That's exactly what I'm wondering how to accomplish.

Try the following stylesheet:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/PubmedArticleSet">
    <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
        <METADATA>
            <FIELD NAME="PubmedID"/>
            <FIELD NAME="PiiID"/>
            <FIELD NAME="DoiID"/>
            <FIELD NAME="PmcID"/>
        </METADATA>
        <RESULTSET>
            <xsl:for-each select="PubmedArticle">
                <ROW>
                    <COL><DATA><xsl:value-of select="PubmedData/ArticleIdList/ArticleId[@IdType='pubmed']"/></DATA></COL>
                    <COL><DATA><xsl:value-of select="PubmedData/ArticleIdList/ArticleId[@IdType='pii']"/></DATA></COL>
                    <COL><DATA><xsl:value-of select="PubmedData/ArticleIdList/ArticleId[@IdType='doi']"/></DATA></COL>
                    <COL><DATA><xsl:value-of select="PubmedData/ArticleIdList/ArticleId[@IdType='pmc']"/></DATA></COL>
                </ROW>
            </xsl:for-each>
        </RESULTSET>
    </FMPXMLRESULT>
</xsl:template>

</xsl:stylesheet>

This will present four source fields to the Filemaker's import dialog. Of course, you will also need to have four corresponding target fields in your solution, so that you can map the imported data to them.

 

2 hours ago, Quito said:

<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2016//EN" "https://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_160101.dtd">

These lines are rejected by FileMaker during XML/XSL import/processing.

Indeed, the DOCTYPE declaration is causing a problem. I am not quite sure why - I suspect it's an OS permissions issue that comes up when the XSLT engine tries to access the online DTD document.

  • Like 1

Share this post


Link to post
Share on other sites

Hi everyone,

i am using fmp 13 to access the pubmed database as well and was happy to see that there are others doing it. i am replying to this thread because my problem is kind of similar.

The thing is, a year ago my setup was working fine: i would generate an url with a formula request xml from pubmed (via the eutils esearch feature) and the resulting xml would be parsed by a custom xslt and the values then would be imported without any problem.

the url would look like this:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&usehistory=y&term=Aortic stenosis [tiab] AND 2016 [dp]

 

Going back to this database today my scripts all gave me the error that a whitespace was expected in line 1 column 50.

i tried opening the pubmed URL in a browser. copied the resulting  content to the text editor and saved it as a utf-8 xml  file. when i select this xml file as the source (instead of the URL) everything is working fine. 

i have no idea why this all worked about 12 month ago and doesn't work anymore. furthermore i would really like to get this working again...

have you had the same problems? any ideas?

i guess maybe pubmed changed something in the encoding, or the upgrade to macOS sierra.

 

Thanks

SearchResultXMLConvert.xsl

Share this post


Link to post
Share on other sites

I don't see what this has to do with the original problem here. And if you get an error when running a script, the smart thing to do is post the exact script you were running, along with the verbatim error message. 

I am guessing you need to percent-encode the spaces in your URL, so that it reads:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&usehistory=y&term=Aortic%20stenosis%20%5Btiab%5D%20AND%202016%20%5Bdp%5D

 

Share this post


Link to post
Share on other sites

Hi Kalle,

Just a few thoughts.  The E-utility Web Service (SOAP) was terminated on July 1, 2015.  See https://www.ncbi.nlm.nih.gov/books/NBK43082/?report=reader.

The new syntax is provided here:

https://www.ncbi.nlm.nih.gov/books/NBK25499/?report=reader#chapter4.Eutility_DTDs

Please check this page regarding the NCBI migration to https on September 30, 2016https://www.ncbi.nlm.nih.gov/home/develop/https-guidance/

If you are interested in fleshing out the complete xsl for FileMaker using the PubMed DTD, let me know.  It's close to 3000 options and I've managed to write the main 500 or so.

Best,

Daniel

  • Like 1

Share this post


Link to post
Share on other sites
On 3.6.2017 at 0:09 PM, comment said:

I don't see what this has to do with the original problem here. And if you get an error when running a script, the smart thing to do is post the exact script you were running, along with the verbatim error message. 

I am guessing you need to percent-encode the spaces in your URL, so that it reads:


https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&usehistory=y&term=Aortic%20stenosis%20%5Btiab%5D%20AND%202016%20%5Bdp%5D

Thanks

5 hours ago, Quito said:

Hi Kalle,

Just a few thoughts.  The E-utility Web Service (SOAP) was terminated on July 1, 2015.  See https://www.ncbi.nlm.nih.gov/books/NBK43082/?report=reader.

The new syntax is provided here:

https://www.ncbi.nlm.nih.gov/books/NBK25499/?report=reader#chapter4.Eutility_DTDs

Please check this page regarding the NCBI migration to https on September 30, 2016https://www.ncbi.nlm.nih.gov/home/develop/https-guidance/

If you are interested in fleshing out the complete xsl for FileMaker using the PubMed DTD, let me know.  It's close to 3000 options and I've managed to write the main 500 or so.

Best,

Daniel

Hi Daniel,

thanks for your reply! I will look into those links. 

Kalle

Share this post


Link to post
Share on other sites

Ok. Problem solved.

It was a very simple problem: As Daniel suggested it had to do with the http versus https request. I changed the URL to https and everything is fine!

Thanks Daniel!

  • Like 1

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • By DarioDN
      Dear all, I've just create a script to export my records in multiple xml file (one xml for each record) applying also an XSL stylesheet.
      So, if I use the style.xsl taking from my computer ("source" button when filemaker ask me to select the file) it works. If I put the same file online, it doesn't work giving me a SAXParseException error: internal error in NetAccess (occurred in an unknown entity, at line 0, column 0).
      In the same folder where I have the style.xsl there are other xsl files that works, but they are used to "import" process (from XML to FILEMAKER, while in this case I'm working to an export process, from FILEMAKER to XML, I don't know if this could be the problem)
      Someone could help me to understand the error? Thank you
      <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fmp="http://www.filemaker.com/fmpxmlresult" exclude-result-prefixes="fmp" > <xsl:output omit-xml-declaration="yes" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:template match="fmp:FMPXMLRESULT"> <xsl:for-each select="fmp:RESULTSET/fmp:ROW"> <xsl:text>&#xA;</xsl:text> <xsl:text disable-output-escaping="yes">&lt;!--</xsl:text> <xsl:value-of select="fmp:COL[1]/fmp:DATA"/>.<xsl:value-of select="fmp:COL[2]/fmp:DATA"/>.<xsl:value-of select="fmp:COL[3]/fmp:DATA"/><xsl:text disable-output-escaping="yes">--&gt;</xsl:text> <problem display_name="Checkboxes" > <choiceresponse> <label><xsl:value-of select="fmp:COL[4]/fmp:DATA"/></label> <checkboxgroup> <choice correct="{fmp:COL[6]/fmp:DATA}"> <xsl:value-of select="fmp:COL[5]/fmp:DATA"/> <choicehint selected="true"><xsl:value-of select="fmp:COL[7]/fmp:DATA"/> </choicehint> </choice> <choice correct="{fmp:COL[9]/fmp:DATA}"> <xsl:value-of select="fmp:COL[8]/fmp:DATA"/> <choicehint selected="true"><xsl:value-of select="fmp:COL[10]/fmp:DATA"/></choicehint> </choice> <choice correct="{fmp:COL[12]/fmp:DATA}"> <xsl:value-of select="fmp:COL[11]/fmp:DATA"/> <choicehint selected="true"><xsl:value-of select="fmp:COL[13]/fmp:DATA"/></choicehint> </choice> </checkboxgroup> </choiceresponse> </problem> <xsl:text>&#xA;</xsl:text> </xsl:for-each> </xsl:template> </xsl:stylesheet>  
    • By Pavlk
      I've attached two files. The first one is INVOICE, that is generated by every establishment when you purchase, and is in xml format.
      The second one is XSLT file, that is provided by government fiscal institution and is used to "translate" xml.
      PROBLEM DESCRIPTION: when I try to decode xml using attached xslt scheme on any online xml-converter website the output looks fine. BUT when importing to Filemaker, I receive an error: "XML parsing error: invalid document structure".
       
      I would appreciate any help with the issue. Thanks.
      XML input.xml
      XSLT scheme.xslt
    • By _Mark__
      Hi all,
      I'm completely new to XML and I'm making an XLST for an XML feed into my Filemaker database.
      Surprisingly, I got the basics down rather fast, but I'm really stumped on a few fields.
      See attached XLST, sample dataset and Filemaker test database.
      Referencing the XSLT metadata, the import works perfectly fine for the nodes named 'titleId' down through 'dealType' as well as the nodes under 'licensingWindow'. However I can't seem to get the values of a few other nodes. The trouble for me is that the nodes are nested in as child nodes that aren't named uniquely and I just can't figure out how to get their values.
      Example:
      <titleListingTextSets>
                  <listingTexts>
                      <listingText>
                          <name>SYNOPSIS75</name>
                          <value>Mickey continues to feel mounting pressure from the network.</value>
                      </listingText>
                      <listingText>
                          <name>SYNOPSIS234</name>
                          <value>Mickey continues to feel mounting pressure from the network as an affiliates dinner is fast approaching and they need something to sell to advertisers.</value>
                      </listingText>
                      <listingText>
      What I'm looking for is the value (text beginning with the word Mickey) of the field 'value' on the node under the ones whose name is 'name' containing the values 'SYNOPSIS75' and 'SYNOPSIS234'. Those values seem to be the only way to uniquely identify the 'value' fields. That's what I'm not getting. I had one try that did correctly identify and pull over the value data but it messed with the overall for-each I need for importing a number of records and thus I only got the data for a single record.
      I'm guessing there is an easy fix for this, but I'm just not familiar enough with XML to know it yet.
      I'd really appreciate it if someone could point me toward the right technique to use on this.
      You'll see in my XLST some of my failed attempts, if you'd like to see what I was trying.
      Thanks in advance,
      Mark
      NO_SYNOPSES_OR_LICENSE.xslt
      XML_TEST.fmp12
      Sample_Data.xml
  • Who Viewed the Topic

    11 members have viewed this topic:
    Robert Mota Hawks  V!ncent  gczychi  arni  Lee Smith  fillmore  Ron Cates  Fitch  Lola  OlgerDiekstra  rwoods 
×

Important Information

By using this site, you agree to our Terms of Use.