Jump to content
Quito

xsl to import multiple attributes within a Pubmed subset

Recommended Posts

Hi FMForums Community,

After exporting PubMed records in XML format (where the default file name is pubmed_result.xml), the first two lines within the file must be removed in order to import pubmed_result.xml into FileMaker (14 Adv, MacOS Sierra) using a custom .xsl file.

Within the ArticleIdList subset there is the ArticleId, with multiple IdType. I am including the relevant XML and XSL portions.

----------------

XML

<PubmedArticleSet>

  <PubmedArticle>

    <PubmedData>

        <ArticleIdList>
            <ArticleId IdType="pubmed">27167443</ArticleId>
            <ArticleId IdType="pii">S1806-37132016000200162</ArticleId>
            <ArticleId IdType="doi">10.1590/S1806-37562016000000114</ArticleId>
            <ArticleId IdType="pmc">PMC4853075</ArticleId>
        </ArticleIdList>

    </PubmedData>
  </PubmedArticle>

</PubmedArticleSet>

------------------

XSL

                        <COL>
                            <DATA>
                        <xsl:for-each select="PubmedData/ArticleIdList">
                            <ArticleId>
                               <xsl:for-each select="*">
                                  <xsl:value-of select="." />
                                  <xsl:if test="position()!=last()">
                                  <xsl:value-of select="'&#xD;'"/>
                                  </xsl:if>
                               </xsl:for-each>
                            </ArticleId>
                        </xsl:for-each>
                            </DATA>
                        </COL>
-------------------

The result I'm getting after importing is:

27167443
S1806-37132016000200162
10.1590/S1806-37562016000000114
PMC4853075

But I would like to include the ArticleId IdType attribute to make it look like this:

pubmed: 27167443
pii: S1806-37132016000200162
doi: 10.1590/S1806-37562016000000114
pmc: PMC4853075

In another topic within fmforums.com there was a suggestion to use @ to capture multiple attributes within a subset.  Something like:

<xsl:value-of select="ArticleId/@IdType" />

What should the xsl contain in order to make pubmed: pii: doi: and pmc: visible?

Many thanks in advance.

Kind regards,

Daniel

Edited by Quito
Clarification

Share this post


Link to post
Share on other sites

I don't think you are approaching this correctly.

If you know in advance which ID types are included (or can be included) in an article, then create a field for each type and populate it explicitly with the value from the ArticleId that has the corresponding IdType - not by using xsl:for-each. If you don't know all the possible ID types, then you need to import them as individual records, not fields, into a related table, where each record would have fields for type, value, and some common value tying all IDs of the same article together.

The way you are going about this has two major flaws: first, you will end up with poorly structured data; a field should not be holding both the type and the value together. Also, the number of fields a stylesheet can import is fixed in advance. The fields are defined in the stylesheet's METADATA section - which is sadly missing from your post - and in Filemaker's field import order. You cannot add fields on the fly - that's why it makes very little sense to try and grab all the IDs using xsl:for-each.

 

--
P.S. I did not understand this part:

Quote

the first two lines within the file must be removed in order to import pubmed_result.xml into FileMaker

If the source file is a well-formed XML document, you should be able to import it as is - without having to modify it in any way.

 

Edited by comment

Share this post


Link to post
Share on other sites
9 hours ago, comment said:

I don't think you are approaching this correctly.

If you know in advance which ID types are included (or can be included) in an article, then create a field for each type and populate it explicitly with the value from the ArticleId that has the corresponding IdType - not by using xsl:for-each. If you don't know all the possible ID types, then you need to import them as individual records, not fields, into a related table, where each record would have fields for type, value, and some common value tying all IDs of the same article together.

The way you are going about this has two major flaws: first, you will end up with poorly structured data; a field should not be holding both the type and the value together. Also, the number of fields a stylesheet can import is fixed in advance. The fields are defined in the METADATA section, which is sadly missing from your post. You cannot add fields on the fly - that's why it makes very little sense to try and grab all the IDs using xsl:for-each.

 

--
P.S. I did not understand this part:

If the source file is a well-formed XML document, you should be able to import it as is - without having to modify it in any way.

 

Hi,

Thanks for replying promptly.  Answers to your comments follow below:

"The fields are defined in the METADATA section, which is sadly missing from your post."

This is the Metadata:

            <METADATA>
                <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="ArticleID" TYPE="TEXT"/>
            </METADATA>

"If you know in advance which ID types are included (or can be included) in an article..."

A record can include a minimum of 1 (*) and a maximum of 4 IDTypes.

The possible IDTypes are:

pubmed *
pii
doi
pmc
"...create a field for each type and populate it explicitly with the value from the ArticleId that has the corresponding IdType"

That's exactly what I'm wondering how to accomplish.

"a field should not be holding both the type and the value together"

Agreed.  It would make sense to have the IDType and it's content populate different fields.  How do I achieve this?

"P.S. I did not understand this part:

If the source file is a well-formed XML document, you should be able to import it as is - without having to modify it in any way."

These are the first two lines from the xml file:

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2016//EN" "https://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_160101.dtd">

These lines are rejected by FileMaker during XML/XSL import/processing.

I hope this information is useful to correctly sort out the custom XSL.

Best regards,

Daniel

Share this post


Link to post
Share on other sites
2 hours ago, Quito said:

"...create a field for each type and populate it explicitly with the value from the ArticleId that has the corresponding IdType"

That's exactly what I'm wondering how to accomplish.

Try the following stylesheet:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/PubmedArticleSet">
    <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
        <METADATA>
            <FIELD NAME="PubmedID"/>
            <FIELD NAME="PiiID"/>
            <FIELD NAME="DoiID"/>
            <FIELD NAME="PmcID"/>
        </METADATA>
        <RESULTSET>
            <xsl:for-each select="PubmedArticle">
                <ROW>
                    <COL><DATA><xsl:value-of select="PubmedData/ArticleIdList/ArticleId[@IdType='pubmed']"/></DATA></COL>
                    <COL><DATA><xsl:value-of select="PubmedData/ArticleIdList/ArticleId[@IdType='pii']"/></DATA></COL>
                    <COL><DATA><xsl:value-of select="PubmedData/ArticleIdList/ArticleId[@IdType='doi']"/></DATA></COL>
                    <COL><DATA><xsl:value-of select="PubmedData/ArticleIdList/ArticleId[@IdType='pmc']"/></DATA></COL>
                </ROW>
            </xsl:for-each>
        </RESULTSET>
    </FMPXMLRESULT>
</xsl:template>

</xsl:stylesheet>

This will present four source fields to the Filemaker's import dialog. Of course, you will also need to have four corresponding target fields in your solution, so that you can map the imported data to them.

 

2 hours ago, Quito said:

<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2016//EN" "https://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_160101.dtd">

These lines are rejected by FileMaker during XML/XSL import/processing.

Indeed, the DOCTYPE declaration is causing a problem. I am not quite sure why - I suspect it's an OS permissions issue that comes up when the XSLT engine tries to access the online DTD document.

  • Like 1

Share this post


Link to post
Share on other sites

Hi everyone,

i am using fmp 13 to access the pubmed database as well and was happy to see that there are others doing it. i am replying to this thread because my problem is kind of similar.

The thing is, a year ago my setup was working fine: i would generate an url with a formula request xml from pubmed (via the eutils esearch feature) and the resulting xml would be parsed by a custom xslt and the values then would be imported without any problem.

the url would look like this:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&usehistory=y&term=Aortic stenosis [tiab] AND 2016 [dp]

 

Going back to this database today my scripts all gave me the error that a whitespace was expected in line 1 column 50.

i tried opening the pubmed URL in a browser. copied the resulting  content to the text editor and saved it as a utf-8 xml  file. when i select this xml file as the source (instead of the URL) everything is working fine. 

i have no idea why this all worked about 12 month ago and doesn't work anymore. furthermore i would really like to get this working again...

have you had the same problems? any ideas?

i guess maybe pubmed changed something in the encoding, or the upgrade to macOS sierra.

 

Thanks

SearchResultXMLConvert.xsl

Share this post


Link to post
Share on other sites

I don't see what this has to do with the original problem here. And if you get an error when running a script, the smart thing to do is post the exact script you were running, along with the verbatim error message. 

I am guessing you need to percent-encode the spaces in your URL, so that it reads:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&usehistory=y&term=Aortic%20stenosis%20%5Btiab%5D%20AND%202016%20%5Bdp%5D

 

Share this post


Link to post
Share on other sites

Hi Kalle,

Just a few thoughts.  The E-utility Web Service (SOAP) was terminated on July 1, 2015.  See https://www.ncbi.nlm.nih.gov/books/NBK43082/?report=reader.

The new syntax is provided here:

https://www.ncbi.nlm.nih.gov/books/NBK25499/?report=reader#chapter4.Eutility_DTDs

Please check this page regarding the NCBI migration to https on September 30, 2016https://www.ncbi.nlm.nih.gov/home/develop/https-guidance/

If you are interested in fleshing out the complete xsl for FileMaker using the PubMed DTD, let me know.  It's close to 3000 options and I've managed to write the main 500 or so.

Best,

Daniel

  • Like 1

Share this post


Link to post
Share on other sites
On 3.6.2017 at 0:09 PM, comment said:

I don't see what this has to do with the original problem here. And if you get an error when running a script, the smart thing to do is post the exact script you were running, along with the verbatim error message. 

I am guessing you need to percent-encode the spaces in your URL, so that it reads:


https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&usehistory=y&term=Aortic%20stenosis%20%5Btiab%5D%20AND%202016%20%5Bdp%5D

Thanks

5 hours ago, Quito said:

Hi Kalle,

Just a few thoughts.  The E-utility Web Service (SOAP) was terminated on July 1, 2015.  See https://www.ncbi.nlm.nih.gov/books/NBK43082/?report=reader.

The new syntax is provided here:

https://www.ncbi.nlm.nih.gov/books/NBK25499/?report=reader#chapter4.Eutility_DTDs

Please check this page regarding the NCBI migration to https on September 30, 2016https://www.ncbi.nlm.nih.gov/home/develop/https-guidance/

If you are interested in fleshing out the complete xsl for FileMaker using the PubMed DTD, let me know.  It's close to 3000 options and I've managed to write the main 500 or so.

Best,

Daniel

Hi Daniel,

thanks for your reply! I will look into those links. 

Kalle

Share this post


Link to post
Share on other sites

Ok. Problem solved.

It was a very simple problem: As Daniel suggested it had to do with the http versus https request. I changed the URL to https and everything is fine!

Thanks Daniel!

  • Like 1

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now


×

Important Information

By using this site, you agree to our Terms of Use.