November 29, 2015 I am trying to do some web scraping from the Yahoo! Finance site with Insert from URL, but the content I get back says the document has moved: "The document has moved <A HREF="http://finance.yahoo.com/q?p..." etc. If I put the same URL that I used in Insert from URL into Open URL, the site opens perfectly. Do some sites not work with Insert from URL? When I was testing the script, it seemed like I was able to grab the source at one point. Is the Yahoo URL dynamic? Thanks
November 29, 2015 4 hours ago, laguna92651 said: is the Yahoo URL dynamic? If you don't provide the URL, how are we supposed to know? 4 hours ago, laguna92651 said: Do some sites not work with Insert From URL? All sites "work" with Insert From URL, but web scraping with Insert From URL will not work on every site. All the step does is insert the HTML code of the linked page. If the page redirects, you end up with a field containing the redirect page's code rather than the content you were after.
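If the field ends up holding one of these redirect pages, one possible workaround is to pull the new address out of its HREF and request that address instead. Below is a minimal Python sketch of just the parsing step, assuming the redirect page contains the phrase "document has moved" followed by a single link; the function and class names, the sample HTML and the example.com URL are all made up for illustration.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Collect the href of the first <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too
        if self.href is None and tag == "a":
            self.href = dict(attrs).get("href")


def redirect_target(html):
    """Return the link from a 'document has moved' page, or None if it isn't one."""
    if "document has moved" not in html.lower():
        return None
    parser = FirstLinkParser()
    parser.feed(html)
    parser.close()
    return parser.href


# Hypothetical redirect body, shaped like the one quoted in the first post
sample = ('<HTML><BODY>The document has moved '
          '<A HREF="http://example.com/new-location">here</A>.</BODY></HTML>')
print(redirect_target(sample))
```

In a FileMaker script the same two-step idea would be text parsing on the field to extract the address, followed by a second Insert From URL on that address; whether the second request returns the real page still depends on the site.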
November 29, 2015 Author Here is the link to the site. http://finance.yahoo.com/q;_ylt=AtSfJUysKYLIoHTgg2pQ1fEgBrgF;_ylc=X1MDMjE0MjQ3ODk0OARfcgMyBGZyA3VoM19maW5hbmNlX3dlYl9ncwRmcjIDc2EtZ3AEZ3ByaWQDBG5fZ3BzAzEwBG9yaWdpbgNmaW5hbmNlLnlhaG9vLmNvbQRwb3MDMQRwcXN0cgMEcXVlcnkDXkdTUEMsBHNhYwMxBHNhbwMx?p=http%3A%2F%2Ffinance.yahoo.com%2Fq%3Fs%3D^GSPC%26ql%3D0&uhb=uhb2&fr=uh3_finance_vert_gs&s=^GSPC Thank you
November 29, 2015 Well, this is "interesting". If I run cURL with the above URL, I get the expected page. However, if I run the same URL through the BE_GetURL() external function (from the BaseElements plugin), I get the "The document has moved ..." message, even though, according to the documentation, this function uses the cURL library. I don't know what causes the difference in response. I do, however, have a suggestion: try to get your data through an API if at all possible, and use web scraping only as a last resort, when no API is available. --- BTW, I seem to get the same page using only http://finance.yahoo.com/q?s=^GSPC, and that URL behaves the same with both methods. Edited November 29, 2015 by comment
November 29, 2015 Author Can you point me to some information on how I would get the data with an API?
November 29, 2015 It's not fair to ask me to do your Googling for you. Still, I believe this could prove interesting: http://thesimplesynthesis.com/article/finance-apis#yahoo-yql-finance-api
November 29, 2015 A response of "The document has moved ..." is the body of a server-generated HTTP redirect (status 302), indicating that the document that used to be at that URL has moved to a new URL. The server generally includes the new URL in the Location header, and most browsers use it to load the page from there instead. Whether or not the URL is dynamic is a question better asked of Yahoo.
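To see the difference between a client that follows the Location header and one that doesn't, here is a small self-contained Python sketch. It starts a throwaway local server (the `/old` and `/new` paths, the handler and the body text are all hypothetical) and requests `/old` twice with `urllib.request`: once with redirects disabled, once with the default redirect handling. It does not reproduce Yahoo's server or FileMaker's Insert From URL; it only shows that a non-following client gets the 302 and its "document has moved" body, while a following client ends up with the target page.

```python
import http.server
import threading
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Decline every redirect so the raw 3xx response stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx status


def build(follow_redirects):
    # ProxyHandler({}) ignores proxy settings so localhost is reached directly
    handlers = [urllib.request.ProxyHandler({})]
    if not follow_redirects:
        handlers.append(NoRedirect)
    return urllib.request.build_opener(*handlers)


def fetch(url, follow_redirects):
    """Return (status, Location header, body) for url."""
    try:
        with build(follow_redirects).open(url) as resp:
            return resp.status, resp.headers.get("Location"), resp.read().decode()
    except urllib.error.HTTPError as err:
        # With NoRedirect installed, a 301/302 surfaces here instead of being followed
        return err.code, err.headers.get("Location"), err.read().decode()


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical server: /old answers 302 pointing at /new."""

    def do_GET(self):
        if self.path == "/old":
            body = b'The document has moved <a href="/new">here</a>.'
            self.send_response(302)
            self.send_header("Location", "/new")
        else:
            body = b"<p>Quote page</p>"
            self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
old_url = "http://127.0.0.1:%d/old" % server.server_port

raw = fetch(old_url, follow_redirects=False)       # what a non-following client sees
followed = fetch(old_url, follow_redirects=True)   # what a browser ends up showing
server.shutdown()
server.server_close()

print(raw)
print(followed)
```

Disabling redirects this way is just one option in `urllib`; its point here is to make the 302 status and Location header visible instead of silently skipping past them.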
November 30, 2015 Author I didn't mean for you to Google it; I thought you might just have something handy. I'm not even sure I would know what to Google for, other than FileMaker API. Thanks for your help, I appreciate it.
November 30, 2015 31 minutes ago, laguna92651 said: I'm not even sure I would know what to google for, other than Filemaker API. No, it's the third-party data you want to get, and the third-party API you want to get it from. Hint: if their API can provide an XML response, FileMaker can import it directly (with the help of an XSLT stylesheet): http://www.filemaker.com/help/14/fmp/en/html/import_export.18.17.html#1041831
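As far as I know, FileMaker's XML import expects the FMPXMLRESULT grammar, so the XSLT stylesheet's job is to map the API's own XML onto that shape. The Python sketch below performs the same mapping (Python's standard library has no XSLT processor, so it stands in for the stylesheet) just to show the target structure. The input format (`<quotes>`, `<quote>`, `<symbol>`, `<price>`), the field names and the values are made up, every field is typed as TEXT for simplicity, and the exact required elements should be checked against the FileMaker help page linked above.

```python
import xml.etree.ElementTree as ET

FMP_NS = "http://www.filemaker.com/fmpxmlresult"


def to_fmpxmlresult(api_xml, fields):
    """Convert <quote> elements (a hypothetical API format) into FMPXMLRESULT rows."""
    quotes = ET.fromstring(api_xml).findall("quote")
    # xmlns written as a plain attribute keeps the tags unprefixed in the output
    root = ET.Element("FMPXMLRESULT", xmlns=FMP_NS)
    ET.SubElement(root, "ERRORCODE").text = "0"
    ET.SubElement(root, "PRODUCT", BUILD="", NAME="", VERSION="")
    ET.SubElement(root, "DATABASE", DATEFORMAT="", LAYOUT="", NAME="",
                  RECORDS=str(len(quotes)), TIMEFORMAT="")
    # One FIELD per column to import, in the same order as the COLs below
    metadata = ET.SubElement(root, "METADATA")
    for name in fields:
        ET.SubElement(metadata, "FIELD", EMPTYOK="YES", MAXREPEAT="1",
                      NAME=name, TYPE="TEXT")
    resultset = ET.SubElement(root, "RESULTSET", FOUND=str(len(quotes)))
    for record_id, quote in enumerate(quotes, start=1):
        row = ET.SubElement(resultset, "ROW", MODID="0", RECORDID=str(record_id))
        for name in fields:
            col = ET.SubElement(row, "COL")
            ET.SubElement(col, "DATA").text = quote.findtext(name, default="")
    return ET.tostring(root, encoding="unicode")


# Made-up API response; the numbers are placeholders, not real quotes
sample = """<quotes>
  <quote><symbol>^GSPC</symbol><price>1234.56</price></quote>
  <quote><symbol>^DJI</symbol><price>9876.54</price></quote>
</quotes>"""

print(to_fmpxmlresult(sample, ["symbol", "price"]))
```

In FileMaker itself this mapping would live in the XSLT stylesheet chosen during the XML import, not in external code.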