tv_kid
Posted June 8, 2006

Is it possible to parse the contents of a PDF document? Alternatively, is it possible to 'grab' data from a website that is displayed from another database (i.e., not as HTML)? Obviously I could simply cut and paste the info into a text file and then parse it from there, but I wondered if anyone had any other ideas.

Thanks in advance.
Fenton
Posted June 8, 2006

I don't know much about PDF. There are applications that deal with them that are AppleScriptable; PDFOpen comes to mind. There are others dedicated to getting the contents as text, such as TextLightning and Trapeze (I don't know if they're AppleScriptable). There are probably other geekier options also.

Reading from a web page is easy. You can use AppleScript and Safari:

    tell application "Safari"
        source of document 1
        -- or (don't use both)
        text of document 1
    end tell

Or you can use the Unix command line to get the source:

    do shell script "curl 'http://www.fentonjones.com'" without altering line endings

You'd use that if you wanted to continue parsing the source text with further Unix commands, such as 'grep', 'cut', and 'sed', all of which are confusing to use, but powerful. To read about one of them, open Terminal and type:

    man grep

Usually it's more useful to get the entire source code, rather than just the "text" of the web page, because the source has all the HTML code, which you might need to identify exactly what you want to extract.
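As a rough illustration of the 'grep', 'cut', and 'sed' step mentioned above, here is a minimal sketch. The sample HTML, the example.com URLs, and the variable names are all made up for the example; in practice the page text would come from the curl command rather than from a hard-coded string.

```shell
#!/bin/sh
# Hypothetical sample page (an assumption for this sketch). In real use:
#   page=$(curl -s 'http://www.fentonjones.com')
page='<html>
<head>
<title>Example Page</title>
</head>
<body>
<a href="http://example.com/a">First</a>
<a href="http://example.com/b">Second</a>
</body>
</html>'

# grep keeps only the line containing the <title> tag.
# The first cut splits on ">" and keeps field 2 ("Example Page</title");
# the second cut splits on "<" and keeps field 1 ("Example Page").
title=$(printf '%s\n' "$page" | grep '<title>' | cut -d'>' -f2 | cut -d'<' -f1)

# grep keeps only the lines that contain a link;
# sed replaces each whole line with just the URL inside href="...".
links=$(printf '%s\n' "$page" | grep 'href=' | sed 's/.*href="\([^"]*\)".*/\1/')

printf 'Title: %s\n' "$title"
printf 'Links:\n%s\n' "$links"
```

This only works when each tag of interest sits on its own line, as in the sample. Real pages are often messier, which is part of why having the full source to inspect helps.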
tv_kid (Author)
Posted June 8, 2006

    tell application "Safari"
        source of document 1
        -- or (don't use both)
        text of document 1
    end tell

In this case the text of the document was what I needed, as the web page displays info from another database, so it doesn't show up in the 'source'. But it worked perfectly, and a small script cleared out the info I didn't need and parsed the rest into the relevant fields.

Many thanks once again!
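The clean-up script itself was not posted, so the following is only a guess at what such a step might look like. It assumes, purely for illustration, that the page text contains "Label: value" lines mixed in with unwanted lines; the sample text and field names are invented.

```shell
#!/bin/sh
# Hypothetical page text, standing in for Safari's "text of document 1".
text='Acme Widgets
Name: Jane Doe
Phone: 555-0100
Click here to log in
City: Springfield'

# Keep only lines shaped like "Label: value" and drop everything else.
# awk splits each kept line on ": " and prints label and value
# separated by a tab, ready to import as fields.
fields=$(printf '%s\n' "$text" | awk -F': ' '/^[A-Za-z]+: / { print $1 "\t" $2 }')

printf '%s\n' "$fields"
```

A value that itself contains ": " would be cut short by this sketch, so a real script would need to handle that case.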