Parse PDF


tv_kid

This topic is 5650 days old. Please don't post here. Open a new topic instead.

Is it possible to parse the contents of a PDF document? Alternatively, is it possible to 'grab' data from a website when that data is pulled in from another database (i.e., it is not in the page's HTML)? Obviously I could simply cut and paste the info into a text file and then parse it from there, but I wondered if anyone had any other ideas.

Thanks in advance.


I don't know much about PDF. There are applications that deal with PDFs and are AppleScriptable; PDFOpen comes to mind. There are others dedicated to getting the contents as text, such as TextLightning and Trapeze (I don't know if they're AppleScriptable). There are probably other, geekier options as well.

Reading from a web page is easy. You can use AppleScript and Safari.

tell application "Safari"
	-- the HTML source of the front document
	source of document 1
	-- or the visible text of the page (use one or the other, not both)
	text of document 1
end tell

Or you can use the Unix command line to get the source:

do shell script "curl 'http://www.fentonjones.com'" without altering line endings

You'd use that if you wanted to continue parsing the source text with further Unix commands, such as 'grep', 'cut', and 'sed', all of which are confusing to use but powerful. Open Terminal and type:

man grep

Usually it's more useful to get the entire source code, rather than just the "text" of the web page, because the source has all the HTML code, which you might need to identify exactly what you want to extract.
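
As a rough sketch of the kind of 'grep' and 'sed' parsing described above: the HTML here is a made-up sample standing in for whatever curl returns, and the function names (page_title, prices) and the class="price" markup are assumptions for illustration only. It also assumes each tag of interest sits on a single line, which real pages often don't guarantee.

```shell
#!/bin/sh
# Hypothetical sample standing in for the output of curl; not a real page.
html='<html><head><title>Example Page</title></head>
<body>
<td class="price">12.50</td>
<td class="price">7.25</td>
</body></html>'

# Print the text between <title> and </title>
# (assumes both tags are on the same line).
page_title() {
    printf '%s\n' "$1" | sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p'
}

# Print the contents of each line containing class="price",
# with all HTML tags stripped (assumes one such cell per line).
prices() {
    printf '%s\n' "$1" | grep 'class="price"' | sed 's/<[^>]*>//g'
}

page_title "$html"
prices "$html"
```

From AppleScript, the same pipeline could be passed to do shell script after the curl command; the HTML tags are what let grep pick out the right lines, which is why the source is usually more useful than the plain text.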


tell application "Safari"

source of document 1

-- or (don't use both :P-)

text of document 1

end tell

In this case the text of the document was what I needed, as the web page is displaying info from another database, so it doesn't show up in the 'source'. But it worked perfectly, and a small script cleared out the info I don't need and parsed the rest into the relevant fields.
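
For anyone wanting to do a similar clean-up on the command line, here is a minimal sketch of the idea. It is not the script I used; the page text, the "Name: ... | Price: ..." layout, and the function name records are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical page text, as might be returned by "text of document 1".
text='Welcome to the catalogue
Name: Widget | Price: 12.50 | Stock: 4
Name: Gadget | Price: 7.25 | Stock: 0
Copyright notice'

# Keep only the record lines, strip the "Label: " prefixes,
# and turn the " | " separators into commas.
records() {
    printf '%s\n' "$1" | grep '^Name:' | sed -e 's/[A-Za-z]*: //g' -e 's/ | /,/g'
}

# All fields, one record per line
records "$text"

# Just the second field (the price) of each record
records "$text" | cut -d, -f2
```

Each output line can then be split on the commas and set into the matching fields.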

Many thanks once again!

