HTML in Filemaker does not match Netscape


ddinisco

This topic is 5907 days old. Please don't post here. Open a new topic instead.

I am using the function GetLayoutObjectAttribute ( "wv" ; "content" ) to try to parse information out of a web page. However, the HTML code this function returns looks a bit different from what you get by copying and pasting from the HTML Source tab in Netscape Composer. Basically, a lot (but not all) of the carriage returns are missing. Example below.

NETSCAPE


Cast

(Cast overview,

first billed only)

It is likely that FileMaker doesn't recognize the line endings. Perhaps they are Unix line feeds (ASCII 10) rather than the carriage returns (ASCII 13) that FileMaker uses. Web browser engines do not consider them important, since they are whitespace agnostic.

What exactly are you doing with the source code after putting it in a FileMaker field? Perhaps there is a better way to get it. If you use AppleScript with a shell script, you can get the source code of a web page very quickly. Run this in Script Editor:


do shell script "curl 'http://fentonjones.com'"

In my experience this is a faster and more reliable method to get the HTML code.

You can set this into a FileMaker field. Or, you could use further AppleScript or command line tools to parse the text. In the latter case you need to force the text into Unix line endings. The do shell script command in AppleScript (which is also line-ending agnostic) is made to be more compatible with old-style Mac returns, whereas the Unix commands require Unix line endings. Let me know if you want more info on this.
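To illustrate what "forcing the text into Unix line endings" can look like on the command line, here is a minimal sketch. It assumes the text uses old-style Mac returns (ASCII 13) only; the helper name to_unix_endings and the sample text are made up for the example, and Windows-style CR+LF pairs would need different handling.

```shell
# Convert old-style Mac carriage returns (ASCII 13) to Unix line feeds (ASCII 10).
# to_unix_endings is a hypothetical helper name, not part of any tool.
to_unix_endings() {
  tr '\r' '\n'
}

# Sample text with CR line endings, standing in for the contents of a field.
sample=$(printf 'Cast\r(Cast overview,\rfirst billed only)')

# After conversion, each piece sits on its own Unix line.
converted=$(printf '%s' "$sample" | to_unix_endings)
printf '%s\n' "$converted"

# Count the resulting lines (tr strips the padding some wc versions add).
line_count=$(printf '%s\n' "$converted" | wc -l | tr -d ' ')
```

The same tr call can be placed in the middle of a longer pipeline, before grep or any other line-oriented tool.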

Thanks for the response Fenton. I am not familiar with the curl command.

In short, I am using FM to find a specific movie on IMDB and parse out the actors for the film. I was using the grep command to pull out the HTML lines containing the names, and FM to pull out just the names from those lines.

The problem is the one I mentioned before: there are no carriage returns when FM pulls out the code using GetLayoutObjectAttribute ( "wv" ; "content" ).

Any help would be much appreciated if you have time.

Thanks

David

"curl" is a Unix command that fetches a URL and writes its source to standard output. It is sort of like getting the source of a Web Object, but much faster. It works well with grep.

I use it within AppleScript, as I can run that directly from a Perform AppleScript step and then set the results into FileMaker fields. There are some caveats when working with Unix commands within AppleScript (AS), however.

The "do shell script" command runs a Unix command line within AS. By default it returns old-style Mac returns, for compatibility's sake I imagine. But you must have Unix line endings to use grep (or other Unix tools). There is a way to get them: the "without altering line endings" option. Examples (run in Script Editor):

set web_txt to do shell script "curl 'http://imdb.com/find?s=all&q=Daywatch&x=18&y=5'" without altering line endings
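To see why grep cares about the line endings, here is a small sketch you can run in Terminal. The sample text is made up for the illustration; the point is only that grep splits its input on Unix line feeds, so text separated by old-style Mac returns looks like one long line to it.

```shell
# The same three "lines", once with CR separators and once with LF.
cr_text=$(printf 'Cast\rDaywatch\rCrew')
lf_text=$(printf 'Cast\nDaywatch\nCrew')

# With CR separators grep sees a single line, so the match returns everything.
# The trailing tr only makes the CRs visible as spaces in the result.
cr_match=$(printf '%s\n' "$cr_text" | grep 'Daywatch' | tr '\r' ' ')

# With LF separators grep returns only the line that matches.
lf_match=$(printf '%s\n' "$lf_text" | grep 'Daywatch')

printf 'CR input: %s\n' "$cr_match"
printf 'LF input: %s\n' "$lf_match"
```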


But an even better way, when you're trying to parse text, is to use the command: strings

It coerces text to Unix line endings, and removes extra lines.




set web_txt to do shell script "curl 'http://imdb.com/find?s=all&q=Daywatch&x=18&y=5' | strings | grep 'Daywatch'"
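Once grep has the lines, the names themselves can also be pulled out on the command line instead of in FileMaker. This is only a rough sketch: the markup below is a hypothetical sample (the real IMDB page is structured differently), and the '/name/' pattern is an assumption about what marks a name line.

```shell
# Hypothetical sample markup; the real page will not look exactly like this.
html='<table>
<tr><td class="nm"><a href="/name/example-1/">First Actor</a></td></tr>
<tr><td class="other">Not a name row</td></tr>
<tr><td class="nm"><a href="/name/example-2/">Second Actor</a></td></tr>
</table>'

# Keep only the lines that contain a name link, then strip every tag,
# which leaves just the text between the tags.
names=$(printf '%s\n' "$html" | grep '/name/' | sed 's/<[^>]*>//g')
printf '%s\n' "$names"
```

In practice the curl output would be piped in place of the sample variable, and the grep pattern adjusted to whatever the name lines on the page actually have in common.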
