
Parsing from Web View & Scripting


This topic is 1447 days old. Please don't post here. Open a new topic instead.


Pardon my brute force learning methods :)  I'm using this from another thread here since it's pretty much what I'm trying to learn.

My intention here is more to verify what I've got already in my database against the book(s) in my possession. But if there's something missing may as well grab it.  (and FWIW, a lot of their data is hokey anyway, at least for books of 1980s computing, but that's for another forum/topic)

 

I've got some general questions on Set Variable. I'm just not getting what's going on here with the script steps, or why some of these work and some don't.

1) Text leading up to the value you want

2) Text after what you want.

3) Setting up the conditions?

4) .. if #1 & #2 are met, this starts? Otherwise, move along. This is not the text you are looking for.

5-6) ...  doing something with the matched text.. but I'm just not following what is happening here. I have some lines that work, others that don't and I'm not seeing why.

7) Stuff the result in the target field and move along. Got it.

It's probably that I have no idea what's going on after line 2.

 

This one for ISBN13 works. 

# ISBN13
1) Set Variable [ $prefix ; Value: "ISBN13</th> <td>" ]
2) Set Variable [ $suffix ; Value:  "</td>" ]
3) Set Variable [ $start ; Value: Position ( $text ; $prefix ; 1 ; 1 ) ]
4) If [ $start ]
5) Set Variable [ $result ; Value: Let ( [ start = $start + Length ( $prefix ) ; end = Position ( $text ; $suffix ; start ; 1 ) ] ; Middle ( $text ; start ; end - start ) ) ]
6) Set Variable [ $result ; Value: Trim( Substitute( TrimAll($result; 0;0 ) ; [Char(10); ""] ; ["'"; ""] )) ]
7) Set Field [ Table::Zip ; $result ]
8) End If

But changing it slightly for the other ISBN field..  I just can't seem to capture that field. No matter which of these scripts I try.

# ISBN10
Set Variable [ $prefix ; Value: "ISBN</td>" ] 
Set Variable [ $suffix ; Value:  "</td> </tr>" ] 
Set Variable [ $start ; Value: Position ( $text ; $prefix ; 1 ; 1 ) ] 
If [ $start ] 
Set Variable [ $result ; Value: Let ( [ start = $start + Length ( $prefix ) ; end = Position ( $text ; $suffix ; start ; 1 ) ] ; Middle ( $text ; start ; end - start ) ) ] 
Set Variable [ $result ; Value: Trim( Substitute( TrimAll($result; 0;0 ) ; [Char(10); ""] ; ["&#039;"; ""] )) ] 
Set Field [ Table::State ; $result ] 
End If
# 
# ISBN10 Alt
Set Variable [ $prefix ; Value: "<th>ISBN</td> <th>" ] 
Set Variable [ $suffix ; Value:  "</td>" ] 
Set Variable [ $start ; Value: Position ( $text ; $prefix ; 1 ; 1 ) ] 
If [ $start ] 
Set Variable [ $start ; Value: 1+ Position ( $text ; ">" ; $start + Length( $prefix); 1 ) ] 
Set Variable [ $result ; Value: Let ( [ start = $start ; end = Position ( $text ; $suffix ; start ; 1 ) ] ; Middle ( $text ; start ; end - start ) ) ] 
Set Variable [ $result ; Value: Trim( Substitute( TrimAll($result; 0;0 ) ; [Char(10); ""] ; ["&#039;"; ""] )) ] 
Set Field [ Table::State ; $result ] 
End If

Below is from Firefox. Why I can't see the same thing in the Content tab in FM, I have no idea. Voodoo probably.

 <!-- <img src="/sites/default/files/default-book-cover.jpg" style="height:250px; width:190px; background-color:#dddddd"/> -->
                <object height="250px" width="190px" data="https://images.isbndb.com/covers/74/35/9780201177435.jpg" type="image/png">
                 <img height="250px" width="190px" src="/modules/isbndb/img/default-book-cover.jpg" />
                </object>
            </div>
            <div class="book-table col-xs-12 col-md-6">
              <table class="table table-hover table-responsive ">
                                <tr> <th>Full Title</th> <td>Apple Iigs Hardware Reference</td> </tr>
                                                <tr> <th>ISBN</td> <th>0201177439</td> </tr>
                                                <tr> <th>ISBN13</th> <td>9780201177435</td> </tr>
                                                <tr> <th>List Price</th> <td>USD $24.95</td> </tr>
                                                <tr> <th>Publisher</th> <td><a href="/publisher/Longman Pub Group">Longman Pub Group</a></td> </tr>
                                                <tr> <th>Authors</th> <td>
                                    <a href="/author/Inc. Apple Computer">Inc. Apple Computer</a><br />
                                </td> </tr>
                                
                
                                <tr> <th>Edition</th> <td>1</td> </tr>
                                                <tr> <th>Publish Date</th> <td>1987</td> </tr>
                                                <tr> <th>Binding</th> <td>Hardcover</td> </tr>

 

parsing and scraping MOD ISBN.fmp12.zip


First of all, web scraping is for the dogs. You are at the mercy of the web page's author, and even the addition of an insignificant space can break your code.

Now, I took a quick look at your file. I see that the ISBNs are on these two lines:

                                                <tr> <th>ISBN </th><th>0830631291 </th></tr>
                                                <tr> <th>ISBN13</th> <td>9780830631292</td> </tr>

Your script says:

Set Variable [ $prefix ; Value: "ISBN</td>" ] 

but the ISBN 10 value is preceded by:

"ISBN </th><th>"

so that's already not working. I did not check the rest.
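
For reference, a minimal sketch of the ISBN 10 extraction with the prefix changed to match the row quoted above. It assumes the page source is already in $text, as in the original script, and that the value ends at the next "</th>". Both assumptions come from this one sample page and may not hold for others.

```
Let ( [
  // note the space after ISBN: it keeps the ISBN13 row from matching
  prefix = "ISBN </th><th>" ;
  suffix = "</th>" ;
  pos = Position ( $text ; prefix ; 1 ; 1 ) ;
  start = pos + Length ( prefix ) ;
  end = Position ( $text ; suffix ; start ; 1 )
] ;
  // return nothing unless both the prefix and the suffix were found
  If ( pos and end ; Trim ( Middle ( $text ; start ; end - start ) ) )
)
```

Against the sample row this should return 0830631291; the Trim removes the trailing space that sits before the closing tag.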

Next you say that the code you get in Firefox is different. I verified that and it is true: the web site returns different content when the browser is Firefox (or Safari). Which brings me back to my first point.

So what exactly was your question?

 

Edited by comment

Yes, I know the evils of scraping and the mercy of the operator.. 

 

ISBN13 works.

The one for ISBN does not. I can use the exact same statement and change the prefix to match the ISBN13 and it works.

There are a couple of versions in there; I had them enabled and disabled individually and left them like that for the copy/paste. They're different ways of trying it. No matter what, I can't seem to capture that field.

Firefox and Filemaker show slightly different page source for that table, why? I don't know. The other entries work straight up. That's what I'm not getting here. Actually understanding all that syntax in the Set Variable lines would probably be a good start. :)

 


            <div class="book-table col-xs-12 col-md-6">
              <table class="table table-hover table-responsive ">
                                <tbody><tr> <th>Full Title</th> <td>Apple Iigs Hardware Reference</td> </tr>
                                                <tr> <th>ISBN </th><th>0201177439 </th></tr>
                                                <tr> <th>ISBN13</th> <td>9780201177435</td> </tr>
                                                <tr> <th>List Price</th> <td>USD $24.95</td> </tr>
                                                <tr> <th>Publisher</th> <td><a href="/publisher/Longman Pub Group">Longman Pub Group</a></td> </tr>
                                                <tr> <th>Authors</th> <td>
                                    <a href="/author/Inc. Apple Computer">Inc. Apple Computer</a><br>
                                </td> </tr>
                                
                
                                <

 


8 minutes ago, Tony Diaz said:

Firefox and Filemaker show slightly different page source for that table, why?

It is not uncommon for web sites to return different content to different browsers. The most obvious example is mobile browsers (that are often redirected to an entirely different page), but there are other differences that a web site might want to take into account.

BTW, it is interesting to note that the code returned for Firefox and Safari is actually wrong. Both:

<th>ISBN</td>

and:

<th>0201177439</td>

have unmatched start and end tags.
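
Since either form of the row may come back, one option is to test for both. This is only a sketch: it assumes the source is in a variable named $html and that the two prefixes below, copied from the samples in this thread, are the only variants. The site may well serve others.

```
Let ( [
  prefixA = "ISBN </th><th>" ;       // form seen in the web viewer content
  prefixB = "<th>ISBN</td> <th>" ;   // form seen in the Firefox source
  posA = Position ( $html ; prefixA ; 1 ; 1 ) ;
  posB = Position ( $html ; prefixB ; 1 ; 1 ) ;
  // start is empty when neither prefix is present
  start = Case (
    posA ; posA + Length ( prefixA ) ;
    posB ; posB + Length ( prefixB )
  )
] ;
  If ( start ;
    Let (
      // "</t" matches both </th> and </td>
      end = Position ( $html ; "</t" ; start ; 1 ) ;
      If ( end ; Trim ( Middle ( $html ; start ; end - start ) ) )
    )
  )
)
```

Both sample rows in this thread should yield 0201177439 with this calculation.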

 

7 minutes ago, Tony Diaz said:

Actually understanding all that syntax in the Set Variable lines would probably be a good start.

It's actually quite simple: first you find the position of the prefix; then you find the position of the suffix; and finally you extract the text between the end of the prefix and the start of the suffix. Load this into your data viewer and break it apart to see how it works:

Let ( [
text = "some text that contains an important message ending here." ; 
prefix = "important " ;
suffix = " ending" ; 
start = Position ( text ; prefix ; 1 ; 1 ) + Length ( prefix ) ;
end = Position ( text ; suffix ; start ; 1 )
] ;
Middle ( text ; start ; end - start )
)

This is also to show that you don't need all those Set Variable steps; it can all be done within a single Let() statement.
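
One caveat with the bare Let() above: if the prefix is not found, Position returns 0 and Middle still returns something, which is why the original script wraps the work in If [ $start ]. A hedged sketch of the same logic as a custom function with that guard built in follows. The name ExtractBetween and its three parameters are hypothetical, not anything built into FileMaker, and the line-break cleanup mirrors step 6 of the original script.

```
// Hypothetical custom function: ExtractBetween ( text ; prefix ; suffix )
Let ( [
  pos = Position ( text ; prefix ; 1 ; 1 ) ;
  start = pos + Length ( prefix ) ;
  end = Position ( text ; suffix ; start ; 1 )
] ;
  // empty result when either marker is missing
  If ( pos and end ;
    Trim (
      Substitute ( Middle ( text ; start ; end - start ) ;
        [ Char ( 10 ) ; "" ] ;   // strip line feeds
        [ Char ( 13 ) ; "" ]     // strip carriage returns
      )
    )
  )
)
```

With that defined, ExtractBetween ( $html ; "ISBN13</th> <td>" ; "</td>" ) should return 9780201177435 for the sample page, and an empty result if the row is missing.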

 


So, that would be doing it as a function vs. a script.  Position ( table::field ; prefix..  How would it get at the web viewer data then?

(Another area I need to work on more, script, vs function vs calculation)

3 hours ago, comment said:

BTW, it is interesting to note that the code returned for Firefox and Safari is actually wrong. Both:

... and -that- explains it.  I was wondering why those tags were flipped around. I kept looking back and forth at that stuff. The source is easier to read in Firefox. All the other stuff comes across the same, just not that line. Otherwise it works now.

 

Edited by Tony Diaz

I would still keep it as a script, just reduce it to something like:

Set Variable [ $html ; GetLayoutObjectAttribute ( "myWebViewer" ; "content" ) ]
Set Field [ Books::Title ; Let ( ... ) ]
Set Field [ Books::ISBN10 ; Let ( ... ) ]
Set Field [ Books::ISBN13 ; Let ( ... ) ]
# ...

Or even better, eliminate the web viewer and use the Insert from URL script step to populate the $html variable. Just check which version you get from this call.
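
A rough sketch of that approach, under stated assumptions: the URL pattern is a guess and is not confirmed anywhere in this thread, the $variant variable is purely illustrative, and the exact options shown for Insert from URL differ between FileMaker versions (a variable target needs a reasonably recent one).

```
# Hypothetical URL pattern; substitute whatever address the web viewer loads
Set Variable [ $url ; Value: "https://isbndb.com/book/" & Books::ISBN13 ]
Insert from URL [ Select ; With dialog: Off ; Target: $html ; $url ]
If [ Get ( LastError ) ≠ 0 or IsEmpty ( $html ) ]
    Exit Script [ Text Result: "fetch failed" ]
End If
# Check which form of the ISBN row this call returned before parsing
Set Variable [ $variant ; Value: Case (
    Position ( $html ; "ISBN </th><th>" ; 1 ; 1 ) ; "th/th" ;
    Position ( $html ; "<th>ISBN</td>" ; 1 ; 1 ) ; "th/td" ;
    "unknown" ) ]
```

If $variant comes back as "unknown", the prefixes need another look at the raw $html before any of the Set Field steps can be trusted.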

 


Ah, I see now. The Let part goes in as one of the parameters, so you use the calculation editor to put it in, just like I'm doing with the simple prefix/suffix bits now, but in the actual editor instead of the dialog box. D'oh!

I see now. :) a lot less voodoo this way too.
