extract url from html body

January 27, 2009

I have exported the html contents of a webviewer as text to a text field in another layout.

I am trying to write a script to go through the text and extract urls from within it.

All URLs in the text begin with http, and each one is enclosed in double quotes (" ").

I am running a looping script that goes to the first word. If its first 4 characters are http (Left ( $thisword ; 4 ) = "http"), I set the current word plus the next 20 words to another variable, which is exported to another field.

The current word number then increments by 1, and the loop stops when the current word is the last word.

This is ugly! I wonder if anyone can help me get the exact text of the url contained in the " " separators.

Any help much appreciated!

Basically, how do I exit a loop if the current word is followed by a " separator?

Edited January 27, 2009 by Guest

January 27, 2009

Why don't you search for the n-th occurrence of "http", extract from there up to the first following quote, then bump n up by 1?

January 27, 2009

Author

Sorry, I've never used these functions. Would you mind elaborating?

An example would be superb!

Thanks for the input....

January 27, 2009

Roughly:

Loop
  Set Variable [ $i ; $i + 1 ]
  Exit Loop If [ $i > PatternCount ( text ; "http" ) ]
  Set Variable [ $url ; <calculation> ]
  Perform Script [ New URL Record ; parameter: $url ]
End Loop

and the <calculation> would be:

Let ( [
  start = Position ( text ; "http" ; 1 ; $i ) ;
  end = Position ( text ; "\"" ; start ; 1 )
] ;
  Middle ( text ; start ; end - start )
)
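For anyone who wants to try the same idea outside FileMaker, here is a rough Python sketch of it: find each occurrence of "http", then take everything up to the next double quote. The function name extract_urls and its parameters are made up for illustration, and it assumes, as in the original question, that every URL is terminated by a double quote.

```python
def extract_urls(text, marker="http", terminator='"'):
    """Collect each substring that starts at `marker` and runs up to
    (but not including) the next `terminator`."""
    urls = []
    search_from = 0
    while True:
        # Locate the next occurrence of the marker (the "n-th http").
        start = text.find(marker, search_from)
        if start == -1:
            break  # no more occurrences
        # Locate the first terminator after that occurrence.
        end = text.find(terminator, start)
        if end == -1:
            break  # unterminated URL: stop rather than guess where it ends
        urls.append(text[start:end])
        # Continue the search just past this marker ("bump n up by 1").
        search_from = start + len(marker)
    return urls


html = '<a href="http://example.com/a">x</a> <img src="https://example.org/b.png">'
for url in extract_urls(html):
    print(url)
```

Like the FileMaker loop, this counts every "http" in the text, so a URL that contains a second "http" inside it (for example in a query string) would produce an extra, shorter match.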

March 16, 2009

Newbies

To extract URLs from HTML, ASP, PHP, text, and similar documents, there is a good script posted at http://www.biterscripting.com/SS_URLs.html .

To use, do the following. (With high speed internet, this entire process, including installation, should take no more than a couple of minutes.)

1. Download and install biterscripting at http://www.biterscripting.com .

2. Start biterscripting and enter the following command.

script "http://www.biterscripting.com/Download/SS_AllSamples.txt"

(biterscripting can execute scripts directly from a web site)

3. Now you are ready to use the SS_URLs script to extract URLs. This is done with the following command.

script "C:/Scripts/SS_URLs.txt" URL("http://....")

The above will extract URLs referenced in that web page. OR,

script "C:/Scripts/SS_URLs.txt" URL("C:/....")

The above will extract URLs referenced in that local file.

Hope this helps.

Patrick

March 16, 2009

Yet another AppleScript solution, uses Perl

Extract_URLs_http_from_URL.scpt.zip
