
extract url from html body


This topic is 5510 days old. Please don't post here. Open a new topic instead.


I have exported the HTML contents of a web viewer as text to a text field in another layout.

I am trying to write a script that goes through the text and extracts the URLs from it.

All URLs in the text begin with http, and each one is enclosed in " ".

I am running a looping script that starts at the first word. If its first 4 characters are http (Left ( $thisword ; 4 ) = "http"), I set the current word plus the next 20 words to another variable, which is exported to another field.

The current word index then increments by 1, and the loop stops when the current word is the last word.

This is ugly! I wonder if anyone can help me get the exact text of the URL contained between the " " separators.

Any help much appreciated!

Basically, how do I exit a loop if the current word is followed by a " separator?


Roughly:

Loop
  Set Variable [ $i ; $i + 1 ]
  Exit Loop If [ $i > PatternCount ( text ; "http" ) ]
  Set Variable [ $url ; <> ]
  Perform Script [ New URL Record ; parameter: $url ]
End Loop

and the <> would be:

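Let ( [
  // start of the $i-th occurrence of "http"
  start = Position ( text ; "http" ; 1 ; $i ) ;
  // first double quote after that point (the quote must be escaped as \")
  end = Position ( text ; "\"" ; start ; 1 )
] ;
  Middle ( text ; start ; end - start )
)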
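
For anyone who wants to check the logic outside FileMaker, here is a rough Python sketch of the same idea: find each "http", then read up to the next double quote. It is not FileMaker code; the function name extract_urls and the sample HTML are made up for illustration, and it assumes, as in the original question, that every URL starts with http and ends at a closing ". One deliberate difference from the loop above: it resumes searching after the closing quote, so an "http" that appears inside a URL is not counted as a second URL.

```python
def extract_urls(text):
    """Return every substring that starts at "http" and runs to the next double quote."""
    urls = []
    pos = 0
    while True:
        start = text.find("http", pos)
        if start == -1:
            break  # no more occurrences of "http"
        end = text.find('"', start)
        if end == -1:
            # No closing quote: take the rest of the text and stop.
            urls.append(text[start:])
            break
        urls.append(text[start:end])
        # Resume after the closing quote so "http" inside a URL is not matched again.
        pos = end + 1
    return urls


# Hypothetical sample input, standing in for the exported web viewer text.
sample = '<a href="http://example.com/a">A</a> <img src="https://example.com/b.png">'
print(extract_urls(sample))
```

With the sample above this should print the two quoted addresses and nothing else.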


  • 1 month later...
  • Newbies

To extract URLs from HTML, ASP, PHP, text, and similar documents, there is a good script posted at http://www.biterscripting.com/SS_URLs.html .

To use it, do the following. (With high-speed internet, the entire process, including installation, should take no more than a couple of minutes.)

1. Download and install biterscripting from http://www.biterscripting.com .

2. Start biterscripting and enter the following command.

script "http://www.biterscripting.com/Download/SS_AllSamples.txt"

(biterscripting can execute scripts directly from a web site)

3. Now you are ready to use the SS_URLs script to extract URLs. This is done with the following command.

script "C:/Scripts/SS_URLs.txt" URL("http://....")

The above will extract the URLs referenced in that web page. Or:

script "C:/Scripts/SS_URLs.txt" URL("C:/....")

The above will extract the URLs referenced in that local file.

Hope this helps.

Patrick

