January 27, 2009
I have exported the HTML contents of a web viewer as text to a text field in another layout. I am trying to write a script that goes through the text and extracts the URLs from it. All URLs in the text begin with http, and each one is enclosed in double quotes (" ").

I am running a looping script that goes to the first word; if its first 4 characters are http (Left ( $thisword ; 4 ) = "http"), I set the current word plus the next 20 to another variable, which is exported to another field. The current word then increments by 1, and the loop stops when the current word equals the last word.

This is ugly! I wonder if anyone can help me get the exact text of the URL contained between the " " separators. Basically, how do I exit a loop if the current word is followed by a " separator? Any help much appreciated!

Edited January 27, 2009 by Guest
January 27, 2009
Why don't you search for the n-th occurrence of "http", extract from there up to the first following quote, then bump n up by 1?
January 27, 2009 (Author)
Sorry, I've never used these functions. Would you mind elaborating? An example would be superb! Thanks for the input.
January 27, 2009
Roughly:

Loop
  Set Variable [ $i ; $i + 1 ]
  Exit Loop If [ $i > PatternCount ( text ; "http" ) ]
  Set Variable [ $url ; <> ]
  Perform Script [ New URL Record ; parameter: $url ]
End Loop

and the <> would be:

Let ( [
  start = Position ( text ; "http" ; 1 ; $i ) ;
  end = Position ( text ; "\"" ; start ; 1 )
] ;
  Middle ( text ; start ; end - start )
)
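For anyone who wants to check the logic outside FileMaker, here is a minimal Python sketch of the same idea: visit each occurrence of "http" in turn and take everything up to the first following double quote. The function name extract_urls and the sample HTML are made up for illustration. One difference to keep in mind: Python's str.find is case-sensitive, whereas FileMaker's Position and PatternCount are not, so this sketch would miss "HTTP" in capitals.

```python
def extract_urls(text, marker="http", quote='"'):
    """Collect the text from each occurrence of `marker` up to the next `quote`."""
    urls = []
    start = text.find(marker)           # first occurrence, or -1 if none
    while start != -1:
        end = text.find(quote, start)   # first quote at or after this occurrence
        if end == -1:
            break                       # no closing quote left, so stop
        urls.append(text[start:end])    # same span as Middle ( text ; start ; end - start )
        start = text.find(marker, start + 1)  # move on to the next occurrence
    return urls


# Hypothetical sample input, not taken from the original post.
sample = '<a href="http://example.com/a">A</a> <img src="https://example.com/b.png">'
print(extract_urls(sample))
```

As in the FileMaker version, every occurrence of "http" counts, so a URL that contains "http" a second time (a redirect link, for example) would produce an extra partial entry.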
March 16, 2009 (Newbies)
To extract URLs from HTML, ASP, PHP, text, and other documents, there is a good script posted at http://www.biterscripting.com/SS_URLs.html . To use it, do the following. (With high-speed internet, this entire process, including installation, should take no more than a couple of minutes.)

1. Download and install biterscripting at http://www.biterscripting.com .

2. Start biterscripting and enter the following command. (biterscripting can execute scripts directly from a web site.)

   script "http://www.biterscripting.com/Download/SS_AllSamples.txt"

3. Now you are ready to use the SS_URLs script to extract URLs. This is done with the following command.

   script "C:/Scripts/SS_URLs.txt" URL("http://....")

   The above will extract URLs referenced in that web page. OR,

   script "C:/Scripts/SS_URLs.txt" URL("C:/....")

   The above will extract URLs referenced in that local file.

Hope this helps.

Patrick