greenfields Posted January 27, 2009 (edited)
I have exported the HTML contents of a Web Viewer as text to a text field in another layout. I am trying to write a script that goes through the text and extracts the URLs from it. All URLs in the text begin with "http", and each one is enclosed in double quotes (" ").

At the moment I am running a looping script that goes to the first word. If its first four characters are "http" (Left ( $thisword ; 4 ) = "http"), I set the current word plus the next 20 words to another variable, which is exported to another field. The current word number then increments by 1, and the loop stops when the current word is the last word.

This is ugly! Can anyone help me get the exact text of each URL between the " " delimiters? Any help much appreciated!

Basically, how do I exit a loop when the current word is followed by a " separator?

Edited January 27, 2009 by Guest
comment Posted January 27, 2009
Why don't you search for the n-th occurrence of "http", extract from there up to the first following quote, then bump n up by 1?
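The same idea outside FileMaker, as a minimal Python sketch (an illustration of the approach, not FileMaker code): visit each occurrence of "http" in turn and take everything up to the next double quote. The function name extract_quoted_urls and the sample HTML are hypothetical, and stopping when no closing quote remains is an assumption of this sketch.

```python
def extract_quoted_urls(text: str) -> list[str]:
    """Collect the substring from each "http" up to (not including) the next double quote."""
    urls = []
    start = text.find("http")  # first occurrence, 0-based; -1 if none
    while start != -1:
        end = text.find('"', start)  # first quote after this "http"
        if end == -1:  # no closing quote left: stop rather than guess
            break
        urls.append(text[start:end])
        start = text.find("http", start + 1)  # "bump n up by 1": next occurrence
    return urls


html = '<a href="http://example.com/a">A</a> <img src="https://example.com/b.png">'
print(extract_quoted_urls(html))
# ['http://example.com/a', 'https://example.com/b.png']
```

Because it visits every "http", an "http" nested inside another URL (for example in a redirect query string) yields a second, shorter result, just as counting occurrences would.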
greenfields Author Posted January 27, 2009
Sorry, I've never used these functions. Would you mind elaborating? An example would be superb! Thanks for the input.
comment Posted January 27, 2009
Roughly (where text is the field holding the exported HTML):

Set Variable [ $i ; 0 ]
Loop
  Set Variable [ $i ; $i + 1 ]
  Exit Loop If [ $i > PatternCount ( text ; "http" ) ]
  Set Variable [ $url ; <> ]
  Perform Script [ "New URL Record" ; Parameter: $url ]
End Loop

and the <> would be:

Let ( [
  start = Position ( text ; "http" ; 1 ; $i ) ;
  end = Position ( text ; "\"" ; start ; 1 )
] ;
  Middle ( text ; start ; end - start )
)
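To check the index arithmetic in the Let calculation, here is a rough Python emulation. The helpers fm_position and fm_middle are hypothetical stand-ins written for this sketch: they assume 1-based positions and that Position returns 0 when nothing is found, and they compare case-sensitively, which may not match FileMaker exactly. Since start points at the "h" and end at the closing quote, end - start is exactly the length of the URL.

```python
def fm_position(text: str, search: str, start: int, occurrence: int) -> int:
    """Hypothetical stand-in for Position(): 1-based result, 0 if not found.
    Handles only occurrence >= 1 and is case-sensitive."""
    idx = start - 2  # the first find() below then begins at 0-based index start - 1
    for _ in range(occurrence):
        idx = text.find(search, idx + 1)
        if idx == -1:
            return 0
    return idx + 1


def fm_middle(text: str, start: int, count: int) -> str:
    """Hypothetical stand-in for Middle(): count characters from 1-based start."""
    if start < 1 or count <= 0:
        return ""
    return text[start - 1:start - 1 + count]


def url_at(text: str, i: int) -> str:
    # Mirrors: Let ( [ start = Position ( text ; "http" ; 1 ; $i ) ;
    #                  end = Position ( text ; "\"" ; start ; 1 ) ] ;
    #                Middle ( text ; start ; end - start ) )
    start = fm_position(text, "http", 1, i)
    if start == 0:  # no i-th "http"; also avoids a negative find() offset
        return ""
    end = fm_position(text, '"', start, 1)  # 0 if no closing quote -> empty result
    return fm_middle(text, start, end - start)


text = 'src="http://example.com/a.png" href="https://example.com/b"'
count = text.count("http")  # plays the role of PatternCount ( text ; "http" )
urls = [url_at(text, i) for i in range(1, count + 1)]
print(urls)
# ['http://example.com/a.png', 'https://example.com/b']
```

The start == 0 guard only matters if url_at is called past the last occurrence; the FileMaker loop avoids that with its Exit Loop If step.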
Newbies PatrickMc Posted March 16, 2009
To extract URLs from HTML, ASP, PHP, plain-text, and similar documents, there is a good script posted at http://www.biterscripting.com/SS_URLs.html . To use it, do the following. (With high-speed internet, the entire process, including installation, should take no more than a couple of minutes.)

1. Download and install biterscripting from http://www.biterscripting.com .
2. Start biterscripting and enter the following command (biterscripting can execute scripts directly from a website):
   script "http://www.biterscripting.com/Download/SS_AllSamples.txt"
3. You are now ready to use the SS_URLs script to extract URLs, with the following command:
   script "C:/Scripts/SS_URLs.txt" URL("http://....")
   This extracts the URLs referenced in that web page. Or:
   script "C:/Scripts/SS_URLs.txt" URL("C:/....")
   This extracts the URLs referenced in that local file.

Hope this helps.
Patrick
Fenton Posted March 16, 2009
Yet another solution: an AppleScript that uses Perl. Attachment: Extract_URLs_http_from_URL.scpt.zip
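The contents of the attached script aren't shown here, so the following is not a reconstruction of it. As a hedged illustration of the regular-expression route that Perl makes convenient, this Python sketch collects double-quoted strings that begin with "http", assuming (as in the original question) that every URL is wrapped in double quotes. The pattern and the function name extract_urls_regex are this sketch's own choices.

```python
import re

# A double quote, then "http" and everything up to the next double quote.
# The capture group holds just the URL, without the surrounding quotes.
QUOTED_URL = re.compile(r'"(http[^"]*)"')


def extract_urls_regex(html: str) -> list[str]:
    # With one capture group, findall() returns the group text for each match.
    return QUOTED_URL.findall(html)


sample = '<a href="http://example.com/a">A</a> <img src="https://example.com/b.png">'
print(extract_urls_regex(sample))
# ['http://example.com/a', 'https://example.com/b.png']
```

Unlike the position-based loop, the pattern requires the opening quote immediately before "http", so a URL mentioned in plain running text is skipped.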