Newbies Rob Pritts Posted October 27, 2012
I am pulling down the HTML text into a file and I want to pull out the links. I saw that image URLs can be retrieved as well. My end goal is to monitor member sites for changes daily. When a change happens, notify the different members in their network so that the change can be noted on their sites also, i.e. a link and teaser text.
Rob
fseipel Posted October 27, 2012
I'd suggest using an HTML parser such as jsoup. You can then retrieve the image link list easily, e.g.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    String results = "";
    String url = 'http://www.amazon.com';
    // Fetch and parse the page
    Document doc = Jsoup.connect(url).get();
    // Select every element that has a src attribute
    Elements media = doc.select("[src]");
    for (Element src : media) {
        // Keep only <img> tags; "abs:src" resolves the URL against the page's base URI
        if (src.tagName().equals("img"))
            results = results + src.tagName() + ':' + src.attr("abs:src") + '\n';
    }
    // One "img:<absolute URL>" entry per line
    return results;
Newbies Rob Pritts (Author) Posted October 27, 2012
Is that a script or a plug-in?
Ocean West Posted October 27, 2012
Looks like Groovy code for use with ScriptMaster (a plugin).
Newbies Rob Pritts (Author) Posted October 27, 2012
I agree, but I want to know how.
john renfrew Posted October 28, 2012
@fseipel - thanks for the pointer, that looks like a great library.
@rob pritts - do you want to 'learn' how to take advantage of the pointer, or just have someone give you a solution to a workflow that is so far only outlined as a sketch?
If the first:
- Go and find jsoup and download the jar.
- Read the documentation, particularly the cookbook examples, and you will find that fseipel has already simplified it for you for the case you outlined.
- Import the jar into your SM demo file and create a function using the code.
- Test it with some real URLs.
- Make it a registered function following one of the methods outlined by 360works.
- Integrate the results into a FileMaker workflow to achieve your outlined expectation.
As a side point: can you explain how you intend to decide whether an image is 'new' if it is uploaded with the same name as one from yesterday?
Newbies Rob Pritts (Author) Posted October 28, 2012
I would love to have someone do it at some point, but I also need to learn. I am looking for content changes, but I saw that SM could get images, so I was pointing it out.
Thanks
fseipel Posted October 30, 2012
@john: Thanks for fleshing out the code example. In the past I've also used FileMaker's string functions to parse HTML, but the Java libraries seem like a potentially better choice: less likely to break, and more readable and maintainable.
I don't see how the OP will be able to tell if an image has changed, short of downloading it and doing a byte comparison against the last downloaded copy, or at least a file byte length comparison (less reliable). If it has to do that over a large number of users/pages, it may be quite slow and a bandwidth hog.
If the site(s) offer web services, that may be a much better alternative for data acquisition. I interact a lot with amazon.com, and I always use the web services, except for data which isn't provided by them. Screen scraping, in contrast, is slower and more prone to breaking.
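One way to sketch the byte comparison described above without keeping a full copy of every download is to store a digest of yesterday's bytes and compare it with a digest of today's. This is only an illustration under stated assumptions: the class name ChangeDetector, the methods fingerprint and hasChanged, and the sample byte arrays are hypothetical and not from the thread, and the content still has to be downloaded each day, so it does not address the bandwidth concern.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: none of these names come from the thread.
public class ChangeDetector {

    // Returns the SHA-256 digest of the content as lowercase hex.
    static String fingerprint(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // True when no fingerprint was stored yet, or the current bytes hash differently.
    static boolean hasChanged(String storedFingerprint, byte[] current) {
        return storedFingerprint == null
                || !storedFingerprint.equals(fingerprint(current));
    }

    public static void main(String[] args) {
        // Stand-ins for downloaded content; both have the same byte length,
        // so a length-only comparison would miss the difference.
        byte[] yesterday = "<img src=\"logo1.png\">".getBytes(StandardCharsets.UTF_8);
        byte[] today = "<img src=\"logo2.png\">".getBytes(StandardCharsets.UTF_8);

        String stored = fingerprint(yesterday);
        System.out.println(hasChanged(stored, yesterday)); // prints "false"
        System.out.println(hasChanged(stored, today));     // prints "true"
    }
}
```

Only the hex string needs to be kept per URL (for example in a FileMaker text field), and the same check works for page HTML as well as image bytes.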