October 27, 2012 Newbies I am pulling the HTML text of a page down into a file, and I want to pull the links out of it. I saw that SM can get image URLs. My end goal is to monitor member sites for changes daily. When a change happens, I want to notify the other members in their network so that the change can be noted on their sites as well, i.e. with a link and teaser text. Rob
October 27, 2012 I'd suggest using an HTML parser such as jsoup. You can then retrieve the list of image links easily, e.g.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    String results = "";
    String url = 'http://www.amazon.com';
    // Fetch and parse the page
    Document doc = Jsoup.connect(url).get();
    // Select every element that carries a src attribute
    Elements media = doc.select("[src]");
    for (Element src : media) {
        // Keep only <img> elements; "abs:src" resolves the URL against the page URL
        if (src.tagName().equals("img"))
            results = results + src.tagName() + ':' + src.attr("abs:src") + '\n';
    }
    return results;
October 28, 2012 @fseipel - thanks for the pointer, that looks like a great library. @rob pritts - do you want to 'learn' how to take advantage of the pointer, or just have someone give you a solution to a workflow that is so far only outlined as a sketch? If the former:

1. Go and find jsoup and download the jar.
2. Read the documentation, particularly the cookbook examples. You will find that fseipel has already simplified it for you for the case you outlined.
3. Import the jar into your SM demo file and create a function using the code.
4. Test it with some real URLs.
5. Make it a registered function, following one of the methods outlined by 360Works.
6. Integrate the results into a FileMaker workflow to achieve your outlined expectation.

As a side point: can you explain how you intend to decide whether an image is 'new' if it is uploaded with the same name as one from yesterday?
October 28, 2012 Author Newbies I would love to have someone do it at some point, but I also need to learn. I am looking for content changes. I only saw that SM could get images, so I was pointing that out. Thanks
October 30, 2012 @john: Thanks for fleshing out the code example. In the past I've also used FileMaker's string functions to parse HTML, but the Java libraries seem like a potentially better choice: less likely to break, and more readable and maintainable. I don't see how the OP will be able to tell whether an image has changed, short of downloading it and doing a byte comparison against the last downloaded copy, or at least a file byte length comparison (less reliable). If it has to do that over a large number of users/pages, it may be quite slow and a bandwidth hog. If the site(s) offer web services, that may be a much better alternative for data acquisition. I interact a lot with amazon.com, and I always use the web services, except for data that the web services don't provide. Screen scraping, in contrast, is slower and more prone to breaking.
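A rough sketch of the comparison idea above, assuming the page or image bytes have already been downloaded by some other means. Instead of keeping the full previous copy for a byte-by-byte comparison, it stores a SHA-256 digest of the last download and compares digests on the next run. That is a substitute for the literal byte comparison, not what was described above. The class name ChangeDetector and its two methods are made up for illustration and are not part of jsoup or ScriptMaster.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of jsoup or ScriptMaster.
class ChangeDetector {

    // Returns the SHA-256 digest of the given bytes as a lowercase hex string.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // True if there is no stored digest yet, or the new content hashes differently.
    static boolean hasChanged(String previousDigest, byte[] currentContent) {
        return previousDigest == null
                || !previousDigest.equals(sha256Hex(currentContent));
    }

    public static void main(String[] args) {
        // Stand-ins for yesterday's and today's downloads (assumed sample data).
        byte[] yesterday = "<img src=\"a.png\">".getBytes(StandardCharsets.UTF_8);
        byte[] today = "<img src=\"b.png\">".getBytes(StandardCharsets.UTF_8);

        String stored = sha256Hex(yesterday);
        System.out.println(hasChanged(stored, yesterday)); // false
        System.out.println(hasChanged(stored, today));     // true
    }
}
```

Only the short digest string needs to be stored per page or image (for example in a FileMaker field), but every item still has to be downloaded on each check, so the bandwidth concern above still applies.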