Get Url Link

Rob Pritts · October 27, 2012

I am pulling down the HTML text into a file and I want to pull out the links.

I saw that ScriptMaster can get image URLs.

My end goal: monitor member sites for changes daily.

When a change happens, notify the other members in their network so that the change can be noted on their sites as well,

i.e. with a link and teaser text.

Rob

fseipel · October 27, 2012

I'd suggest using an HTML parser such as jsoup; you can then retrieve the list of image links easily, e.g.:

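import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Page to scan (example URL)
String url = 'http://www.amazon.com';

// Fetch and parse the page
Document doc = Jsoup.connect(url).get();

// Every element that carries a src attribute
Elements media = doc.select("[src]");

String results = "";
for (Element src : media) {
    // Keep images only; "abs:src" resolves relative paths against the page URL
    if (src.tagName().equals("img")) {
        results = results + src.tagName() + ':' + src.attr("abs:src") + '\n';
    }
}

// One "img:<absolute URL>" entry per line
return results;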

Rob Pritts · October 27, 2012

Is that a script or a plug-in?

Ocean West · October 27, 2012

Looks like Groovy code for use with ScriptMaster (a plugin).

Rob Pritts · October 27, 2012

I agree, but I want to know how.

john renfrew · October 28, 2012

@fseipel - thanks for the pointer, that looks like a great library.

@rob pritts - do you want to 'learn' how to take advantage of the pointer, or just have someone give you a solution to a workflow that so far is only outlined as a sketch?

If the first:

Go and find jsoup and download the jar.

Read the documentation, particularly the cookbook examples, and you will find that fseipel has already simplified it for you for the case you outlined.

Import the jar into your SM demo file and create a function using the code.

Test it with some real URLs.

Make it a registered function following one of the methods outlined by 360Works.

Integrate the results into a FileMaker workflow to achieve your outlined expectation.

As a side point, can you explain how you intend to decide whether an image is 'new' if it is uploaded with the same name as one from yesterday?

Rob Pritts · October 28, 2012

I would love to have someone do it at some point, but I also need to learn.

I am looking for content changes, but I saw that SM could get images, so I was pointing that out.

Thanks

fseipel · October 30, 2012

@john: Thanks for fleshing out the code example.

In the past I've also used FileMaker's string functions to parse HTML, but the Java libraries seem like a potentially better choice: less likely to break, and more readable/maintainable.

I don't see how the OP will be able to tell if an image has changed, short of downloading it and doing a byte comparison against the last downloaded copy, or at least a file byte length comparison (less reliable). If it has to do that over a large number of users/pages, it may be quite slow and a bandwidth hog. If the site(s) offer web services, that may be a much better alternative for data acquisition. I interact a lot with amazon.com, and I always use the web services, except for data which isn't provided by the web services. Screen scraping, in contrast, is slower, and more prone to breaking.
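One way to do the comparison described above without keeping yesterday's copy of every file is to store a digest of the downloaded bytes and compare digests on the next run. The following is only a sketch in plain Java (standard library only); the class name ChangeDetector and both method names are made up for illustration, and the sample byte arrays stand in for real downloads. The file still has to be downloaded each time, so the bandwidth concern remains.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: names are illustrative, not part of jsoup or ScriptMaster.
class ChangeDetector {

    // Returns the SHA-256 digest of the given bytes as a lowercase hex string.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // True when there is no stored digest yet (first run) or when the
    // current download hashes to something different from the stored digest.
    static boolean hasChanged(String previousDigest, byte[] currentContent) {
        return previousDigest == null
                || !previousDigest.equals(sha256Hex(currentContent));
    }

    public static void main(String[] args) {
        // Stand-ins for two downloads of the same file name on different days.
        // Both are 7 bytes long, so a length-only check would miss the change.
        byte[] yesterday = "logo-v1".getBytes(StandardCharsets.UTF_8);
        byte[] today = "logo-v2".getBytes(StandardCharsets.UTF_8);

        String stored = sha256Hex(yesterday);
        System.out.println(hasChanged(stored, yesterday));
        System.out.println(hasChanged(stored, today));
    }
}
```

Only the hex digest needs to be stored per URL (for example in a text field), which is much smaller than the file itself. The same approach applies to page text when the goal is to spot content changes rather than image changes.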
