importing records from html database?

alexandra_l · May 27, 2004

hello everyone,

i would like to import the records from this database website:

http://improvdb.fr.st/

into my own fm pro database.

can someone please point me in the right direction on how to do this? in the help files i've seen only html exporting mentioned.

ideally the records of the website would be directly imported into my existing contact database.

many thanks in advance

alexandra

Fenton · May 27, 2004

Basically you're going to have to get the data as text, then "parse" it into a form that can be set into your records. There is no such thing as direct HTML import, because HTML is just a bunch of text and markup tags. The only order to it is whatever the author has used.

Fortunately in most cases there will be a table which has the data you want, with uniform HTML tags in between the data.

You didn't say what platform you're on. On Macs you can use AppleScript with the Safari browser to get the "text" directly:

tell application "Safari"
    get text of document 1
end tell

Simple, yes, but you generally end up with something that leaves much to be desired from a FileMaker viewpoint. An example from that site:

California

Add a venue / an organisation in California

Venues / Organisations

Type

Cities

Beanbender's

Berkeley

Jazz House, The

Association

Berkeley

The data is there, but in a "return-delimited" format. You would run some tool to "munge" it into tab-delimited text; not too difficult in this case: double returns separate "records," single returns separate "fields."
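For anyone who would rather do the munging outside AppleScript, here is a minimal sketch in Python (a swap-in, not what was used above). It assumes the rule just described holds exactly: a blank line between records, a single return between fields. The function name `returns_to_tabs` and the sample text are made up for illustration.

```python
import re


def returns_to_tabs(text: str) -> str:
    """Convert blank-line-separated records into tab-delimited lines."""
    # Normalise old Mac (\r) and Windows (\r\n) line endings to \n first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    records = []
    # Two or more returns in a row mark the boundary between records.
    for block in re.split(r"\n{2,}", text):
        # Within a record, each non-empty line is one field.
        fields = [line.strip() for line in block.split("\n") if line.strip()]
        if fields:
            records.append("\t".join(fields))
    return "\n".join(records)


# Hypothetical sample, modelled on the venue list quoted above.
sample = "Beanbender's\nBerkeley\n\nJazz House, The\nAssociation\nBerkeley"
print(returns_to_tabs(sample))
```

Note that a record with an empty field (Beanbender's has no Type here) comes out with fewer columns than the others, so it would need padding or hand-checking before a FileMaker import.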

For a Windows or cross-platform solution you would undoubtedly have to get the entire source of the document, including HTML tags, and parse from there. The Troi URL plug-in would get the source text: http://www.troi.com/software/urlplugin.html

"Gets the raw data of the specified URL. This can be for example the HTML of a web page"

It would have a lot of extra stuff, but the data you want would be in a table with uniform separators between the cells. It's then a text parsing job.
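As a rough illustration of that parsing job, here is a sketch in Python using only the standard library's `html.parser`. It assumes you already have the page source as a string (from the plug-in or any other tool), and the class name `TableRows` and the sample markup are invented for the example; the real page's tags will differ.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical markup, not copied from the real site.
sample_html = (
    "<table><tr><th>Venues / Organisations</th><th>Type</th><th>Cities</th></tr>"
    "<tr><td><a href='?id=1'>Beanbender's</a></td><td></td><td>Berkeley</td></tr>"
    "<tr><td><a href='?id=2'>Jazz House, The</a></td><td>Association</td><td>Berkeley</td></tr>"
    "</table>"
)
parser = TableRows()
parser.feed(sample_html)
parser.close()
for row in parser.rows:
    print("\t".join(row))
```

Working from the cells rather than the plain text has one advantage: an empty cell stays an empty field, so every row keeps the same number of columns.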

alexandra_l · May 28, 2004

fenton, thank you very much.

i'm using os x panther.

i was hoping there might be a solution that would treat a page like this one:

http://www.lequanninh.net/improvdb/?fiche=ok&id_pays=28&id_lieu=299&tri=villes.ville_en&lang=en

as something with field names and field data, so that the website's "field names" could be matched with the ones in my fm database.

the simple applescript seems identical to doing "select all" and "copy" in safari, yes?

i'm not too familiar with these things, and i don't know what my php/mysql is, as it says on the bottom of that page, but i was hoping a database website like this one could be more or less automatically imported into fm pro.

alexandra

Fenton · May 28, 2004

Yes, tell app "Safari" to get text of document 1 is pretty much the same as Select All, Copy.

I wouldn't really know how to "automate" getting all this information. It seems to me pretty much a text-editing job.

I would use a combination of (mostly) AppleScript and FileMaker text functions. In AppleScript I'd use the free Satimage OSAX (Scripting Addition) to enable "grep" search and replace.

http://www.satimage.fr/software/en/downloads_osaxen.html

You should be able to get the text of a single page, clean and arrange it to match a FileMaker file, and set its fields, all with one command.

I can see one level of automation, which would be to step through the "many" page, i.e., the page with 30 entries for California, get each of those individual entries, and get their detail pages. This would work slightly differently from grabbing the plain text.

In this case you'd get the "source" of the page, which has the links. Then parse through the table of links. Each one is only partial; it's missing its base reference:

http://www.lequanninh.net/improvdb/

This is what's in the cells of the table:

?fiche=ok&id_pays=28&id_lieu=299&tri=villes.ville_en&lang=en

Put them together and you've got the full link to go to the detail pages.

http://www.lequanninh.net/improvdb/?fiche=ok&id_pays=28&id_lieu=299&tri=villes.ville_en&lang=en
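A minimal sketch of that link-building step, in Python rather than AppleScript. The base address and the partial link are the ones quoted above; the `LinkCollector` class, the `detail_urls` function, the sample anchor tag, and the assumption that every detail link starts with `?fiche=` are illustrative guesses, not taken from the real page source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "http://www.lequanninh.net/improvdb/"


class LinkCollector(HTMLParser):
    """Gather the href of every <a> tag that looks like a detail link."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Assumption: detail links are the partial ones starting "?fiche=".
            if href and href.startswith("?fiche="):
                self.hrefs.append(href)


def detail_urls(source: str, base: str = BASE) -> list:
    """Return full detail-page URLs for the partial links found in source."""
    collector = LinkCollector()
    collector.feed(source)
    collector.close()
    # urljoin keeps the base path and swaps in the link's query string.
    return [urljoin(base, href) for href in collector.hrefs]


# Hypothetical table cell, built around the partial link quoted above.
sample_source = (
    '<td><a href="?fiche=ok&id_pays=28&id_lieu=299'
    '&tri=villes.ville_en&lang=en">Jazz House, The</a></td>'
)
for url in detail_urls(sample_source):
    print(url)
```

Each resulting URL can then be fetched in turn to get the detail page for that entry.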

The source text is messier than the plain text. But the cell tags are consistent (though kind of clunky). If you get the whole table, then remove the tags, you'll have what you want.

Yeah, it's a fair amount of work, and yes, it may need tweaking if they change the code. But I doubt if they will. It's pretty simple, as web pages go these days.

And, yes, a web expert may have a better method. I'm more a brute force text-editing FileMaker AppleScript kind of guy.

I've used this method to get a page of messages off this web forum into a FileMaker database as separate records, all fields intact, a 1-step process.

alexandra_l · May 31, 2004

thanks again, fenton. i will give this a try.

The Shadow · May 31, 2004

I've added some custom functions to help with this in my style-manipulation example:

http://www.spf-15.com/fmExamples/

Change to the layout "webPage", and see your table's body extracted with the "body" calculation:

(the desired data table is nested inside 3 others)


ReduceSpaces(
  RemoveTags (
    ExtractTagBody (
      ExtractTagBody (
        ExtractTagBody ( ExtractTagBody ( content; "table" ); "table" );
      "table" );
    "table" );
  "" )
)

ExtractTagBody, ReduceSpaces, and RemoveTags are custom functions I built. I also put in an ExtractTag function and a supporting FindEndTag function.
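For readers without custom functions, the same idea can be sketched in Python. These are hypothetical stand-ins that only borrow the names; how the FileMaker versions actually work is not shown in this post, and the second argument of the RemoveTags call above is not explained, so this sketch does not try to mirror it. The sample markup is invented.

```python
import re


def extract_tag_body(html: str, tag: str) -> str:
    """Return the text between the first <tag ...> and its matching </tag>."""
    name = re.escape(tag)
    open_re = re.compile(r"<%s\b[^>]*>" % name, re.IGNORECASE)
    any_re = re.compile(r"<(/?)%s\b[^>]*>" % name, re.IGNORECASE)
    first = open_re.search(html)
    if first is None:
        return ""
    depth = 1  # count nested opens so the right close tag is matched
    for match in any_re.finditer(html, first.end()):
        depth += -1 if match.group(1) else 1
        if depth == 0:
            return html[first.end():match.start()]
    return ""  # no matching close tag was found


def remove_tags(html: str, replacement: str = " ") -> str:
    """Replace every <...> tag with the replacement text."""
    return re.sub(r"<[^>]*>", replacement, html)


def reduce_spaces(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical page: the data table nested inside one outer table.
page = (
    "<table><tr><td>"
    "<table><tr><td>Jazz House, The</td><td>Association</td></tr></table>"
    "</td></tr></table>"
)
inner = extract_tag_body(extract_tag_body(page, "table"), "table")
print(reduce_spaces(remove_tags(inner)))
```

Calling `extract_tag_body` once per level of nesting mirrors the repeated ExtractTagBody calls in the calculation above.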

I notice you don't have Dev 7 <evil cackle>, but perhaps others might find it useful.
