alexandra_l Posted May 27, 2004

hello everyone,

i would like to import the records from this database website: http://improvdb.fr.st/ into my own fm pro database. can someone please point me in the right direction on how to do this? in the help files i've seen only html exporting mentioned. ideally the records of the website would be directly imported into my existing contact database.

many thanks in advance
alexandra
Fenton Posted May 27, 2004

Basically you're going to have to get the data as text, then "parse" it into a form that can be set into your records. There is no such thing as direct HTML import, because HTML is just a bunch of text and markup tags. The only order is whatever the author has used. Fortunately, in most cases there will be a table which has the data you want, with uniform HTML tags in between the data.

You didn't say what platform you're on. On Macs you can use AppleScript with the Safari browser to get the "text" directly:

tell application "Safari"
    get text of document 1
end tell

Simple, yes, but you generally end up with something that leaves much to be desired from a FileMaker viewpoint. An example from that site:

California Add a venue / an organisation in California Venues / Organisations Type Cities Beanbender's Berkeley Jazz House, The Association Berkeley

The data is there, but in a "return-delimited" format. You would run some tool to "munge" it into tab-delimited; not too difficult in this case: double returns separate "records," single returns separate "fields."

For a Windows or cross-platform solution you would undoubtedly have to get the entire source of the document, including HTML tags, and parse from there. The Troi URL plug-in would get the source text:

http://www.troi.com/software/urlplugin.html
"Gets the raw data of the specified URL. This can be for example the HTML of a web page"

It would have a lot of extra stuff, but the data you want would be in a table with uniform separators between the cells. It's then a text parsing job.
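The "munge" step described above (double returns separate records, single returns separate fields) could be sketched as follows. This is an illustration in Python rather than AppleScript or FileMaker; the function name `returns_to_tabs` and the sample values are hypothetical, and it assumes every record lists its fields in the same order.

```python
import re


def returns_to_tabs(text):
    """Turn return-delimited text into tab-delimited rows.

    Assumption: a blank line (two or more returns) separates records,
    and a single return separates the fields inside a record.
    """
    # Normalise Windows ("\r\n") and classic Mac ("\r") returns to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip("\n")
    rows = []
    for record in re.split(r"\n{2,}", text):
        fields = [field.strip() for field in record.split("\n")]
        rows.append("\t".join(fields))
    return "\n".join(rows)


# Hypothetical sample data, not copied from the site.
sample = "Venue A\nClub\nCity X\n\nVenue B\nAssociation\nCity Y"
print(returns_to_tabs(sample))
```

The result is one line per record with tabs between fields, which is the shape a tab-delimited import expects. Header lines such as the page title would still need to be trimmed off before the conversion.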
alexandra_l (Author) Posted May 28, 2004

fenton, thank you very much.

i'm using os x panther. i was hoping there might be a solution that would treat a page like this one:

http://www.lequanninh.net/improvdb/?fiche=ok&id_pays=28&id_lieu=299&tri=villes.ville_en&lang=en

as something with field names and field data, so that the website's "field names" could be matched with the ones in my fm database.

the simple applescript seems identical to doing "select all" and "copy" in safari, yes?

i'm not too familiar with these things, and i don't know what my php/mysql is, as it says on the bottom of that page, but i was hoping a database website like this one could be more or less automatically imported into fm pro.

alexandra
Fenton Posted May 28, 2004

Yes, tell app "Safari" to get text of document 1 is pretty much the same as Select All, Copy.

I wouldn't really know how to "automate" getting all this information. It seems to me pretty much a text-editing job. I would use a combination of (mostly) AppleScript and FileMaker text functions. In AppleScript I'd use the free Satimage OSAX (Scripting Addition) to enable "grep" search and replace.

http://www.satimage.fr/software/en/downloads_osaxen.html

You should be able to get the text of a single page, clean and arrange it to match a FileMaker file, and set its fields, all with one command.

I can see one level of automation, which would be to step through the "many" page, i.e., the page with 30 entries for California, get each of those individual entries, and get their detail pages. This would be slightly different from getting the text. In this case you'd get the "source" of the page, which has the links, then parse through the table of links. Each one is only partial; it's missing its base reference:

http://www.lequanninh.net/improvdb/

This is what's in the cells of the table:

?fiche=ok&id_pays=28&id_lieu=299&tri=villes.ville_en&lang=en

Put them together and you've got the full link to go to the detail pages:

http://www.lequanninh.net/improvdb/?fiche=ok&id_pays=28&id_lieu=299&tri=villes.ville_en&lang=en

The source text is messier than the plain text, but the cell tags are consistent (though kind of clunky). If you get the whole table, then remove the tags, you'll have what you want.

Yeah, it's a fair amount of work, and yes, it may need tweaking if they change the code. But I doubt they will. It's pretty simple, as web pages go these days. And, yes, a web expert may have a better method. I'm more a brute force text-editing FileMaker AppleScript kind of guy. I've used this method to get a page of messages off this web forum into a FileMaker database as separate records, all fields intact, in a 1-step process.
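The link-joining step described above could look roughly like this. It is a sketch in Python, not Fenton's AppleScript approach; the class and function names are hypothetical, the sample markup is invented, and the filter on links starting with "?fiche=" is an assumption based on the one partial link quoted in the post.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Base reference quoted in the post above.
BASE = "http://www.lequanninh.net/improvdb/"


class LinkCollector(HTMLParser):
    """Collect the partial detail-page links found in <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Assumption: detail links are the ones starting with "?fiche=".
            if href and href.startswith("?fiche="):
                self.links.append(href)


def detail_urls(source, base=BASE):
    """Return the full detail-page URLs found in the page source."""
    collector = LinkCollector()
    collector.feed(source)
    return [urljoin(base, href) for href in collector.links]


# Invented markup for illustration; the real page source will differ.
sample_source = (
    '<table><tr><td><a href="http://example.com/">home</a></td></tr>'
    '<tr><td><a href="?fiche=ok&id_pays=28&id_lieu=299'
    '&tri=villes.ville_en&lang=en">Jazz House, The</a></td></tr></table>'
)
for url in detail_urls(sample_source):
    print(url)
```

Each URL returned this way can then be fetched and its text cleaned up as described for a single page.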
alexandra_l (Author) Posted May 31, 2004

thanks again, fenton. i will give this a try.
The Shadow Posted May 31, 2004

I've added some custom functions to help with this in my style-manipulation example:

http://www.spf-15.com/fmExamples/

Change to the layout "webPage", and see your table's body extracted with the "body" calculation (the desired data table is nested inside 3 others):

ReduceSpaces(
    RemoveTags (
        ExtractTagBody (
            ExtractTagBody (
                ExtractTagBody (
                    ExtractTagBody ( content; "table" );
                "table" );
            "table" );
        "table" );
    "" )
)

ExtractTagBody, ReduceSpaces, and RemoveTags are custom functions I built; I also put in an ExtractTag function and a support function, FindEndTag.

I notice you don't have Dev 7 <evil cackle>, but perhaps others might find it useful.
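For readers without FileMaker Developer 7, the same extract, strip, and tidy idea can be sketched in Python. These are not The Shadow's custom functions: the names merely mirror them, and their exact behaviour (nested tags are matched by counting open and close tags, every run of whitespace collapses to one space) is an assumption.

```python
import re


def extract_tag_body(content, tag):
    """Return the text between the first <tag ...> and its matching </tag>."""
    name = re.escape(tag)
    opening = re.search(rf"<{name}\b[^>]*>", content, re.IGNORECASE)
    if opening is None:
        return ""
    # Count nested opening and closing tags to find the matching close.
    token = re.compile(rf"<(/?){name}\b[^>]*>", re.IGNORECASE)
    depth = 1
    for match in token.finditer(content, opening.end()):
        depth += -1 if match.group(1) else 1
        if depth == 0:
            return content[opening.end():match.start()]
    # No matching close tag: return everything after the opening tag.
    return content[opening.end():]


def remove_tags(content, replacement=""):
    """Replace every <...> tag with the given replacement text."""
    return re.sub(r"<[^>]+>", replacement, content)


def reduce_spaces(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


# Invented two-level nesting; the real page nests the data table deeper.
page = (
    "<table><tr><td><table><tr><td>  Jazz   House </td>"
    "<td>Berkeley</td></tr></table></td></tr></table>"
)
inner = extract_tag_body(extract_tag_body(page, "table"), "table")
print(reduce_spaces(remove_tags(inner, "")))
```

Passing a tab as the replacement in `remove_tags` instead of an empty string would keep the cell boundaries, which may be more convenient for a later import.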