
Parse: Ignore Link in HTML Text??


tmas73

This topic is 6494 days old. Please don't post here. Open a new topic instead.


To all the parsing geniuses! How is it possible to parse out a specific name when the link before it or after it changes on a site? This seems impossible to do! Here is an example in HTML:

For example, each href has a different ending for the name in the database! The name in between is what FileMaker should pick up: "Johnny Depp - Jack Sparrow", "Geoffrey Rush - Barbossa". I was playing around and could not even get close! How would I even start to approach this? Should I count words, like find a custom node and count from there on? Also, the nodes in between are all the same! Ahhh.

Cast overview, first billed only:

Johnny Depp .... Jack Sparrow
Geoffrey Rush .... Barbossa
Orlando Bloom .... Will Turner
Keira Knightley .... Elizabeth Swann
Jack Davenport .... Norrington
Jonathan Pryce .... Governor Weatherby Swann
Lee Arenberg .... Pintel
Mackenzie Crook .... Ragetti
Damian O'Hare

TMAS


It is not as hard as you think. There are unique characters in each of the 2 lines:

"/nm" in the Actor line

"valign="top">" in the Character line

You're right that the Character marker alone would not be unique, unless you search for it only AFTER the Actor line. Which you can, because they always appear in that order.

This is a quickly written Loop parser. It may not be optimized to heck, but I don't really care all that much. It seems fast enough. It basically goes through the html line by line.

The trickiest part is when to create a new record. I'm saying create one if the Actor field is empty.
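For anyone who wants to see the same logic outside FileMaker, here is a minimal Python sketch of the idea: scan the HTML line by line, take the actor name from a line containing `/nm`, and accept a character only from a later line containing `valign="top">`. The sample rows, the `</a>` and `</td>` end markers, and the helper names are assumptions based on the markers described above, not IMDB's actual markup. Each appended pair stands in for a new record, and any tags inside the character cell (such as a link) are stripped so only the text remains.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical cast-table rows; the real page markup may differ.
SAMPLE_HTML = """\
<tr><td class="nm"><a href="/name/nm0000001/">Johnny Depp</a></td>
<td class="ddd"> .... </td>
<td class="char" valign="top">Jack Sparrow</td></tr>
<tr><td class="nm"><a href="/name/nm0000002/">Geoffrey Rush</a></td>
<td class="ddd"> .... </td>
<td class="char" valign="top"><a href="/character/ch0000002/">Barbossa</a></td></tr>
"""


def slice_after(line: str, anchor: str, start: str, end: str) -> Optional[str]:
    """Find `anchor`, then return the text between the next `start` and `end`."""
    a = line.find(anchor)
    if a == -1:
        return None
    i = line.find(start, a + len(anchor))
    if i == -1:
        return None
    i += len(start)
    j = line.find(end, i)
    if j == -1:
        return None
    return line[i:j].strip()


def strip_tags(fragment: str) -> str:
    """Remove any HTML tags, e.g. a link wrapped around the character name."""
    return re.sub(r"<[^>]+>", "", fragment).strip()


def parse_cast(html: str) -> List[Tuple[str, str]]:
    pairs: List[Tuple[str, str]] = []
    actor: Optional[str] = None  # actor found, still waiting for a character
    for line in html.splitlines():
        if "/nm" in line:
            # The href ending varies, so anchor on "/nm" and read up to </a>.
            actor = slice_after(line, "/nm", '">', "</a>")
        elif actor is not None and 'valign="top">' in line:
            # Only trust this marker AFTER an actor line has been seen.
            cell = slice_after(line, 'valign="top">', "", "</td>") or ""
            pairs.append((actor, strip_tags(cell)))  # one "record" per pair
            actor = None
    return pairs


if __name__ == "__main__":
    for actor, role in parse_cast(SAMPLE_HTML):
        print(f"{actor} - {role}")
```

Anchoring on a marker and then reading up to a fixed end string is what lets the varying href endings be ignored.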

(P.S. Since you only posted a snippet of HTML, I don't know how big the whole page is. If it is large and this is only a small part, you might want to grab just the relevant section first. I can't say how, as I can't see the page. How are you getting the source HTML to start with? Duh, it's the Web Viewer, this being a Web Viewer forum. So why does your post say Developer 7?)

Actor_Character.zip


Thanks for your help! Attached you will find a test file I'm trying to get to work. My goal is to fill out the fields with the HTML source from IMDB and/or Amazon.

Click the IMDB and AMAZON buttons to load the text!

I have not built in the loop you provided in an earlier post!

A very dirty file, just to figure things out! =)

P.S. Sorry about the Developer 7. I use the FM 8.5 demo!

Thanks

Parse_Text2.zip


Well, I guess that goes to show you what happens when you throw a few million dollars at web programmers. I couldn't even find anything to parse on those pages. What exactly are you trying to find?

Personally I think you'd be much much better off forgetting the Web Viewer for this task. It is primarily a visual and interactive tool. Amazon has an xml web service, which is not difficult to use, and which returns the data in well-organized xml. Which you can import directly into FileMaker.

The web service itself is not terribly difficult to use, but their documentation is, well, a mess. It's not that it's not there; it's just buried in all the various things Amazon can do, which is a lot, a whole lot.

There have been a few examples of this. But you have to sign up for a free Developer ID to use it. I can post an example, but I'm not uploading a working copy with my Developer ID included, as that is against their terms.


I went this route! I signed up and looked at the sample on filemaker.com, but I could not even find information about this! It is so confusing and specialized for hard-core web developers that I thought parsing would be easier!

The HTML is in the file I gave you :P I got some fields extracted from the HTML, but I have problems with linked names, like href links! Also with separating the title from the year! Basically, the fields in the sb are the ones I'm trying to parse out!

Especially Crew and Cast

Title

Year

Sound

Director

Writer

Language

Plot

How would you go about the XML from Amazon? I have the developer ID and went to Web Services, but could not find an easy explanation of how to set the whole thing up, or even a tutorial. Frustrating!

Do you have any links?

Wouldn't it be more independent and customizable in the future to go with parsing? At least I wouldn't rely on Amazon's or IMDB's web changes!

Thanks

TMAS


Yes, their documentation is hard to find.

Do a search here, for my user name and Amazon. You'll find several posts, some with examples.

For a quick test, paste the following into an AppleScript, in Script Editor:


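-- Your own Amazon developer ID goes here
set myID to "put your developer ID here"
-- Build the request URL; quoted form of makes it safe to pass to the shell
set the_URL to quoted form of ("http://xml.amazon.com/onca/xml3?locale=us&t=webservices-20&dev-t=" & myID & "&KeywordSearch=FileMaker%208&mode=books&sort=+titlerank&offer=All&type=lite&page=1&f=xml")
-- Fetch the XML with curl; the result comes back as text
set theText to do shell script "curl " & the_URL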

That will return the xml to Script Editor (admittedly not much use, but it's a start, and it shows how well-organized xml data is).

It is possible to import that result directly into FileMaker with XML Import (no AppleScript involved), using an XSL stylesheet. But with that method, the xsl must be on a remote volume, such as a personal web site.

Or, alternatively, you can use curl to write the resulting xml to a local file. That's just a tiny bit more code. Then you can use a local xsl stylesheet. Either way works.*
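The local-file route can be illustrated outside FileMaker as well. This Python sketch saves an XML response to a local file and then reads the labelled fields back out. It uses the standard library's ElementTree in place of an XSL stylesheet, and the element names in `SAMPLE_XML` are made up for illustration; they are not Amazon's actual schema.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List

# Hypothetical, simplified response; the real Amazon XML uses its own element names.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Results>
  <Item>
    <Title>Example Movie</Title>
    <Year>2003</Year>
    <Director>Jane Doe</Director>
  </Item>
</Results>
"""


def save_response(xml_text: str, path: Path) -> None:
    """Write the fetched XML to a local file (the step curl would do)."""
    path.write_text(xml_text, encoding="utf-8")


def read_items(path: Path) -> List[Dict[str, str]]:
    """Turn each <Item> element into a dict of child tag -> text."""
    root = ET.parse(path).getroot()
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("Item")
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        local_file = Path(tmp) / "response.xml"
        save_response(SAMPLE_XML, local_file)
        for record in read_items(local_file):
            print(record["Title"], record["Year"])
```

Because every value arrives already labelled, there is nothing to hunt for: title and year come out as separate fields with no string surgery.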

I don't see how parsing source code for data is more dependable. Their xml service is not going to break, not for a long, long time. Their presentation web pages are likely to change at any time, not to mention that the data on them is not labelled, nor in any particular place or order.

*You could also parse the xml using text parsing. But that is way bass-ackwards, IMHO :P-|


OK, thanks for your help! I will look into Amazon and XML! Seems like a totally new mountain to climb! The funny thing is, I registered with Amazon Developer and Web, and I can't even post a question in the forum without getting an ERROR.

:hair:

TMAS


Hi Fenton! So I started to look into the XML thing from Amazon. They seem to enjoy making it as complicated as possible for FileMaker users! Basically there are 2 posts in their help from people complaining, and nothing else! I think I'll give up on the xml idea. Thanks for your input though!

If I can ask you for one more favor: in the actor fmp file you posted here, how would it be possible to have all the actors on one record, with 2 fields filled in, instead of creating a new record for each actor? Please see the attached file! I'm really trying to learn it myself; I just need some guidance so I can analyze how pros do it! I really appreciate your time and patience!

Thanks

ActorCharacter.fp7.zip


No, I wouldn't do it like that. It's terrible database design. Never do field1, field2, field3, etc. If there are 2, and only 2 EVER (AddressLine1, AddressLine2 being one of the few examples), then OK. If there are 3, and only 3 EVER, well, perhaps, but kind of junky. Basically, if there are EVER more than 2, or if you don't know how many, then you should put them into another table. There is no real reason not to, and many, many reasons why you should.

The idea that you'd want to put them in the same record implies that there is a "parent" they have in common, but you haven't said what it is. The parent would go into a single record in its own table, and these would be multiple child records in another table. You'd likely want to link them with a parent ID.
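To make the parent/child idea concrete, here is a small sketch using SQLite from Python rather than FileMaker. The table and column names (`movie`, `cast_member`, `movie_id`, `role`) are hypothetical; in FileMaker the equivalent would be two tables joined by a relationship on the parent ID. Each actor becomes its own child record, so there is no limit on how many a movie can have, and no Actor1, Actor2 fields.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_db() -> sqlite3.Connection:
    """Create a parent table (movie) and a child table (cast_member)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(
        """
        CREATE TABLE movie (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE cast_member (
            id       INTEGER PRIMARY KEY,
            movie_id INTEGER NOT NULL REFERENCES movie(id),  -- the parent ID
            actor    TEXT NOT NULL,
            role     TEXT
        );
        """
    )
    return con


def add_movie(con: sqlite3.Connection, title: str,
              cast: Iterable[Tuple[str, str]]) -> int:
    """Insert one parent record, then one child record per actor."""
    movie_id = con.execute(
        "INSERT INTO movie (title) VALUES (?)", (title,)
    ).lastrowid
    con.executemany(
        "INSERT INTO cast_member (movie_id, actor, role) VALUES (?, ?, ?)",
        [(movie_id, actor, role) for actor, role in cast],
    )
    con.commit()
    return movie_id


def cast_for(con: sqlite3.Connection, movie_id: int) -> List[Tuple[str, str]]:
    """All child records for one parent, in insertion order."""
    return con.execute(
        "SELECT actor, role FROM cast_member WHERE movie_id = ? ORDER BY id",
        (movie_id,),
    ).fetchall()


if __name__ == "__main__":
    con = build_db()
    mid = add_movie(con, "Example Movie",
                    [("Johnny Depp", "Jack Sparrow"), ("Geoffrey Rush", "Barbossa")])
    print(cast_for(con, mid))
```

Showing "all actors on one layout" is then a matter of displaying the child records for one parent (a portal, in FileMaker terms), not of adding more fields.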

It's still much better to do it with xml, which by its nature is hierarchically organized data.

When I said do a search here, I meant here in FmForums. I've posted several examples about Amazon xml/xsl. There are different types of searches however (keyword, author, title, ASIN, etc.), each of which is slightly different.

[I would amend that to say that even Address1, Address2 is kind of a silly idea (I get these in files and just leave them; can't be bothered to convert them). Why not just make the Address field 2 lines tall? I believe it is a hold-over from older databases that cannot have 2 lines in that kind of field. Søren is right, bad relational design IS the root of all evil :)-]
