Newbie tried to build an Amazon web scraper... here's what happened


January 14, 2015

Hello,

I am trying to learn FM and, at the same time, build myself an ebook library tool that scrapes Amazon.com for data such as title, author, and rating, based on the user (me!) submitting a book product ID, which Amazon calls an "ASIN."

I have searched online for a similar solution or template, but have only found bits and pieces of one, so I am trying to build it myself and hopefully learn FM in the process.

Here's what I have been able to accomplish so far. Don't laugh...

1) Create a field into which a user can enter an Amazon ASIN (like "B00FYW9VHC")

2) Use that data to generate an Amazon image URL

3) Use that image URL and InsertFromURL to pull a book cover image from Amazon and put it into an image container (wow!)

4) Use the entered Amazon ASIN to generate a WebViewer object (named "amzn") which displays the Amazon product page (like http://www.amazon.com/exec/obidos/tg/detail/-/B00FYW9VHC)
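Steps 1 and 4 amount to "clean up the ASIN, then paste it into a URL." A minimal Python sketch of that, assuming an ASIN is exactly 10 uppercase letters or digits (an assumption based on examples like B00FYW9VHC, not an Amazon specification). The helper names `normalize_asin` and `product_url` are hypothetical; the URL pattern is the one from the post.

```python
import re

# Assumption: an ASIN is exactly 10 uppercase letters or digits
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")

# Product-page URL pattern taken from the post above
PRODUCT_URL = "http://www.amazon.com/exec/obidos/tg/detail/-/{asin}"


def normalize_asin(raw: str) -> str:
    """Trim whitespace, uppercase, and reject anything that is not 10 alphanumerics."""
    asin = raw.strip().upper()
    if not ASIN_PATTERN.fullmatch(asin):
        raise ValueError(f"not a valid ASIN: {raw!r}")
    return asin


def product_url(raw_asin: str) -> str:
    """Build the product-page URL that the WebViewer would load."""
    return PRODUCT_URL.format(asin=normalize_asin(raw_asin))


if __name__ == "__main__":
    # A sloppily typed ASIN is cleaned up before it goes into the URL
    print(product_url(" b00fyw9vhc "))
```

Inside FileMaker the same URL could be a calculation along the lines of "http://www.amazon.com/exec/obidos/tg/detail/-/" & Upper ( Trim ( Books::ASIN ) ), where Books::ASIN is a hypothetical field name.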

Here's what I haven't been able to figure out:

1) How to pull the HTML source from the WebViewer object and put it into another field. I know this has something to do with GetLayoutObjectAttribute, but I haven't been able to figure out the correct script. Currently, I have something that looks like this:

 Set Field [LibraryTest::webcontent;GetLayoutObjectAttribute("amzn";"content")]

where webcontent is a text field and amzn is the name of my WebViewer. This clearly isn't working, though.

2) Once I am able to get the HTML source code into a field, I'm not sure how to parse it in order to grab data elements like Title, Author, Price, Description, Rating, etc.

If anyone has done this already or has some advice for me, I'd be very grateful.

Thanks in advance,

January 14, 2015

I do this every day. I must admit I do not use FileMaker for the parsing part: I use Import Records as XML, and what I import from is actually a script. The script produces FMPXMLRESULT, which is very efficient and can run on the server.

I have built the script that generates the XML for different projects and environments in:

Shell script

PHP, as described in this blog post: http://wethecomputerabusersamongst.blogspot.com/2013/10/execute-php-script-from-filemaker-with.html

NodeJS

PhantomJS

curl / tidy -asxml / XSLT

I also know people who have written these parsers in Ruby and Perl.
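To show what such a generator script has to emit, here is a minimal Python sketch of an FMPXMLRESULT document. It assumes the usual grammar (ERRORCODE, PRODUCT, DATABASE, METADATA, RESULTSET, with one COL/DATA per field in METADATA order); the PRODUCT and DATABASE attribute values, the function name, and the sample field names are placeholders, so check the result against FileMaker's XML import documentation before relying on it.

```python
import xml.etree.ElementTree as ET

# Namespace used by FileMaker's FMPXMLRESULT grammar
FMP_NS = "http://www.filemaker.com/fmpxmlresult"


def to_fmpxmlresult(field_names, rows):
    """Build a minimal FMPXMLRESULT string; every field is declared as TEXT."""
    # Declare the default namespace as a plain attribute so children need no prefix
    root = ET.Element("FMPXMLRESULT", xmlns=FMP_NS)
    ET.SubElement(root, "ERRORCODE").text = "0"
    # Placeholder product/database metadata (values are assumptions)
    ET.SubElement(root, "PRODUCT", BUILD="", NAME="scraper", VERSION="1.0")
    ET.SubElement(root, "DATABASE", DATEFORMAT="M/d/yyyy", LAYOUT="", NAME="",
                  RECORDS=str(len(rows)), TIMEFORMAT="h:mm:ss a")
    metadata = ET.SubElement(root, "METADATA")
    for name in field_names:
        ET.SubElement(metadata, "FIELD", EMPTYOK="YES", MAXREPEAT="1",
                      NAME=name, TYPE="TEXT")
    resultset = ET.SubElement(root, "RESULTSET", FOUND=str(len(rows)))
    for record_id, row in enumerate(rows, start=1):
        row_el = ET.SubElement(resultset, "ROW", MODID="0", RECORDID=str(record_id))
        # One COL/DATA pair per field, in the same order as METADATA
        for value in row:
            col = ET.SubElement(row_el, "COL")
            ET.SubElement(col, "DATA").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_fmpxmlresult(["Title", "Author"], [["Some Title", "A. Writer"]]))
```

Served from a web server (or written to a file), output like this can be brought in with Import Records from an XML data source, where each field in METADATA is mapped to a field in the target table.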

If you do the parsing inside a FileMaker field, you will probably use Position() and Middle(), compute some offsets, and take certain things for granted; this will be very fragile to any change in Amazon's output.
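For reference, the Position()/Middle() technique boils down to "find a start marker, find the next end marker, take what is in between." A minimal Python sketch of the same logic follows; the markers in the usage example are made-up sample markup, not Amazon's real HTML, and the fragility described above applies equally: if the markers change, the extraction silently returns nothing.

```python
def text_between(source: str, start_marker: str, end_marker: str, occurrence: int = 1) -> str:
    """Return the trimmed text between the nth start_marker and the next end_marker, or "".

    Mirrors a FileMaker calculation along the lines of
      start = Position ( source ; startMarker ; 1 ; occurrence ) + Length ( startMarker )
      Middle ( source ; start ; Position ( source ; endMarker ; start ; 1 ) - start )
    """
    pos = -1
    # Step forward to the requested occurrence of the start marker
    for _ in range(occurrence):
        pos = source.find(start_marker, pos + 1)
        if pos == -1:
            return ""
    start = pos + len(start_marker)
    end = source.find(end_marker, start)
    if end == -1:
        return ""
    return source[start:end].strip()


if __name__ == "__main__":
    # Hypothetical sample markup
    html = '<span id="title"> Some Book </span><span class="author">A. Writer</span>'
    print(text_between(html, '<span id="title">', "</span>"))
    print(text_between(html, '<span class="author">', "</span>"))
```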

Fetching the HTML source does not work until the progress bar has completed, so you may have to pause for 5 seconds after changing the URL before you execute GetLayoutObjectAttribute( "amzn"; "content" ).
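Instead of a fixed 5-second pause, one option is to poll: read the content, check whether it looks complete, and retry until a timeout. In FileMaker that would be a Loop containing Pause/Resume Script, a Set Variable with GetLayoutObjectAttribute( "amzn"; "content" ), and an Exit Loop If. The Python sketch below shows the same pattern with a simulated page; the "contains </html>" readiness test is an assumed heuristic, not something the WebViewer guarantees.

```python
import time
from typing import Callable


def wait_for_content(fetch: Callable[[], str], is_ready: Callable[[str], bool],
                     timeout: float = 5.0, interval: float = 0.25) -> str:
    """Poll fetch() until is_ready() accepts its result; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        content = fetch()
        if is_ready(content):
            return content
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page not ready after {timeout} seconds")
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated page source that only becomes complete on the third poll
    snapshots = iter(["", "<html><body>", "<html><body></body></html>"])
    print(wait_for_content(lambda: next(snapshots), lambda html: "</html>" in html))
```

A fast page is then read as soon as it is ready, and a slow one still gets the full timeout.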

January 14, 2015

Hi SB, and welcome to the FM Forums,

Scraping a Web Site is not as simple as it may seem.

See what another member is currently going through in his two threads: Link and Link.

If you are under the impression that FileMaker is easy to learn and use, then you need to understand where that is coming from. For a simple thing like a Rolodex or Contact Database, this could be true. However, to develop something more robust, you can look forward to a lot of time and effort on your part to learn the fundamentals, and the more you want out of your solution, the more time you will be investing.

Before you start slapping together a file, you need to learn FileMaker and its way of doing things.

Start by studying the User Manual, Help files, and Starter files. You need to learn how to create fields, layouts, scripts, tables, relationships, etc., and see how they work together. The Starter files can help, so go under the hood of those files and find out how they tick. In Layout mode, check out the buttons, popovers, tabs, etc.

Since this is a new solution that you are starting from scratch, you should prepare an ERD (Entity Relationship Diagram), not to be confused with the Relationship Graph in FileMaker, and see how it helps you determine the structure of all of these things.
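As a hypothetical illustration of what an ERD for this ebook library might capture (the entities are assumptions, not something the original poster described): a Book, an Author, and a join entity, because one book can have several authors and one author can write several books. Sketched as Python dataclasses, each entity would become a FileMaker table, and the join table is what the Relationship Graph would connect.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Book:
    asin: str        # Amazon product ID, one per book
    title: str


@dataclass(frozen=True)
class Author:
    author_id: int
    name: str


@dataclass(frozen=True)
class BookAuthor:
    """Join entity resolving the many-to-many link between books and authors."""
    asin: str
    author_id: int


def authors_of(asin: str, authors: List[Author], links: List[BookAuthor]) -> List[str]:
    """Follow the join entity from one book to the names of its authors."""
    ids = {link.author_id for link in links if link.asin == asin}
    return sorted(a.name for a in authors if a.author_id in ids)


if __name__ == "__main__":
    authors = [Author(1, "A. Writer"), Author(2, "B. Coauthor")]
    links = [BookAuthor("B00FYW9VHC", 1), BookAuthor("B00FYW9VHC", 2)]
    print(authors_of("B00FYW9VHC", authors, links))
```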

Since you have identified your skill level as Beginner (I'll take that to mean new to database design and FileMaker), be prepared to spend a lot of time and effort learning both. There are some excellent resources available, so let us know if you need some recommendations.

You might want to look at a commercial product I use called "Delicious Library 3" (Link).

Lee

January 14, 2015

Quoting Lee: "If you are under the impression that FileMaker is easy to learn and use, then you need to understand where that is coming from. For a simple thing like a Rolodex or Contact Database, this could be true. However, to develop something more robust, you can look forward to a lot of time and effort on your part to learn the fundamentals, and the more you want out of your solution, the more time you will be investing."

I'd say it's horses for courses: FileMaker has no real competitor for making reports, and it's very quick for building a GUI.

But when it comes to parsing HTML? There are many options I would prefer to use before native FileMaker. I would actually like to have something predigested into native FMPXMLRESULT; I often think the best solution is to pick the best tool from each toolbox.

If SB Books is only on a Mac, a full set of Unix tools is available.

Example script (typed off the top of my head, not tested): /usr/local/bin/fetchurlandmakefilemakersource.sh

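---

#!/bin/sh
# Usage: fetchurlandmakefilemakersource.sh <product-page-URL>
# Work in the web server's document root so FileMaker can import the result over HTTP
cd /Library/WebServer/Documents/ || exit 1
# Name the files after the last path segment of the URL (the ASIN in the example below)
name=$(basename "$1")
# Save the page to a file instead of printing it (-L follows redirects)
curl -s -L -o "$name.html" "$1"
# Convert the HTML into well-formed XHTML so that XSLT can read it
tidy -q -i -asxml -wrap 0 -o "$name.xhtml" "$name.html"
# Optional: transform to FMPXMLRESULT here, or apply the XSLT during the FileMaker import instead
xsltproc amzn2fmpxmlresult.xslt "$name.xhtml" > "$name.fmpxmlresult.xml"

---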

You should be able to call the script as follows: fetchurlandmakefilemakersource.sh http://www.amazon.com/exec/obidos/tg/detail/-/B00FYW9VHC

Then import the result using a regular FileMaker XML import, provided Web Sharing is turned on on the Mac.

All that remains is to write the mapping, namely amzn2fmpxmlresult.xslt.

It should be a matter of one XPath select per tag you would like to map to a field.
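The "one XPath select per field" idea can be tried out before writing any XSLT. The sketch below swaps in Python's ElementTree, which supports a limited XPath subset, for xsltproc; the field names and paths in `FIELD_PATHS` are hypothetical and are not Amazon's actual markup. Note that tidy's -asxml output typically declares the XHTML namespace, in which case each path needs a namespace prefix (passed via the `namespaces` argument of `find`).

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: FileMaker field name -> ElementTree path expression
FIELD_PATHS = {
    "Title": ".//span[@id='title']",
    "Author": ".//a[@class='author']",
}


def extract_fields(xhtml: str, field_paths: dict) -> dict:
    """Return {field: text}, using one path expression per field; missing tags give ""."""
    root = ET.fromstring(xhtml)
    result = {}
    for field, path in field_paths.items():
        element = root.find(path)
        # itertext() also collects text from nested tags such as <b> inside the match
        result[field] = "".join(element.itertext()).strip() if element is not None else ""
    return result


if __name__ == "__main__":
    page = ("<html><body><span id='title'> Some Book </span>"
            "<a class='author' href='#'>A. Writer</a></body></html>")
    print(extract_fields(page, FIELD_PATHS))
```

Each entry in `FIELD_PATHS` corresponds to one select in the XSLT, so the mapping stays a short list of field/path pairs.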

January 14, 2015

I understood your first post.

January 14, 2015

:laugh:

January 14, 2015

Quoting the original post: "4) Use the entered Amazon ASIN to generate a WebViewer object (named "amzn") which displays the Amazon product page (like http://www.amazon.com/exec/obidos/tg/detail/-/B00FYW9VHC)"

The proper way to do this is to use Amazon's web API, not to scrape the web page. Amazon changes its page layout frequently, and that will break your scraper. Use the developer tools that Amazon provides and you will get the data in a standard format.

January 15, 2015

Quoting the previous post: "The proper way to do this is to use Amazon's web API."

The methods and links I already gave you will still be valid, yet the task will be simpler with an official API at hand.

January 22, 2015

Author

Thanks very much for the good feedback. I will dig into some of those resources, but UNIX, PHP, and setting up web servers is way beyond my capabilities.

Also, as far as I can tell, the official Amazon API will not return the current price of a Kindle book, which makes it a non-starter for my ebook library tool.

Thnx again

8 months later...

October 8, 2015

Using the approach from the links I gave above, you can basically run these steps:

1) Go to the page of your choice using a tool that renders the page.

2) Take a screenshot. For example, steps 1 and 2 can both be done like this (using PhantomJS): https://github.com/ariya/phantomjs/blob/master/examples/render_multi_url.js

3) OCR the screenshot, for example using Tesseract or OCRopus.

4) Search for $, for example using grep on the output.

Edited October 8, 2015 by ggt667
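Steps 3 and 4 could look roughly like this in Python, with a regular expression standing in for grep (a swapped-in technique). `ocr_image` assumes the tesseract command-line tool is installed and on PATH, and uses "stdout" as the output base so the text is printed rather than written to a file; OCRopus is not covered. The price pattern is an assumption tuned to US-style amounts and tolerates the stray space OCR sometimes inserts after the $.

```python
import re
import subprocess

# A dollar sign, optional space, digits with optional thousands separators and cents
PRICE_PATTERN = re.compile(r"\$\s?\d+(?:,\d{3})*(?:\.\d{2})?")


def ocr_image(path: str) -> str:
    """Step 3: run the tesseract CLI on a screenshot and return the recognized text."""
    completed = subprocess.run(["tesseract", path, "stdout"],
                               capture_output=True, text=True, check=True)
    return completed.stdout


def find_prices(ocr_text: str) -> list:
    """Step 4: return every dollar amount found in the OCR output, in order."""
    return PRICE_PATTERN.findall(ocr_text)


if __name__ == "__main__":
    # Stand-in for OCR output; a real run would use ocr_image("screenshot.png")
    sample = "Kindle Price: $9.99\nHardcover: $ 24.50"
    print(find_prices(sample))
```

OCR output is noisy, so it is worth checking the amounts it finds against the page before trusting them.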

October 8, 2015

Author

That's a very interesting and unique approach, GGT. Thank you!


FileMaker Amazon Textract Integration
FileMaker Amazon Textract Integration

dbservices · September 16, 20205 yr
- amazon
- fmp19
- integration
- 0 replies
- 1,103 views
dbservices

September 16, 20205 yr
Interactive Javascript not working in IE on FM13 and 14
javascript

Interactive Javascript not working in IE on FM13 and 14

Macfreq · December 18, 201510 yr
- fmp13
- fmp14
- ie
- webviewer
- windows
- 0 replies
- 1,955 views
Macfreq

December 18, 201510 yr
Webviewer Not Loading Sites
pc

Webviewer Not Loading Sites

drschilling · January 4, 201511 yr
- fmp13
- webviewer
- 3 replies
- 4,279 views
hbrendel

January 4, 201511 yr
Thumbnail Generation Issues
Thumbnail Generation Issues

triconamy · May 6, 201412 yr
- fmp13
- image quality
- java 7 update 51
- low-resolution
- Tagged with:
  
  fmp13
  
  image quality
  
  java 7 update 51
  
  low-resolution
  
  supercontainer 2.896
  
  thumbnails
  
  unstored calcs
  
  vertical strips
  
  webviewer
- 0 replies
- 2,147 views
triconamy

May 6, 201412 yr
Webviewer on hosted db, access server files
Webviewer on hosted db, access server files

Dimitrios Fkiaras · October 9, 20232 yr
- server
- webviewer
- 0 replies
- 5,937 views
Dimitrios Fkiaras

October 9, 20232 yr
