stuj1026 Posted July 13, 2006

Hi All,

I have been playing with the Web Viewer and have visited the FileMaker website, and it seems you can do something called web scraping. I bring up a page in the Web Viewer, and in another field I am able to retrieve the source code of the page being displayed using:

    GetLayoutObjectAttribute ( "web object name" ; "content" )

A small sample of the HTML is shown above. I now need to extract the profession of the doctor. I can get the starting position of the profession using the Position function, but now I need to extract everything that falls between the <b> and the </b> after that position.

Any thoughts? Really need some help on this one.

Thanks,
Stu

Edited August 26, 2006 by Guest (added code markup)
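Ballycroy Posted July 13, 2006

How about this:

    Let ( [
      var1 = Position ( SourceText ; "<b>" ; 1 ; 1 ) ;   // start of the first opening tag
      var2 = Position ( SourceText ; "</b>" ; 1 ; 1 ) ;  // start of the first closing tag
      var3 = var2 - var1 - 3                             // characters in between; 3 = Length ( "<b>" )
    ] ;
      // searches from character 1, so this returns the first bold text on the page
      Middle ( SourceText ; var1 + 3 ; var3 )
    )

Edited August 26, 2006 by Guest (added code markup)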
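To check the arithmetic in Ballycroy's Let(), here is a small Python sketch, taking the tags as <b> and </b>. The helpers fm_position and fm_middle are hypothetical stand-ins for FileMaker's 1-based Position and Middle, covering only the simple cases used here (forward search, case-insensitive matching as FileMaker's Position does); they are not a FileMaker API.

```python
def fm_position(text: str, search: str, start: int, occurrence: int) -> int:
    """Hypothetical stand-in for Position ( text ; search ; start ; occurrence ).
    Returns a 1-based position, or 0 if not found. Forward search only."""
    if occurrence < 1:
        raise ValueError("this sketch only supports occurrence >= 1")
    haystack, needle = text.lower(), search.lower()  # lower() keeps length for ASCII HTML
    idx = max(start, 1) - 2  # the first find() below begins at 0-based start - 1
    for _ in range(occurrence):
        idx = haystack.find(needle, idx + 1)
        if idx == -1:
            return 0
    return idx + 1


def fm_middle(text: str, start: int, count: int) -> str:
    """Hypothetical stand-in for Middle ( text ; start ; count ), 1-based."""
    begin = max(start, 1) - 1
    return text[begin:begin + max(count, 0)]


def first_bold(source_text: str) -> str:
    """Ballycroy's calculation: the text between the first <b> and the first </b>."""
    var1 = fm_position(source_text, "<b>", 1, 1)
    var2 = fm_position(source_text, "</b>", 1, 1)
    var3 = var2 - var1 - 3  # 3 = length of "<b>"
    return fm_middle(source_text, var1 + 3, var3)
```

Because both searches start at character 1, a page with any bold text before the profession returns that earlier text instead, which is why the search needs to start from the label's position.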
stuj1026 Posted July 14, 2006

OK, I got it! I built a custom function called WebScrape:

    WebScrape ( Source_Text ; Extract_Text ; Beg_Tag ; End_Tag )

    Let ( [
      // find the label (e.g. "Profession") and keep everything from there on
      text_to_find = Position ( Source_Text ; Extract_Text ; 1 ; 1 ) ;
      Extraction1 = Middle ( Source_Text ; text_to_find ; Length ( Source_Text ) ) ;
      LenBegTag = Length ( Beg_Tag ) ;
      // first opening tag after the label
      Extraction2 = Position ( Extraction1 ; Beg_Tag ; 1 ; 1 ) ;
      // first closing tag after that opening tag
      Extraction3 = Position ( Extraction1 ; End_Tag ; Extraction2 + LenBegTag ; 1 )
    ] ;
      // the value starts right after the opening tag and stops just before the closing tag
      Middle ( Extraction1 ; Extraction2 + LenBegTag ; Extraction3 - Extraction2 - LenBegTag )
    )

Parameters:
- Source_Text: the raw HTML
- Extract_Text: the label to look for, in my case "Profession"
- Beg_Tag: the opening tag, here <b>
- End_Tag: the closing tag, here </b>

So this extracts from the raw HTML the text that falls between <b> and </b> and follows the word Profession.

Stu

Edited August 26, 2006 by Guest (added code markup)
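A length expression of the form Extraction3 - LenEndTag - Extraction2 + 1 gives the right count only when the closing tag is exactly one character longer than the opening tag, as with <b> and </b>; the general count is Extraction3 - Extraction2 - LenBegTag. The Python sketch below compares the two on a tag with an attribute. web_scrape and web_scrape_original_count are illustrative names, returning "" when something is not found is an assumption of this sketch, and Python's str.find is case-sensitive, unlike FileMaker's Position.

```python
def web_scrape(source_text: str, extract_text: str, beg_tag: str, end_tag: str) -> str:
    """General WebScrape logic with 0-based slicing.
    Returns "" when the label or a tag is missing (an assumption of this sketch)."""
    label = source_text.find(extract_text)
    if label == -1:
        return ""
    after_label = source_text[label:]             # Extraction1
    beg = after_label.find(beg_tag)               # Extraction2 - 1
    if beg == -1:
        return ""
    value_start = beg + len(beg_tag)
    end = after_label.find(end_tag, value_start)  # closing tag after the opening tag
    if end == -1:
        return ""
    return after_label[value_start:end]           # count = end - value_start


def web_scrape_original_count(source_text: str, extract_text: str,
                              beg_tag: str, end_tag: str) -> str:
    """Length formula Extraction3 - LenEndTag - Extraction2 + 1, translated to
    0-based slicing only to compare it with web_scrape."""
    after_label = source_text[source_text.find(extract_text):]
    beg = after_label.find(beg_tag)
    end = after_label.find(end_tag)
    value_start = beg + len(beg_tag)
    count = end - len(end_tag) - beg + 1
    return after_label[value_start:value_start + count]
```

With plain <b> and </b> both return the same text; with an opening tag such as <b class="x"> the second formula runs past the closing tag.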
Philip Posted August 7, 2006

I like that custom function. Could you give an example with the parameters filled in? For example:

    WebScrape ( SourceField ; "Profession " ; "<b>" ; "</b>" )

That would be so helpful! Thanks.
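Taking up Philip's question, here is what a call like WebScrape ( SourceField ; "Profession" ; "<b>" ; "</b>" ) would pull out of a page, sketched in Python. PAGE is a made-up fragment (a real page's markup will differ), and scrape_labels is a hypothetical helper that applies the same logic to several labels at once.

```python
def web_scrape(source_text: str, extract_text: str, beg_tag: str, end_tag: str) -> str:
    """Text between the first beg_tag and the next end_tag after extract_text.
    Returns "" when something is missing (an assumption of this sketch)."""
    label = source_text.find(extract_text)
    if label == -1:
        return ""
    after_label = source_text[label:]
    beg = after_label.find(beg_tag)
    if beg == -1:
        return ""
    value_start = beg + len(beg_tag)
    end = after_label.find(end_tag, value_start)
    return "" if end == -1 else after_label[value_start:end]


# Hypothetical page fragment, for illustration only.
PAGE = """<tr><td>Name</td><td><b>Dr. Jane Doe</b></td></tr>
<tr><td>Profession</td><td><b>Cardiologist</b></td></tr>"""


def scrape_labels(page: str, labels, beg_tag: str = "<b>", end_tag: str = "</b>") -> dict:
    """Run the same extraction once per label."""
    return {label: web_scrape(page, label, beg_tag, end_tag) for label in labels}
```

Here scrape_labels(PAGE, ["Name", "Profession"]) gives {"Name": "Dr. Jane Doe", "Profession": "Cardiologist"}. A short label such as "Name" can also match inside other words, so a more specific label (for example including the surrounding </td>) is safer on real pages.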
MogensBrun Posted August 7, 2006

I recently posted an HTMLtoText custom function at http://www.briandunning.com/filemaker-custom-functions/. It can convert a whole web page, or a large part of one, from HTML to plain text while preserving bold, italic and bullet formatting. A demo file with this custom function may be downloaded from this post.

Best regards,
Mogens Brun

DemoHTMLtoText.fp7.zip
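For readers outside FileMaker, the same kind of conversion can be sketched with Python's standard html.parser module. This is not Mogens's HTMLtoText function; it is a minimal sketch in which the markers for bold (*...*), italic (_..._) and bullets are arbitrary choices, entities are decoded by the parser, and script/style content is dropped.

```python
import re
from html.parser import HTMLParser


class HTMLToText(HTMLParser):
    """Collects text, marking <b>/<strong> as *...*, <i>/<em> as _..._
    and <li> items as bullet lines (marker choices are this sketch's own)."""

    BLOCK_TAGS = {"p", "div", "br", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive already decoded
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in ("b", "strong"):
            self.parts.append("*")
        elif tag in ("i", "em"):
            self.parts.append("_")
        elif tag == "li":
            self.parts.append("\n\u2022 ")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ("b", "strong"):
            self.parts.append("*")
        elif tag in ("i", "em"):
            self.parts.append("_")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(re.sub(r"\s+", " ", data))  # collapse source whitespace


def html_to_text(html: str) -> str:
    parser = HTMLToText()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    return "\n".join(line.strip() for line in lines if line.strip())
```

Using a real parser rather than Position/Middle means tag attributes, entities and odd whitespace are handled in one place.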
bruceR Posted August 26, 2006

Very nice. Looks like you need to add this to the list of substitutions:

    [ "&reg;" ; "®" ] ;

Edited August 26, 2006 by Guest
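bruceR's point is that an entity-to-character list has to cover every entity a page may use. The Python sketch below shows such an ordered list applied one pair at a time, and compares it with the standard library's html.unescape, which knows all named HTML entities. ENTITY_PAIRS is a hypothetical subset, not the contents of HTMLtoText. One ordering detail matters for sequential substitution: "&amp;" should be decoded last, otherwise a literal "&amp;reg;" on the page would end up as "®".

```python
import html

# Hypothetical ordered (entity, character) pairs, applied one after another.
ENTITY_PAIRS = [
    ("&nbsp;", "\u00a0"),
    ("&lt;", "<"),
    ("&gt;", ">"),
    ("&quot;", '"'),
    ("&reg;", "\u00ae"),
    ("&copy;", "\u00a9"),
    ("&amp;", "&"),  # last, so "&amp;reg;" stays "&reg;" instead of becoming "®"
]


def substitute_entities(text: str) -> str:
    for entity, char in ENTITY_PAIRS:
        text = text.replace(entity, char)
    return text
```

html.unescape("Acme&reg; &amp; Co") gives the same result as substitute_entities, and it also covers numeric references such as &#174;, which a fixed list only handles if each one is added.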
MogensBrun Posted September 6, 2006

The HTMLtoText custom function at http://www.briandunning.com/filemaker-custom-functions/ has been updated. You may use the Web Viewer or the Troi URL plug-in to fetch the page you want to convert from HTML to text. A demo file can be downloaded from this post.
Søren Dyhr Posted October 18, 2006

Hi Mogens, could you explain the reasoning behind the global fields your CF requires?

--sd
Lee Smith Posted October 18, 2006

I don't see a demo file at Brian's site, and the link to your site did not work.

Lee
MogensBrun Posted October 24, 2006

The HTMLtoText custom function at Brian Dunning's site has been updated to version 1.04. You may use the Web Viewer, Troi URL or Fusion TCPdirect to capture the web page and then parse the source HTML into formatted text. A demo file can be downloaded from this post.

The custom function now uses three global fields. The reason is this: in FileMaker, a text expression may be (1) a constant, or "literal", text string, (2) a field reference (to either a normal or a global field), or (3) a calculated combination of (1) and (2).

Re (1): A constant is entered between quotes in a Set Field script step, in a custom function, or similar. FileMaker's internal text editor filters keyboard-entered characters, so only a subset of the possible ASCII characters can be expressed. For example, you can't enter a line feed (ASCII 010).

Re (2): Text in a field may be entered through the keyboard (as in 1), by import from other files, or by import from a Web Viewer. The last two methods make it possible for any ASCII value to occur in a field. ASCII NULL (0) seems to provoke a crash in some circumstances, and other ASCII values can cause other problems. These characters can't be entered in a constant/literal text string; they must be pasted into a field if you want to get rid of them through a substitution or similar.

This is why, in HTMLtoText, I have to use two global fields to represent character values that cannot be typed between quotes as literal text. They are ASCII 010 and ASCII 063, which I found often caused problems in HTML parsing. The list should perhaps be extended with more characters, including ASCII 000.

There is an alternative method: Fusion's TCPdirect plug-in lets you express any character value, so by using this plug-in you can avoid using globals to store special character values.

Re (3): All of the above applies to this point.

DemoHTMLtoText_1.04.zip
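Outside FileMaker's calculation editor, such characters are easy to build from their codes, which is the job the global fields do here. A minimal Python sketch of that kind of clean-up, assuming the goal is to drop NUL characters and turn line feeds into carriage returns (FileMaker's ¶ is a carriage return, ASCII 13); clean_field_text is a hypothetical name.

```python
# Characters that cannot be typed between quotes, built from their codes.
NUL = chr(0)
LF = chr(10)
CR = chr(13)


def clean_field_text(text: str) -> str:
    """Drop NUL and normalise LF / CRLF line endings to CR."""
    text = text.replace(NUL, "")
    text = text.replace(CR + LF, CR)  # handle CRLF first so it does not become two CRs
    return text.replace(LF, CR)
```

Replacing CRLF before lone LF keeps Windows-style line endings from turning into blank lines.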
Lee Smith Posted April 20, 2007

Hi Mogens,

I finally got around to looking at your files. I have a project that I think the Web Viewer will work great for. When I tested your files, I had trouble with the second one: for some reason it times out on Brian Dunning's site and on FMForum. Your original file, however, works as intended. What am I missing with the second file?

Lee
apathyisafad Posted July 26, 2010

Thank you! Your custom function solved a huge problem for me!
Pushkraj Posted September 30, 2011

Similar to this, I need code that can parse XML and show it as plain text in the Web Viewer. I work with XML files: I use XML to call some external APIs, which return their responses in XML. I need that XML response to be shown as plain text in a Web Viewer. Any help would be highly appreciated.

For now, instead of parsing it into normal text, I have decided to show the XML itself as data in the Web Viewer. I tried the following code to display the raw XML, escaped so that the tags are treated as normal text rather than markup:

    $XML_ResponseFinal = "<html> <body> " & ¶ & Substitute ( $XML_Response ; [ "<" ; "&lt;" ] ; [ ">" ; "&gt;" ] ) & ¶ & "</body> </html>"

but it was not successful. The Web Viewer just shows a blank page. I am using:

    Set Web Viewer [ Object Name: "webviewer" ; URL: "data:text/html;"$XML_ResponseFinal ]

Thanks,
Pushkraj
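Two details in this approach are worth checking. Escaping only < and > leaves any & in the XML free to be read as the start of an entity, so & should be escaped first. And a data: URL separates the media type from the content with a comma, so in FileMaker the step would normally read URL: "data:text/html," & $XML_ResponseFinal; as posted, the ";" and the missing & operator may be why the viewer stays blank. The Python sketch below shows both steps. xml_as_html_page and data_url are hypothetical helpers, and wrapping the text in <pre> (to keep line breaks) and percent-encoding the page are choices of this sketch, not something confirmed in the thread.

```python
from html import escape
from urllib.parse import quote


def xml_as_html_page(xml_response: str) -> str:
    """Wrap XML in a page that displays it as literal text.
    escape() converts & first, then < and > (and quotes), to entities."""
    return "<html><body><pre>" + escape(xml_response) + "</pre></body></html>"


def data_url(page: str) -> str:
    """Media type, then a comma, then the percent-encoded content, so that
    characters like # or % in the page are not misread as URL syntax."""
    return "data:text/html;charset=utf-8," + quote(page, safe="")
```

For example, xml_as_html_page("<a>1 & 2</a>") yields a page whose body shows the tags literally instead of interpreting them.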
beverly Posted October 4, 2011

WARNING! Danger, Will Robinson! "Scraping" can lead to problems if the site decides to change its format (and they do!).

If you get XML content, then yes, import the XML. You may need an XSLT stylesheet to transform it into FMPXMLRESULT (the grammar used for FileMaker XML import).
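beverly's suggestion is to transform the API's XML into FileMaker's FMPXMLRESULT import grammar, normally with an XSLT stylesheet. To show only the target shape, here is a Python sketch that writes that structure from rows of values. to_fmpxmlresult is a hypothetical helper, every field is declared as TEXT, and the PRODUCT and DATABASE attributes are left empty on the assumption that the import does not rely on them; check this against FileMaker's XML import documentation before relying on it.

```python
import xml.etree.ElementTree as ET

NS = "http://www.filemaker.com/fmpxmlresult"


def to_fmpxmlresult(field_names, rows) -> str:
    """Build an FMPXMLRESULT document from field names and rows of strings."""
    root = ET.Element("FMPXMLRESULT", xmlns=NS)
    ET.SubElement(root, "ERRORCODE").text = "0"
    ET.SubElement(root, "PRODUCT", BUILD="", NAME="", VERSION="")
    ET.SubElement(root, "DATABASE", DATEFORMAT="", LAYOUT="", NAME="",
                  RECORDS=str(len(rows)), TIMEFORMAT="")
    metadata = ET.SubElement(root, "METADATA")
    for name in field_names:
        # one FIELD per column; all declared as TEXT in this sketch
        ET.SubElement(metadata, "FIELD", EMPTYOK="YES", MAXREPEAT="1",
                      NAME=name, TYPE="TEXT")
    resultset = ET.SubElement(root, "RESULTSET", FOUND=str(len(rows)))
    for row in rows:
        row_el = ET.SubElement(resultset, "ROW", MODID="", RECORDID="")
        for value in row:
            col = ET.SubElement(row_el, "COL")
            ET.SubElement(col, "DATA").text = value  # one DATA per COL (no repetitions)
    return ET.tostring(root, encoding="unicode")
```

In practice the values would come from parsing the API response with a real XML parser rather than from Position/Middle, which also avoids the breakage beverly warns about when a site changes its layout.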