December 29, 2014

http://www.episcopalchurch.org/parish/all-saints-episcopal-church-duncan-ok
http://www.episcopalchurch.org/parish/all-saints-episcopal-church-briarcliff-manor-ny
http://www.episcopalchurch.org/parish/all-saints-episcopal-church-greensboro-nc

I have included 3 sample URLs containing the information I am trying to scrape. I would like to open the URLs in FileMaker's web viewer and scrape them for this information. I included 3 samples because they vary. I would be happy simply to get the text between the title of the church and "see map: Google Maps". Basically there are a lot of variables; sometimes there are paragraphs of text in the middle of the section I want to select. I am mostly interested in populating these fields if the web page has them: Name of Church, Address, City, State, Zip, Clergy, Website, Email, Phone, Facebook, Twitter. But I would be very happy just to get the text scraped from the title to "see map: Google Maps"; then it would be a matter of parsing the individual fields. Thanks for the help. I included a graphic of what I want to copy on a sample page.
December 29, 2014

Here's something you could use as your starting point:

Let ( [
  text = GetLayoutObjectAttribute ( "yourWebViewer" ; "content" ) ;
  prefix = "<span class=\"locality\">" ;
  suffix = "</span>" ;
  start = Position ( text ; prefix ; 1 ; 1 ) + Length ( prefix ) ;
  end = Position ( text ; suffix ; start ; 1 )
] ;
  Middle ( text ; start ; end - start )
)

This extracts the City part of the address. You need to examine the page source in order to find "anchors" for each data item you want to extract.

Edited December 30, 2014 by comment
December 30, 2014 (Author)

I'm lost. When I try the formula you gave above, I name my web viewer, but the prefix is highlighted and I get the error "file not found". I need to make a trial file and see what I can do, but I haven't been able to begin. I will keep trying to work with your formula. Thanks
December 30, 2014

Hi hownow,

"I'm lost. When I try the formula you gave above, I name my web viewer, but the prefix is highlighted and I get the error 'file not found'. I need to make a trial file and see what I can do, but I haven't been able to begin. I will keep trying to work with your formula. Thanks"

You have discovered the self-taught method of learning. Keep in mind, you can always post a file that you are playing with so we can see first-hand what you are doing. This really helps when we discuss your question using your layouts, field names, scripts, relationship graphs, etc.

Good luck,
Lee
December 30, 2014

Let ( [ text = GetLayoutObjectAttribute ( "yourWebViewer" ; "source" )

Simple copy/paste issue ... add a semicolon at the end of this line. (... and corrected 'past' to 'paste' on mine, LOL.)
December 30, 2014

Let ( [ text = GetLayoutObjectAttribute ( "yourWebViewer" ; "source" )

Another issue: substitute "source" with "content". Remember to give a name to your web viewer (this example works if the web viewer name is "yourWebViewer"). That calculation must be unstored. You could bypass the web viewer approach using the script step: Insert From URL
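In outline, the Insert From URL route could look like this (a sketch; the field name Table::page_source and the $url variable are placeholders, not names from your file, and the exact step options vary a little by FileMaker version):

```
# 1. grab the URL for this record
Set Variable [ $url ; Value: Table::url ]
# 2. fetch the page source straight into a text field, no web viewer needed
Insert From URL [ Select ; No dialog ; Table::page_source ; $url ]
```

Then point the parsing calculation at Table::page_source instead of GetLayoutObjectAttribute(), and you don't need a visible web viewer at all.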
December 30, 2014 (Author)

I tried it and made a file to experiment with, which I am enclosing (parsing and scraping). I tried to just get it to enter the city with the script "copy and parse", but it doesn't send anything to the field "city". That is all I have, and it doesn't seem to work. Thanks for starting me off, and for all the corrections. parsing and scraping.fmp12.zip
December 30, 2014

Change the script calculation to:

Let ( [
  text = GetLayoutObjectAttribute ( "webwindow" ; "content" ) ;
  prefix = "<span class=\"locality\">" ;
  suffix = "</span>" ;
  start = Position ( text ; prefix ; 1 ; 1 ) + Length ( prefix ) ;
  end = Position ( text ; suffix ; start ; 1 )
] ;
  Middle ( text ; start ; end - start )
)
December 30, 2014 (Author)

Thank you, I got the State from that. I just tried to do a <div class> for the address, but it pastes it with a big space before it, so I am doing something wrong... I enclosed the file with that modification; if you run the script you can see what it does. I can't seem to get the other areas. I am sure it is somehow the same principle, but I don't know what I am missing. I sort of have the Name (but somehow it adds a lot more stuff than I need and I don't know why), and the Address (lots of white space); City is fine, State is fine, Country is fine. So here is the amended file. Please help me figure out what I am missing. I get the principles of the example, but I can't figure out the others. I need help with those. Thanks. 4parsing and scraping #4.fmp12.zip
December 31, 2014

Which is the name that you want to extract from this HTML part:

<title>All Saints' Episcopal Church, Greensboro, NC | Episcopal Church</title>

To get the address, change the calculation to:

Let ( [
  text = GetLayoutObjectAttribute ( "webwindow" ; "content" ) ;
  prefix = "<div class=\"street-address\">" ;
  suffix = "</div>" ;
  start = Position ( text ; prefix ; 1 ; 1 ) + Length ( prefix ) ;
  end = Position ( text ; suffix ; start ; 1 )
] ;
  Trim ( Substitute ( Middle ( text ; start ; end - start ) ; Char ( 10 ) ; "" ) )
)
December 31, 2014 (Author)

Hi, and thanks so much. I just want the name of the church. The problem is that sometimes there is a "'" apostrophe, and in the source it takes the form of "&#039;". But it is not always there. I have the rest of the address, so just the church name is important in the Name field.

I am placing the source code here to help discuss the other fields. The EMAIL (if there is one), the Clergy, and the Website fields are the most important, so I will include the snippets here. When I try them, they don't work.

For the Website, the code around it is:

</div><div class="field field-type-text field-field-website"><div class="field-items"><div class="field-item odd"><div class="field-label-inline-first">Website: </div><a href="http://www.allsts.org/">http://www.allsts.org/</a> </div></div>

For the Email it is:

<div class="field field-type-text field-field-email"><div class="field-items"><div class="field-item odd"><div class="field-label-inline-first">Email: </div><a href="mailto:[email protected]">[email protected]</a> </div></div>

For the CLERGY it is:

<div class="field-label-inline-first">Clergy: </div>The Rev. Kurt Wiesner </div></div></div>

I don't understand any of these when I try them. If I get these done, I will be able to apply it to the others. I thought the only way to webscrape was to get the text between two areas and then parse it in another field. But this way is so much more effective, if I could just get it! So thanks for answering and for all your help.
December 31, 2014

The problem is that sometimes there is a "'" apostrophe, and in the source it takes the form of "&#039;". But it is not always there.

Actually, it takes the form of &#039; and you can use the Substitute() function to replace it with an actual apostrophe (along with any other HTML entities that you may find). I am not sure why you're having problems with the other items. For example, for CLERGY you could use:

prefix = "Clergy: </div>" ;
suffix = "</div>" ;
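For instance, a decoding step could look like this (a sketch; Table::name is a placeholder field name, and the extra entities are just common ones you might also meet, not necessarily present on these pages):

```
// Replace HTML entities left over from the scrape with real characters
Substitute ( Table::name ;
  [ "&#039;" ; "'" ] ;   // apostrophe
  [ "&amp;" ; "&" ] ;    // ampersand
  [ "&quot;" ; "\"" ]    // double quote
)
```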
December 31, 2014 (Author)

I am having problems with those others. I applied the prefix = "Clergy: </div>" ; suffix = "</div>" ; that you gave me, and the trim function Trim ( Substitute ( Middle ( text ; start ; end - start ) ; Char ( 10 ) ; "" ) ), and that worked. If the code for a field is not there (missing), how is that handled? Is there any code to instruct FileMaker not to capture anything if nothing is there? (Sometimes there is no clergy or website or email. Sometimes there is...)
December 31, 2014 (Author)

I am enclosing the amended file to show what I can get and what I can't get. I cannot figure out Email and Web address; those are the most important pieces of information. I also don't know what to do to stop it bringing pure code into the field when that item is not present on the website. You can see it if you try some of the records. Some sites do not contain websites or clergy or email; they seem to have all the other fields. I need to address that. 5parsing and scraping #5.fmp12.zip
December 31, 2014

Some sites do not contain websites or clergy or email.. they seem to have all the other fields. I need to address that.

I'm wondering why you are using Insert Calculated Result (pastes the result of a calculation into the current field in the current record) instead of Set Field (replaces the entire contents of the specified field in the current record with the result of a calculation). I can count on one hand the times since the release of version 7 that I have used Insert Calculated Result. The example file doesn't show any Email, Twitter or Facebook data. I would use "Hide object when" if the empty fields are annoying to you: for the fields use IsEmpty ( Self ), and for the field labels IsEmpty ( Table::email ), etc.

HTH,
Lee
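To illustrate the difference in script-step form (a sketch; the field name and calculation are placeholders):

```
# Insert Calculated Result needs the target field active on the layout:
Go to Field [ Table::city ]
Insert Calculated Result [ <your Let ( ) calculation> ]

# Set Field works regardless of the cursor, and the field
# does not even have to be on the layout:
Set Field [ Table::city ; <your Let ( ) calculation> ]
```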
December 31, 2014 (Author)

Thank you, Lee, for that insight; I didn't really know the difference. I need to add an example that has the Facebook or Twitter data, but that isn't so important to me. I had considered what you are saying about the IsEmpty function, but my problem is still getting the web address and the email address, which is the whole point of my doing this. Since the HTML is different, I haven't a clue about how to do it. Everything I have tried doesn't work. Thanks for your input. Much appreciated.
December 31, 2014

If any code for fields are not there (missing) how is that handled? Is there any code to instruct filemaker not to capture if nothing is there? ( sometimes there is no clergy or website or email . Sometimes there is....

I thought all these sites were using the same template. If some "fields" can be missing, then use the following pattern:

Let ( [
  text = GetLayoutObjectAttribute ( "yourWebViewer" ; "content" ) ;
  prefix = "<span class=\"locality\">" ;
  suffix = "</span>" ;
  pos = Position ( text ; prefix ; 1 ; 1 ) ;
  start = pos + Length ( prefix ) ;
  end = Position ( text ; suffix ; start ; 1 )
] ;
  Case ( pos ; Trim ( Substitute ( Middle ( text ; start ; end - start ) ; Char ( 10 ) ; "" ) ) )
)

This will return nothing when the source HTML doesn't contain the prefix.
December 31, 2014 (Author)

Outstanding. That is a big load off my mind. I am going to get right on that. Thanks so much. That will get rid of all those massive code captures. How would I also add the code Raybaudi gave me earlier, because that took care of a long white space? This is what that was: Trim ( Substitute ( Middle ( text ; start ; end - start ) ; Char ( 10 ) ; "" ) )
December 31, 2014 (Author)

Ah, okay, thanks. If you have any suggestions about getting the Email or the Website field, I WOULD BE SO HAPPY. I don't know how to figure that out. I have parsed the Name field to get rid of the apostrophe, and everything else is working. Go ahead. Make my NEW YEAR'S EVE, LOL. Anyway, Happy and Blessed New Year to you, and thanks for all the help. This is a wonderful site.
January 1, 2015 (Author)

Here is what I am using to try to scrape the source code to get the web address:

Let ( [
  text = GetLayoutObjectAttribute ( "webwindow" ; "content" ) ;
  prefix = "Website: </div> <a href=" ;
  suffix = "</a>" ;
  start = Position ( text ; prefix ; 1 ; 1 ) + Length ( prefix ) ;
  end = Position ( text ; suffix ; start ; 1 )
] ;
  Trim ( Substitute ( Middle ( text ; start ; end - start ) ; Char ( 10 ) ; "" ) )
)

This gives me a huge amount of the source code, and I don't know why. I don't understand how to apply the tags that are different in the HTML source, and I can't find out where to find these answers. The code is the same type for both the Email and the Web address, but inherently they are a-href tags. How do I differentiate them from the simpler ones?
January 2, 2015

prefix = "Website: </div> <a href=" ;

The reason this doesn't work for you is that an actual carriage return within a calculation formula is read as a space. If there is a carriage return in the original HTML, you need to write it as ¶, i.e.:

prefix = "Website: </div>¶<a href=" ;

If the new-line character is a linefeed rather than a carriage return, you will need to use:

prefix = "Website: </div>" & Char ( 10 ) & "<a href=" ;
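The same idea should carry over to the Email field from the HTML posted earlier (a sketch; "webwindow" is the web viewer name used in this thread, and the Char ( 10 ) guess has to be verified against the actual page source; if there is no line break there, drop the & Char ( 10 ) & part):

```
Let ( [
  text = GetLayoutObjectAttribute ( "webwindow" ; "content" ) ;
  prefix = "Email: </div>" & Char ( 10 ) & "<a href=\"mailto:" ;
  suffix = "\">" ;
  pos = Position ( text ; prefix ; 1 ; 1 ) ;
  start = pos + Length ( prefix ) ;
  end = Position ( text ; suffix ; start ; 1 )
] ;
  // Case ( ) returns empty when the prefix is not found at all
  Case ( pos ; Middle ( text ; start ; end - start ) )
)
```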
January 2, 2015 (Author)

Hi, and thank you, but that didn't work. I have enclosed my trial file so you can see what it does. You can just use Command-1 to activate the script. Website loads the whole page. It is only a little bit of help I need to finish this. If someone could just look over this file and see what happens when you use the Command-1 script... It gives a whole page of source, and all I need is the website and the email, and I just don't know how to do that one thing. Please help. Thank you. 6parsing and scraping #6.fmp12.zip
January 7, 2015 (Author)

I wish someone would help me. It is so easy for you and so hard for me. I am getting exhausted and learning nothing, because it keeps failing. I asked if someone would check the file in my last post, but since last week no one has downloaded it and no one has looked at it. Very discouraging. This should be a good example that teaches a lot of people important skills. I am just a novice asking questions of people with experience. OK?
January 7, 2015

You were instructed in message 16 to use Set Field instead of Insert Calculated Result. You're still not doing that, which makes it harder to troubleshoot.
January 8, 2015 (Author)

OK, I did that, so it can be figured out more easily. It didn't seem to change anything. It is enclosed. 7parsing and scraping #7.fmp12.zip
January 8, 2015

Take a look at this approach. You use variables to declare the text, prefix, and suffix, then calculate the result and set the field. It is easier to change and test the prefix and suffix values. Also, by storing the text in a field, you can examine it more easily. You don't have a simple problem to solve. 6parsing and scraping MOD.fmp12.zip
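In outline, that variable-based script looks something like this (a sketch based on the description above; the names are illustrative, not necessarily the ones in the MOD file):

```
# 1. grab the page source once
Set Variable [ $text ; Value: GetLayoutObjectAttribute ( "webwindow" ; "content" ) ]
# 2. declare the anchors for the item you want
Set Variable [ $prefix ; Value: "<span class=\"locality\">" ]
Set Variable [ $suffix ; Value: "</span>" ]
# 3. calculate the result
Set Variable [ $result ; Value:
  Let ( [
    pos = Position ( $text ; $prefix ; 1 ; 1 ) ;
    start = pos + Length ( $prefix ) ;
    end = Position ( $text ; $suffix ; start ; 1 )
  ] ;
    Case ( pos ; Middle ( $text ; start ; end - start ) )
  ) ]
# 4. set the target field
Set Field [ Table::city ; $result ]
```

To test a new item, you only change steps 2 and 4.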
January 8, 2015 (Author)

Thank you. I looked over your modifications (thank you) and I see the logic, as you pointed out, of making it easier to change and examine. I am sorry, as a novice, that I don't quite understand the principles of taking that to the next level. I tried to do the Address and the Website the way you did it and failed at both, and I am not sure what I am doing wrong. If you would look at the changes I made, you might be able to correct my understanding. I have a lot of this kind of thing to do, so it is vital I understand it. Thank you again. PS: I added a bigger website-content field so it would be easier to see. 6parsing and scraping MOD 2 Website email name address.fmp12.zip
January 8, 2015

You do have to work accurately. On the address, you omitted the first quote. And it's a div, not a span.
January 8, 2015

Even more accuracy problems. The format of this method is:

1. set the prefix
2. set the suffix
3. get a calculated result
4. set the target field to the result

Compare what you are doing with Zip, Country, and Website.

ALSO: I strongly suggest you get FileMaker Pro Advanced, if you do not have it. (Or update your profile if you do have it.) That aids enormously in troubleshooting.
January 8, 2015 (Author)

Thank you, Bruce. I do have Advanced. I put in the missing quotation mark, and it somehow gives me a whole page of developer code. I am sorry, but I don't understand why this happens. In the file I am uploading now, you can see that with any record by pressing Command-1 to activate the script. But when I added the " mark where you said it was omitted, it still brings all that page source into the Address field. The Email field is a disaster too. I am sorry I am so thick, but I am trying hard. parsing and scraping MOD 3.fmp12.zip
January 8, 2015

Regarding Website: there are four steps. In Zip you do steps 1, 2, 3, 4. In Country, you do steps 2, 3. In Website, you do steps 1, 2, 4.
January 8, 2015 (Author)

I see the two images you sent, and I corrected the span to a div, and it still gives me the same result.
January 8, 2015

Thank you Bruce. I do have advanced.

hownow, a ton of time has been spent by others to help you. In order to get good help, you need to provide us with the most thorough picture of what is happening. This begins with your basic profile. There are a lot of differences between developing in the client version of FileMaker and being able to use the tools available in the Advanced version. You need to update your profile to reflect your current FileMaker version. Use this link: MY PROFILE
January 8, 2015

***WHAT GIVES YOU THE SAME RESULT?*** You have at least three problems going on: Address, Country, Website. Note that there is no instance of "website" in the content.
January 8, 2015 (Author)

I meant the Address wasn't working. I am only going to tackle one thing at a time. I upgraded my profile. In this file I changed the span to a div and included the "; I am only trying to get the Address field to work. 8mod scraping and parsing (address change.fmp12.zip
January 8, 2015

No; actually LOOK at the Address field content. Click in the field. It IS working. It has leading spaces, tabs, etc. More attention to detail.
January 9, 2015

Hi I upgraded my profile

Your profile does NOT reflect a change. Perhaps you missed the button to save changes? Or did I misunderstand your post?
January 9, 2015 (Author)

Thank you for that. You are right. Why the spaces and all? I didn't see it. So I would have to make another script step to correct the extra space? I will do that, thank you. I will work on the others. I have an emergency; I will post when I try the next field.
January 9, 2015 (Author)

I don't know. I was using a Trim function the way we were doing it before. I don't know the reason there is so much white space. My next thought was to parse it somehow, but as to correcting why it started that way, I don't know.
January 9, 2015

Keep trying; keep LOOKING at your data and your process until you CAN explain it.
January 9, 2015

I haven't been following this thread. After 43 posts, I had given up, and Bruce has been more than patient. However, try this on the address field:

Trim ( LeftWords ( Table::address ; WordCount ( Table::address ) ) )

The thing is ... Trim() does not remove carriage returns, nor does it remove other hidden characters. One caveat about using LeftWords() as I've presented it is that it drops word-delimiter characters such as $, #, =, ¶. This dropping of word delimiters would only happen at the beginning of the field (in the case of LeftWords) or at the end (in the case of RightWords).
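If you would rather keep the punctuation that LeftWords() can drop, another option is to strip the usual hidden characters explicitly (a sketch; this assumes the junk is only tabs, returns, linefeeds, and non-breaking spaces; check your field for anything else):

```
// Turn each hidden character into a space, then Trim the ends
Trim ( Substitute ( Table::address ;
  [ Char ( 9 ) ; " " ] ;     // tab
  [ Char ( 10 ) ; " " ] ;    // linefeed
  [ Char ( 13 ) ; " " ] ;    // carriage return
  [ Char ( 160 ) ; " " ]     // non-breaking space
) )
```

Note this can still leave doubled spaces inside the text.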
January 9, 2015 (Author)

Thank you, LaRetta. That part worked, but Bruce has me trying to do this another way. I am now trying to figure out the Website and Email, which have been my most important requests all along.
January 9, 2015

No, I am not trying to have you do it another way. I am trying to get you to look at what you are doing, look at the data, perform accurate and complete work, and understand how this is working. Explaining things to others is a good way to develop an understanding. If you had been able to do that, you would have explained that the original data captured from the web site contains all these spaces, returns, etc.