January 10, 2015 (Author): In each instance there is a beginning point and an ending point, and the text between them is what I want to capture (scrape). It is like a buffet with many dishes, and I only want the contents of one sandwich. I can identify the sandwich in very specific locations. For me it is like a foreign language I can't speak: I am trying to ask with correct grammar that I have no understanding of, and I have been searching for the way to say it correctly so I get what I need. I have learned to speak the prefix <span> and the end (suffix) </span>, and I can get that content. Now I am speaking blindly, rattling on as best I can, and through the help of others I am beginning to understand the grammar of <div> and </div>. The people at the buffet wait for me to discover their words in their language, but every command I try follows a different grammar and I blunder. To get the name of the food, I still have to understand the grammar of <title>. Especially hard to speak are the mysterious references to the website and email, which as a novice I have tried to address, to no avail: that sandwich is more like a club sandwich with three slices of bread! I don't know how to use this grammar because the prefix and suffix vary and I do not know the rules. So when I ask wrongly, I get spooned large amounts of the wrong things. I only know the principle; this language is beyond me. I can only repeat what others teach me, and I am getting to where I am afraid to ask.
January 10, 2015: That seems like a good start. Web scraping is notoriously difficult. It certainly helps to learn the tools (scripting, calculations, etc.), but you may be assuming that the sandwich is structured consistently. There is absolutely no guarantee of that, and no guarantee that what you figure out today will work tomorrow.
January 10, 2015 (Author): Is it better, then, to capture the whole text and collect the data between two text strings? That was my original intention; my topic was web scraping, getting the text between two text strings. Say I have a word like "seconds" and another text string like "another way of saying", as in this example: "He was merely seconds away from finding out how to get an email. Another way of saying he was never going to get his project done." How can I capture just "away from finding out how to get an email."? Which is the most reliable and useful way to go about this?
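A minimal sketch of the "text between two strings" idea, applied to the sentence in the question. Python stands in for a FileMaker calculation here: str.find() plays the role of Position(), but it is 0-based and case-sensitive, whereas FileMaker's Position() is 1-based and case-insensitive, so the suffix below is written with a capital "A" to match the sample text. The function name text_between() is hypothetical.

```python
def text_between(text: str, prefix: str, suffix: str) -> str:
    """Return the text after the first `prefix` and before the next `suffix`.

    Returns an empty string if either marker is missing.
    """
    begin = text.find(prefix)          # 0-based index of the prefix, or -1
    if begin == -1:
        return ""
    start = begin + len(prefix)        # first character after the prefix
    end = text.find(suffix, start)     # first suffix that comes AFTER the prefix
    if end == -1:
        return ""
    return text[start:end]


sample = ("He was merely seconds away from finding out how to get an email. "
          "Another way of saying he was never going to get his project done.")

# strip() removes the spaces left next to the two markers
print(text_between(sample, "seconds", "Another way of saying").strip())
# prints: away from finding out how to get an email.
```

Searching for the suffix only after `start` matters: if the same suffix text also appeared earlier in the page, a search from the beginning would return the wrong end point. And, as noted in the thread, there is no guarantee the markers stay the same from page to page.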
January 10, 2015: Why are you asking this question when you already have the answer? Give up, or pay somebody to do the job.
January 10, 2015: It feels like you keep putting this back on us, as if we are not answering you correctly. I do not like that at all. It is your responsibility, not ours ... if you study what we've given you in more depth, you can figure it out. Yes, it is difficult work, which is why we get paid the big bucks. I know FileMaker makes it look easy, but it is not. However, it IS easy to replicate what we give you and learn from it. It does not appear that you are learning any of the text-parsing techniques we are presenting.

Having said that, the concept remains the same ... find the string you wish to parse, find its beginning, find its end and grab everything in between. Going by your file (and again, I do not have time to really delve into this), you have a large block of text in your email field. To find the website in it:

Let ( [
    field = Table::email ;
    begin = "<meta property=\"og:url\" content=\"" ;
    start = Position ( field ; begin ; 1 ; 1 ) + Length ( begin ) ;
    end = Position ( field ; "/>" ; start ; 1 ) - 1
] ;
    Middle ( field ; start ; end - start )
)

So find the data you want in the text, go backwards until you find its opening tag, put that tag in the begin portion, then look for the first closing tag after it. I have done ZERO web scraping (just haven't had the need), but from what I can gather it is still just parsing text. Now go back up to post #2, where Comment gave you EXACTLY the principle you needed to get this done.
January 10, 2015: BTW, the calculation I gave you in post #47 was there because Trim() was not removing the leading spaces and carriage return. Those were obviously not regular spaces but some other hidden garbage (for lack of a better word). By using LeftWords() as I did, the calculation ignored all the invisible junk at the beginning that FileMaker does not consider part of a word. But you didn't even ask me why it worked! I was showing you how to eliminate "all that white space at the beginning". External text can sometimes hold garbage characters, and you'll need to address that. Also, you said another time that you couldn't get out of the calculation dialog. You should have copied and pasted your calculation here for us to review instead of, again, throwing up your hands and telling us that something we suggested didn't work. There are over 50 posts here when it should have taken 5-6 at most. You expect the perfect answer to be given to you, but YOU must do the work to get there, not us.
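Scraped text often carries characters that look like spaces but are not plain spaces, for example non-breaking spaces (U+00A0), zero-width spaces (U+200B) or a byte-order mark (U+FEFF). The minimal Python sketch below assumes those are the culprits (the actual junk in the file discussed above is unknown) and strips an explicit set of characters from both ends only. It is a swapped-in technique, not the LeftWords() approach from the post; the names EDGE_JUNK and clean_edges() are hypothetical.

```python
# Assumed list of edge characters to remove; inspect your own data to extend it.
EDGE_JUNK = " \t\r\n\u00a0\u200b\ufeff"


def clean_edges(text: str) -> str:
    """Remove the listed characters from the start and end of `text` only."""
    return text.strip(EDGE_JUNK)


raw = "\u00a0\r\n\u200bhello world \n"
print(repr(clean_edges(raw)))  # 'hello world'
```

Listing the characters explicitly keeps the behavior predictable: anything inside the text, such as the space between the two words, is left alone.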
January 10, 2015: Also note that if quoted text is inside a quote, as in this case, you must escape it by putting a backslash (\) before the quote character. Again, if you get stuck, copy the calculation that is throwing an error, tell us exactly where it highlights the error, and we can help you fix it. That does NOT mean what we gave you was incorrect ... only that the specific text string you are using probably contains a character that must be escaped.
January 10, 2015 (Author): Thank you for your answers, all of you. I am sorry. After I retired 23 years ago because of health issues, I became heavily medicated. I was a kind, award-winning educator, but my thought processes have faded and I cannot think as I used to. I never meant any hostility or disrespect to anyone. I have tried my best, but I think it is time for me to stop trying, as Bruce suggested. I am sorry I put all of you experts through such a waste of your time. I do, however, take exception to the implication that I am not doing my part. I ask because I don't know. I worked very, very hard to try to get this right. Sometimes I didn't know what I was looking for, and I am glad people made me look harder. BUT I DID LOOK HARDER. I made 31 files (builds) and didn't get it. Just because you tell me something doesn't mean I am not trying hard. I do know how frustrating it can be. I had some slow kids, but I never let them know that. One of them is now a very accomplished musician, and I taught him to play his instrument. I am giving up on this now. I am too old and sickly; I wanted to do something with the last part of my life. Keep going full strength with newcomers, and be patient in your endeavors with others. I valued your thoughts and admired your advice, but it is too late for me. I am just a novice and I need to bow out. FileMaker is too difficult. I thought there was a simple solution for getting the text between two text strings within the page source, to extract the email and the web address. I am sorry to have taken so much of your time and effort. Of course, I know your intentions are good. Goodbye and God bless.
January 10, 2015: Come on, chap; don't give up. You're already doing stuff that's miles more advanced than what I'm trying to tackle! It sounds like you're getting frustrated with yourself for not understanding bits and for making silly typos and type mistakes. LaRetta, Bruce and the others will always help to the best of their ability, but sometimes their hands are tied when they don't have enough information. If I'm honest, I'd have no confidence in my own ability to find the text you're looking for, but that's because it's a difficult task, not because I'm an idiot (though I am a music teacher too.....). Best wishes, Mike
January 10, 2015: Don't worry about it, hownow. Take a break, then come back and try the suggestions again. We wouldn't be hanging in there with you if we didn't care. I've quit this business a thousand times because of the same kinds of frustrations ... I think we all have. :-)
January 11, 2015 (Author): Thanks, Mike and LaRetta, and of course all of you who have helped. I was very tired; I tried for over a week and got stuck, most of all stuck in my own head. Sorry for that. I took a long nap LOL and feel refreshed. Every effort is appreciated, and I have a better handle on my problem now. Again, thanks. I will be glad to keep asking questions, because I have many. FileMaker is such an enduring program, and like everyone I am as excited as I can be about each new version. I want to remain part of this community. So thanks, everyone. I will get it sooner or later; I just need to pace myself.
January 11, 2015: Reading up in FileMaker Help on these parsing functions will also help, but here is a breakdown of the calculation I provided:

Let ( [
    field = Table::email ;
    begin = "<meta property=\"og:url\" content=\"" ;
    start = Position ( field ; begin ; 1 ; 1 ) + Length ( begin ) ;
    end = Position ( field ; "/>" ; start ; 1 ) - 1
] ;
    Middle ( field ; start ; end - start )
)

The 'begin' string is the beginning tag, from its opening < right through to the point where the text you wish to capture begins. 'start' uses the Position() function to count characters up to where the < is; if you put this in the Data Viewer, it will return the location of the <, with a result such as 675, meaning the 675th character in that field. To find where your text starts, we then add the Length() of the begin string, so if that string is 25 characters long, the NEW starting position is 675 + 25 = 700, the beginning of your desired text. 'end' finds the first /> after that starting position and backs up one character, so it might be 800. Then Middle() says to go to start position 700 and grab the next 'x' characters. We know 'x' because 800 - 700 means the text in the middle is 100 characters long. So Middle ( email field ; 700 ; 800 - 700 ) returns the text between start and end.

After reading Help on these functions, open your Data Viewer and create just the 'start' variable. In place of 'field', insert the field value, and in place of 'begin', insert the quoted string. Then do the same with Middle(). You can watch the results and adjust your calculations accordingly. What can trip you up is escaping in these types of strings: quotes within quotes must be preceded with a backslash (\). And if invisible garbage remains in your resulting text, there are various techniques to strip it; one example was my quick-and-dirty method of using LeftWords(), but you might use Substitute() or other text functions.
When you are stuck in a calculation that won't let you out, copy it and paste it here inside a code block (using the <> icon), and we can help you identify why it breaks. And welcome back.
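To check the arithmetic in the breakdown above outside FileMaker, here is a rough Python re-creation of the Let() calculation. The helpers fm_position() and fm_middle() are hypothetical stand-ins that mimic FileMaker's 1-based Position() (first occurrence only, and case-sensitive, unlike the real function) and Middle(). The sample field is made up: 674 filler characters put the tag at character 675, as in the post, but the real begin string is 33 characters long rather than the round 25 used above, so start comes out as 675 + 33 = 708.

```python
def fm_position(text: str, search: str, start: int = 1) -> int:
    """1-based index of the first `search` at or after `start`; 0 if absent."""
    return text.find(search, start - 1) + 1   # find() returns -1 when absent


def fm_middle(text: str, start: int, count: int) -> str:
    """`count` characters beginning at 1-based position `start`."""
    return text[start - 1:start - 1 + count]


url = "http://www.example.com/"                      # made-up page data
field = "x" * 674 + '<meta property="og:url" content="' + url + '"/>'

begin = '<meta property="og:url" content="'
tag_at = fm_position(field, begin, 1)                 # 675: where the "<" sits
start = tag_at + len(begin)                           # 675 + 33 = 708
end = fm_position(field, "/>", start) - 1             # 732 - 1 = 731, the closing quote
result = fm_middle(field, start, end - start)         # 731 - 708 = 23 characters

print(tag_at, start, end, result)
```

Because end points at the closing quote and Middle() takes end - start characters, the quote itself is excluded. If a page writes the tag as content="..." /> with a space before />, the same arithmetic would leave a trailing quote in the result.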
January 11, 2015: By the way, please use Comment's calculation in post #2; he lays it out more clearly for you than I did. You can simply change the prefix and suffix for each piece you need to extract. :-)
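Following the advice to change only the prefix and suffix for each piece, a minimal Python sketch can keep the marker pairs in one table and reuse a single helper. The marker strings, the sample page and the names MARKERS, between() and extract_all() are all hypothetical; real pages need their own markers. Here the website ends at the closing quote rather than at "/>", a design choice that avoids the - 1 adjustment.

```python
# Hypothetical marker pairs: (prefix, suffix) for each piece to extract.
MARKERS = {
    "website": ('<meta property="og:url" content="', '"'),
    "phone": ('<span class="phone">', "</span>"),
    "fax": ('<span class="fax">', "</span>"),
}


def between(text: str, prefix: str, suffix: str) -> str:
    """Text between the first `prefix` and the next `suffix`, or "" if either is missing."""
    i = text.find(prefix)
    if i == -1:
        return ""
    start = i + len(prefix)
    j = text.find(suffix, start)
    return text[start:j] if j != -1 else ""


def extract_all(page: str) -> dict:
    """Apply every marker pair to the same page source."""
    return {name: between(page, prefix, suffix)
            for name, (prefix, suffix) in MARKERS.items()}


page = ('<meta property="og:url" content="http://www.example.com/"/>'
        '<span class="phone">555-0100</span>')
print(extract_all(page))
```

A missing piece (here the fax) comes back as an empty string rather than an error, so one page without a fax number does not stop the others from being extracted.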
January 11, 2015 (Author): Thank you for all of that. That is going to be my Sunday reading. I am having trouble searching for the text string ">. When I put it in quotes as a text constant, I get an error message. I was trying "">". How can I write that so it works in a calculation?
January 11, 2015: (Quoting: I was trying "">") Remember that quotes within quotes must be escaped. You escape with a backslash, so it would be: "\">"
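Python happens to use the same backslash escape for a quote inside a double-quoted string, so a short sketch can show what "\">" stands for: a two-character string, a quote followed by >. The sample link and the href=" prefix are hypothetical; the example pulls the address out of an <a href="..."> tag.

```python
# "\">" is the two characters " and >  (the backslash only escapes the quote)
suffix = "\">"
prefix = "href=\""

html = "<a href=\"http://www.example.com/\">Visit us</a>"   # hypothetical link

start = html.find(prefix) + len(prefix)   # assumes the prefix is present
end = html.find(suffix, start)            # first "> after the address
print(len(suffix), html[start:end])
```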
January 12, 2015 (Author): Thank you, Bruce. I will be studying what you have done. I need to understand more about variables. What LaRetta told me about the \ mark helped me tremendously; I had no idea how to do that. Your work has shown me a lot to study further. When I really get it, if it is okay, I will ask you some more questions to help me clarify. Thanks so much.
January 12, 2015 (Author): For practice I tried a couple of other items. I was able to make the phone and the fax work, but using the model for Twitter I couldn't get Facebook, and Diocese is a little different. Those are the only two I can't get, but I was able to get the other two.
July 25, 2017: On 1/11/2015 at 10:57 AM, LaRetta said: (quoting her January 11, 2015 breakdown of the Position()/Middle() calculation, ending "And welcome back.")

Hi LaRetta,
Following your advice to look at the calculation suggested by Comment, I changed your Middle ( field ; start ; end - start ) to Trim ( Substitute ( Middle ( field ; start ; end - start ) ; Char ( 10 ) ; "" ) ) and was able to remove all the leading and trailing line breaks from my text.
Thank you and best regards,
Daniel
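A rough Python analogue of the Trim ( Substitute ( ... ; Char ( 10 ) ; "" ) ) change, to show what each part does. Substitute() replaces every occurrence, so line feeds in the middle of the captured text disappear too, not just the ones at the edges; Trim() then removes only leading and trailing spaces. Char ( 10 ) is a line feed; FileMaker's own ¶ is a carriage return, Char ( 13 ), which this substitution does not touch. The function names are hypothetical, and strip_edge_breaks() is a swapped-in alternative for when inner line breaks should survive.

```python
def trim_substitute(text: str) -> str:
    """Like Trim ( Substitute ( text ; Char ( 10 ) ; "" ) )."""
    no_lf = text.replace("\n", "")   # Substitute(): removes EVERY line feed
    return no_lf.strip(" ")          # Trim(): leading/trailing spaces only


def strip_edge_breaks(text: str) -> str:
    """Alternative: remove spaces, CRs and LFs at the edges but keep inner line breaks."""
    return text.strip(" \r\n")


captured = "\n  Line one\nLine two  \n"
print(repr(trim_substitute(captured)))    # 'Line oneLine two'
print(repr(strip_edge_breaks(captured)))  # 'Line one\nLine two'
```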