Absolute beginner trying to parse text from the yellow pages online

mistery · October 17, 2012

Hello

I am posting a FileMaker file that I made to show how I am trying to separate data pasted from the online yellow pages into separate fields. The problem is that the text varies a lot from listing to listing, and that is why I can't figure this out. I am a complete newbie here and wanted to make a contribution to this forum by posting an example that I am trying to solve; hopefully people can help me write the calculation to make this happen. I have included the URL for each record to show where it came from, and the information pasted from that page resides in the field called yellowpage copy.

I did the first record manually but the others I didn't have a clue about. I included a button to execute a calculation so there are probably many ways to do this. I have an instruction field to help us all understand the thinking that went behind the record to get it parsed. Please understand this task of parsing text from the yellowpages is daunting to a new person and I thought this would be a good example with a lot of variations for the good of all of us in the forum.

Thank you very much.

Yellow pages text parsing example.fmp12.zip

Lee Smith · October 17, 2012

I would probably use the Web Viewer to automate this, but you can also parse the text from "yellowpage copy" using a calculation (either in the field, or as a script step) such as:


// extract paragraph

Let ( [
    n = 3 ;
    t = "¶" & Yellow pages text parsing example::yellowpage copy & "¶" ;
    Start = Position ( t ; "¶" ; 1 ; n ) + 1 ;
    End = Position ( t ; "¶" ; 1 ; n + 1 )
] ;
    Trim ( Middle ( t ; Start ; End - Start ) )
)

n = 3 is the position of the third paragraph. You will need to change this to n = 4 (etc.) to get the next paragraph.

You will need to play with this to adjust for each paragraph.

I prefer to do this as a script step, as it is more forgiving. :)
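
For comparison, the same "nth paragraph" idea can be sketched with the built-in GetValue function, which treats the field as a return-delimited list. This is only an alternative sketch, not the calculation posted above; the variable name source is an illustrative choice.

```filemaker
Let ( [
    n = 3 ;   // which paragraph to pull; change to 4, 5, ... as needed
    source = Yellow pages text parsing example::yellowpage copy
] ;
    // GetValue returns the nth return-delimited value of the text
    Trim ( GetValue ( source ; n ) )
)
```

GetValue does not need the extra "¶" padding at either end, because it returns the last value even when the text does not end in a return.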

mistery · October 17, 2012

Thanks for your answer but I don't see how it is possible to automate with a web viewer? I don't understand that at all. Thanks

Lee Smith · October 17, 2012

Actually, these are two different ways of approaching your need.

The calculation I posted can be used either in a field, or as a script to populate a field.

A second approach would be to use the Web Viewer to obtain the source information from a webpage.

There are pros and cons to this approach. One of the cons is that any change to the webpage will affect the results.

Since you list yourself as a novice, I recommend that you use the first approach.

mistery · October 18, 2012

I worked on some scripts and that was a good start. Now I have some problems with record three. I have written my questions in the file, but here they are:

1. I have a script to get the business and the address, but the city, state and zip are linked together. I was able to extract the zip and state, but they still remain in the city field. I don't know how to get rid of the state and zip in the city field.

2. In "Ithaca, NY > Mason Supplies & Materials > Dolph Buzz quarry" I don't know how to extract the "Mason Supplies & Materials" that sits between the two ">" symbols and send it to the category field.

3. If it says "Local", the phone number follows. How do I get the 13 characters after the word "Local"?

4. If it says "Visit", the next paragraph is the website. It is always the next paragraph after the word "Visit". I don't know how to set this up.

5. The same holds true for the word "Email": the next paragraph after it is the email address.

I have enclosed the amended file.

Thanks

Yellow pages text parsing example.fmp12 2.zip
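
For questions 1 and 3 above, here is a minimal sketch of one way to split a "City, ST zip" line and to take whatever follows "Local:" on the phone line. The two sample strings are hard-coded for illustration (they come from a listing quoted later in this thread); in a real script they would be pulled from the yellowpage copy field. All variable names are illustrative, and the sketch assumes the city name contains no comma.

```filemaker
Let ( [
    cityLine = "Ithaca, NY 14850-3247" ;      // sample line, hard-coded for illustration
    phoneLine = "Local: (607) 272-2650" ;     // sample line, hard-coded for illustration

    // City is everything before the first comma
    commaPos = Position ( cityLine ; "," ; 1 ; 1 ) ;
    city = Trim ( Left ( cityLine ; commaPos - 1 ) ) ;

    // What remains is "ST zip"; split it at the first space
    rest = Trim ( Middle ( cityLine ; commaPos + 1 ; Length ( cityLine ) ) ) ;
    spacePos = Position ( rest ; " " ; 1 ; 1 ) ;
    stateCode = Left ( rest ; spacePos - 1 ) ;
    zipCode = Trim ( Middle ( rest ; spacePos + 1 ; Length ( rest ) ) ) ;

    // Phone is the remainder of the line after the "Local:" label
    labelEnd = Position ( phoneLine ; "Local:" ; 1 ; 1 ) + Length ( "Local:" ) ;
    phone = Trim ( Middle ( phoneLine ; labelEnd ; Length ( phoneLine ) ) )
] ;
    // Returned as a return-delimited list only so all four results are visible at once
    List ( city ; stateCode ; zipCode ; phone )
)
```

Taking the rest of the line after the label, rather than a fixed number of characters, avoids having to count the characters in the phone number. In a script, each of the four variables would normally go into its own Set Field step.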

mistery · October 18, 2012

Hi I can't get two things.

1. How do I get the text between 2 words when I know the before and after words?

2. How do I get the next line after a specific word, i.e. the word "visit"?

Thanks

mistery · October 19, 2012

Sorry, I asked twice.

LaRetta · October 19, 2012

Hi Mistery :laugh2:

Well, take a look at this idea (attached). As written, it will not work in your first example, where the same symbol appears on both sides; it would need a modification that lets the user specify the second occurrence of the same character (that wouldn't be difficult to add). Also, without recursion, it will not handle multiple occurrences of those strings within the same text field.

You can also use the xWords functions (LeftWords, MiddleWords, RightWords), but if you have symbols you want to keep on either side, you would lose them. I suppose one could use Substitute() but ... well, this is what comes to me tonight, LOL.

The attachment contains both v7 and v12 versions.

GetTextBetween.zip
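
The attachment is not reproduced in this thread, so the following is not LaRetta's function. It is only a minimal sketch of the general "text between two known words" idea using Position and Middle. The sample string and all variable names are hypothetical.

```filemaker
Let ( [
    source = "Local: (607) 272-2650 Visit: www.ithacastoveworks.com" ;   // hypothetical sample
    beforeWord = "Local:" ;
    afterWord = "Visit:" ;

    // Where the wanted text begins: just past the end of beforeWord
    beforePos = Position ( source ; beforeWord ; 1 ; 1 ) ;
    startPos = beforePos + Length ( beforeWord ) ;

    // Look for afterWord only from startPos onward
    endPos = Position ( source ; afterWord ; startPos ; 1 )
] ;
    // Position returns 0 when a word is missing, so guard against that
    If ( beforePos > 0 and endPos > 0 ;
        Trim ( Middle ( source ; startPos ; endPos - startPos ) )
    )
)
```

With the sample above this should give the phone number. When either word is missing, the If has no default result, so the calculation returns empty.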

LaRetta · October 19, 2012

Wow. I just took a look and saw another very recent post on this. You really should stay in the same thread, so in future please keep it all in one place. If something is unclear, just ask, and if you don't get a response, just BUMP it (post again so it jumps to everyone's attention). :-)

And if you need the second or different occurrence of either side let me know.
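
For the case where the same symbol sits on both sides, a minimal sketch (not the attached function) can use the fourth parameter of Position, which selects which occurrence to find. The sample string is the one from the question; the variable names are illustrative.

```filemaker
Let ( [
    trail = "Ithaca, NY > Mason Supplies & Materials > Dolph Buzz quarry" ;
    delim = ">" ;
    firstPos = Position ( trail ; delim ; 1 ; 1 ) ;    // first occurrence of ">"
    secondPos = Position ( trail ; delim ; 1 ; 2 )     // second occurrence of ">"
] ;
    // Only extract when both delimiters were found
    If ( secondPos > 0 ;
        Trim ( Middle ( trail ; firstPos + 1 ; secondPos - firstPos - 1 ) )
    )
)
```

This should return "Mason Supplies & Materials", ready to be set into the category field.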

mistery · October 19, 2012

Thank you LaRetta

I was able to get it done with the same symbol by using the Substitute function to change the symbols into paragraph returns. Thanks for alerting me to my second posting attempt. Your answer really helped me.
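
The actual script is not shown in the thread, so this is only a guess at what the Substitute approach might look like: turn every ">" into a return, then pick the second value from the resulting list. Variable names are illustrative.

```filemaker
Let ( [
    trail = "Ithaca, NY > Mason Supplies & Materials > Ithaca Stove Works" ;
    // Each ">" becomes a paragraph return, giving a three-value list
    asList = Substitute ( trail ; ">" ; ¶ )
] ;
    // The category is the second value; Trim removes the surrounding spaces
    Trim ( GetValue ( asList ; 2 ) )
)
```

This relies on the category always being the second segment of the line.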

Lee Smith · October 19, 2012

Topics have now been merged

Steve E. · October 20, 2012

Mistery:

As an "entry level" FMer, you might want to spend some time just playing around with the "If", "Case", and various "Left", "Middle", and "Right" functions, and the "If" and "Else If" script steps, so you get a feel for these. If you already have, ignore this message. I'm on FM 11 and can't open your file.

mistery · October 20, 2012

How do I get the very next line after a word that I know? In the file if the word "email" shows up it is always followed by a colon

Like this

EMAIL:

[email protected]

There is always a new paragraph and it is just the next line.

Does someone know how to do that? Thanks

Lee Smith · October 21, 2012

Having a prefix will help you for this one. However, your description does not match what is in the field.

EMAIL:

[email protected]

EMAIL: is really Email: (accuracy is a must when you are identifying key information like this).

If there isn't a return after the email, it will also break.

Try this calculation.


Let ( [
    text = Yellow pages text parsing example::yellowpage copy ;
    prefix = "Email:¶" ;
    suffix = ¶ ;
    start = Position ( text ; prefix ; 1 ; 1 ) + Length ( prefix ) ;
    end = Position ( text ; suffix ; start ; 1 )
] ;
    Middle ( text ; start ; end - start )
)

mistery · October 21, 2012

Thanks Lee

Somehow it didn't work. I have a url with the "Email:"

http://www.yellowbook.com/profile/american-arborist-corp_1635550376.html?classId=0

I don't know what's wrong?

Lee Smith · October 21, 2012

I tested on your 2nd file, has it changed?

mistery · October 21, 2012

Lee I tried your calc in the email script with record 4 and it didn't work. I am sending it

Yellow pages text parsing example 2.fmp12.zip

Lee Smith · October 21, 2012

That is because there isn't a paragraph return following it.

EMAIL: is really Email: (accuracy is a must when you are identifying key information like this).

If there isn't a return after the email, it will also break.
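
One way to stop a missing trailing return from breaking the calculation is to append a return inside the calculation itself, so the last line always has a terminator. The sketch below is a variation on the posted calculation, not the version in the attached file. The guard on labelPos and the variable names are additions for illustration.

```filemaker
Let ( [
    // Append ¶ so the final line is always followed by a return
    source = Yellow pages text parsing example::yellowpage copy & ¶ ;
    prefix = "Email:¶" ;
    labelPos = Position ( source ; prefix ; 1 ; 1 ) ;
    startPos = labelPos + Length ( prefix ) ;
    endPos = Position ( source ; ¶ ; startPos ; 1 )
] ;
    // Position returns 0 when the prefix is absent; return empty in that case
    If ( labelPos > 0 ; Middle ( source ; startPos ; endPos - startPos ) )
)
```

The extra return is harmless when the field already ends with one, because the search stops at the first return after the address.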

mistery · October 21, 2012

Lee

I just checked my file. There is a return after Email: in record 4. Just to be sure, I copied it and pasted it back with a return from my keyboard. When I copied it into Word, it showed a paragraph return. I took a screen capture to show you.

Lee Smith · October 21, 2012

I don't use Word. I use TextWrangler because of its tools. In TextWrangler, none of the records that have either an email or visit have a return after the address.

mistery · October 22, 2012

I don't have text wrangler but what should I do to make it work?

Lee Smith · October 22, 2012

Have you tried adding a return to see if that makes it work?

Lee Smith · October 22, 2012

mistery wrote: "I don't have text wrangler"

Go here and download it for free.

mistery · October 22, 2012

Ithaca, NY > Mason Supplies & Materials > Ithaca Stove Works

Ithaca Stove Works

414 N Meadow St Ste A

Ithaca, NY 14850-3247

Local: (607) 272-2650

0

Be the first to review

Visit:

www.ithacastoveworks.com

Email:

[email protected]

// That is the copied text in my record 4

This is the script I tried with the return:

Let ( [
    text = Yellow pages text parsing example::yellowpage copy ;
    prefix = "Email:¶" ;
    suffix = ¶ ;
    start = Position ( text ; prefix ; 1 ; 1 ) + Length ( prefix ) ;
    end = Position ( text ; suffix ; start ; 1 )
] ;
    Middle ( text ; start ; end - start )
)

But it returns nothing.

Lee Smith · October 22, 2012

Here is the file with two scripts, one for the email and the other for the URL.

Again, you must add a return to the end of them when one is missing.

Yellow pages text parsing example 2.fmp12.zip

mistery · October 22, 2012

Thank you Lee!

I think it works fine but I discovered the problem.

I tried using the scripts with examples copied from the internet as they were.

When I copied the addresses as they were they didn't have a return after the address ONLY in the case of Visit when followed by an email. Then the Visit naturally had a return after their web address because it was followed by "Email". But unless I add the return to the new examples the scripts don't work.

I changed the script names in the example I am sending you, from "set email" to "email" and from "set visit" to "website".

I put in notes to show what I am getting at in the notes field.

The problem is that natively there is no return after the "Visit" address unless it is followed by an email, and if there is both an email and a web address, there is no return after the email in the form that I can copy.

I'm not sure what to do about it.

yellow pages with and without.fmp12.zip
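
A possible way around the missing return, sketched here as an alternative to the scripts in the attached files, is to find which line the label is on and then ask GetValue for the following line. GetValue does not depend on a trailing return. The sketch assumes the label text ("Email:" or "Visit:") first appears at the start of its own line; the variable names are illustrative.

```filemaker
Let ( [
    source = Yellow pages text parsing example::yellowpage copy ;
    label = "Email:" ;                                  // use "Visit:" for the website
    labelPos = Position ( source ; label ; 1 ; 1 ) ;

    // Counting the values up to the label gives the label's line number
    labelLine = ValueCount ( Left ( source ; labelPos ) )
] ;
    // The address is the line right after the label; empty if no label found
    If ( labelPos > 0 ; Trim ( GetValue ( source ; labelLine + 1 ) ) )
)
```

Because the same calculation works whether or not the pasted text ends with a return, nothing needs to be added by hand after pasting.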

Lee Smith · October 24, 2012

Hi mistery,

My computer is at the repair shop (it crashed on Sunday), so I'm unable to look at your file. I'm hoping someone else will jump in and help in my absence.

Lee

mistery · October 24, 2012

Oh, I'm very sorry to hear that, Lee. I will wait for you. I hope it is not a big expense and that it gets back to normal. Good luck with it.

mistery · November 5, 2012

I would like to know more about getting the web viewer to parse the addresses if I can.
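
As a starting point for the Web Viewer route, here is a minimal sketch, assuming a Web Viewer object on the layout has been given the object name "YellowPagesViewer" (a hypothetical name, set in the Inspector). GetLayoutObjectAttribute with the "content" attribute returns the page source the viewer has loaded, and the same Position and Middle technique used earlier in the thread can then be applied to the HTML. The title tag is used only because most pages have one; the tags that surround the address on the actual listing pages would have to be found by inspecting the source.

```filemaker
Let ( [
    // Page source from the Web Viewer; may be empty until the page has finished loading
    html = GetLayoutObjectAttribute ( "YellowPagesViewer" ; "content" ) ;
    openTag = "<title>" ;
    closeTag = "</title>" ;
    openPos = Position ( html ; openTag ; 1 ; 1 ) ;
    startPos = openPos + Length ( openTag ) ;
    endPos = Position ( html ; closeTag ; startPos ; 1 )
] ;
    // Return the text between the tags, or empty when either tag is missing
    If ( openPos > 0 and endPos > 0 ;
        Trim ( Middle ( html ; startPos ; endPos - startPos ) )
    )
)
```

As noted earlier in the thread, this approach depends on the page layout: if the site changes its HTML, the tags chosen here would need to be updated.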
