extracting pieces of data from a text string

sgoethner · November 5, 2003

Hi All,

A weakness that I have is with text functions. My situation that I'm stuggling with is phone company data that needs to be put into separate fields. The data includes field labels such as @TN for telephone # but it's one long text string. The first problem is getting only the data to the right of the field label. The second problem is that the fields in the text string can vary. Not all fields are always included (otherwise I would do a search & replace in Word on the field labels and change them all to tabs).

Sample data:

@SYC X @ACT I @CS B @TN 9145263333 @NME ALLEN; CARPET @HSE 3085 @STN E MAIN @LOST MOHGN LK @FUL ALLEN CARPET @CDN WESTCHESTER @ZIP 10547 @YPHC C5895 @YPH CARPET & RUG DLRS.-NEW

So, I need to extract the phone #, name, etc from this text string. I've been playing with Middlewords, position, etc. to no avail. Some of the important fields that could be part of a record and are not shown in the sample data are @LEV 1, @DTX.

Any insights are greatly appreciated.

Thanks

Steve

-Queue- · November 5, 2003

If you have a global text field, gtext and your data field is string, then you can use the following function to determine the information, based on whatever you enter into gtext.

Left(Middle(" " & string & " @", Position(" " & string & " @", " " & gtext & " ", 1, 1) + Length(gtext) + 2, Position(" " & string & " @", " @", 1, PatternCount(Left(" " & string & " @", Position(" " & string & " @", " " & gtext & " ", 1, 1)), " @") + 2) - Position(" " & string & " @", " " & gtext & " ", 1, 1) - Length(gtext) - 2), 64000 * (0 or PatternCount(" " & string & " @", " " & gtext & " ")))

This could be pared down if you first set your string field to " " & string & " @" and your gtext field to " " & gtext & " ", in which case

Left(Middle(string, Position(string, gtext, 1, 1) + Length(gtext), Position(string, " @", 1, PatternCount(Left(string , Position(string, gtext, 1, 1)), " @") + 2) - Position(string, gtext, 1, 1) - Length(gtext)), 64000 * (0 or PatternCount(string, gtext)))

will work.

So, for example, if gtext contains '@TN' (' @TN ' for the second calc), then your result will be 9145263333.

Update: modified to ensure no result if there is no match for gtext in string and to account for possible partial matches.

sgoethner · November 6, 2003

Hi Queue,

Thanks for the response. I'm looking at it, trying to figure out why it will work. I do want to Understand how to make it work and not just repeat it. A quick question that jumps out at me is the use of a global field. I will have about 170,000 records to process every 6 to 8 weeks. I know I can loop a script to copy each record into the global field but what happens to the previous calc?

I am in awe that what you replied back to me so quickly with makes sense to you. I'm frustrated because of my lack of skill with the text functions and because I think there should be an easy way to use them to search for the @fields and/or use the @fields to parse the data.

Let me know if my concern about the global field messing with the other records is correct.

Thanks

Steve

-Queue- · November 7, 2003

In that case, you can either make all your fields calculations of the data field you're importing or you can hardcode them into a script.

If you choose the former option, use the first calculation and substitute each field name for every occurrence of gtext. If you opt for the latter, create a script similar to:

Go to Record/Request/Page [First]

Loop

Set Field ["syc", "Left(Middle(" " & string & " @", Position(" " & string & " @", " @syc ", 1, 1) + Length(" @syc "), Position(" " & string & " @", " @", 1, PatternCount(Left(" " & string & " @", Position(" " & string & " @", " @syc ", 1, 1)), " @") + 2) - Position(" " & string & " @", " @syc ", 1, 1) - Length(" @syc ")), 64000 * (0 or PatternCount(" " & string & " @", " @syc ")))"]

Set Field ["act", "Left(Middle(" " & string & " @", Position(" " & string & " @", " @act ", 1, 1) + Length(" @act "), Position(" " & string & " @", " @", 1, PatternCount(Left(" " & string & " @", Position(" " & string & " @", " @act ", 1, 1)), " @") + 2) - Position(" " & string & " @", " @act ", 1, 1) - Length(" @act ")), 64000 * (0 or PatternCount(" " & string & " @", " @act ")))"]

....

Go to Record/Request/Page [Exit after last, Next]

End Loop

You can simplify this by hardcoding the Length( ) portions, if you desire, e.g.

Length(" @syc ") + 2 can be changed to 6.

To understand the calculation, it may help to create temporary calculation fields that are portions of the whole calc. So, create a field to hold Position(" " & string & " @", " @syc ", 1, 1), one to hold PatternCount(Left(" " & string & " @", Position(" " & string & " @", " @syc ", 1, 1)), " @"), etc. All these really give you is a group of numbers that determine where you should start and stop, based on the information in the data sample.

In plain English, the 'syc' calculation does the following:

Take the middle portion from the first occurence of " @syc " in " string @" plus 6. This determines where to start, six characters after the start of " @syc ", which is where the 'X' appears in your sample. Now find the number of times " @" appears up to the position we already determined, and add 2. This is where the next data string begins. We then subtract from this the position already determined, and then further subtract the length of the search string, so that part of the next data string isn't included in the result. Now we know where to start and how far out to grab the data. Finally, the Left( ), 64000 * (0 or PatternCount(" " & string & " @", " @syc ")) tells the calculation to pull up to 64,000 charactes from the result, unless the search string does not exist in string, in which case we don't want it to return anything. The 'or' forces PatternCount( ) to act as a boolean, no matter how many occurrences of the search string there may be. So, if the search string doesn't exist, PatternCount( ) returns zero. 0 or 0 is 0, multiplied by 64000 is still 0, and no result is returned. If the search string does exist, then PatternCount( ) returns the number of occurrences. Each number is tested for truth, and since zero will never be true (boolean-wise) and PatternCount( ) is greater than zero, and therefore TRUE, it returns 1 (TRUE), which is multiplied by 64000 to extract all possible characters of the result.

If the Left( ), 64000....were not put into the calculation, then if the first Position( ) function returned zero (search string nonexistent), it would add 6 (in the case of 'syc') and start grabbing data from the sixth position, which would obviously be invalid.

I hope this helps explain the calculation more for you.

spb · November 9, 2003

This is a beautiful script/calc. Essentially, you are able to isolate a text string between two character clusters. I have a similar situation I'm puzzling over. I have a text field that holds multiple paragraphs. This field was designed for display on the web, so long ago we began carefully entering HTML italic tags where appropriate, as:

"Every student should have read <I>Moby Dick<I> by graduation."

Now I have a need to print some of this locally. So I need a calc field that will substitute out of existence the tag characters, and (using a version of your calc) isolate the text that needs to be italicized for printing.

Is there a calc function that will add italicization to the string?

Steve Brown

Ugo DI LUCA · November 9, 2003

Ciao Steve,

You can italize or whatever by a combination of calculations and merge fields.

This was lately described some days ago here.

a very elegant by Bob Weiver

spb · November 10, 2003

Could you give me a key phrase in the thread's subject header, or Bob Weiver's login name to search for? When I click on the link you provided, all I get is a popup ad for a paid survey site.

thanks,

Steve Brown

Lee Smith · November 10, 2003

Me too.

I think Ugo is getting paid for hits on this site.

http://www.paysforsurveys.com/

Kundinger · November 10, 2003

Hi Steve,

Here is a link to a post I did a year ago about styled text embedded within text...

http://www.fmforums.com/threads/showflat.php?Cat=&Board=files&Number=46863

There is a simple sample file attached for you to play with.

I hope this helps... Good Luck!

Bob Kundinger

[email protected]

Ugo DI LUCA · November 10, 2003

God, I swear I simply cut and paste the adress with the Url thing

It's not that known tip that can be found with Netscape

Let me get you there with a few steps as I noticed a bug with cut/paste on Netscape.

So search for "Bob" within the elpased week, then find that answer where Vaughan said . "Very elegant Bob".

Some kind of puzzle you can rebuild for sure...If you find it, post back the url as I'm sure it might help others...

sgoethner · November 11, 2003

Thanks Queue!

I see the light at the end of the tunnel and it's not a train.

I'm beginning to understand how to use these functions to generate a position # and then use that to find what I need. I really appreciate your help. Thanks

Steve

spb · November 11, 2003

Not to mix up responses from two different Steves, on two different topics, in the same thread *too* badly, I still can't find the old thread where Bob Weaver came up with a geat solution, as per Ugo's attempts.

Bob Kundinger: I got your example, and it looks useful for several things. Unfortunately, embedding merge fields won't work for me, exactly. Thanks.

Steve Brown

Ugo DI LUCA · November 11, 2003

HERE

Sorry Steve.

You should get either a link to one of Ray's demo or Bob's suggestion.

spb · November 12, 2003

Got it, Ugo. Thanks. I saved Ray's demos & Bob's suggestion, along with the giant calculation from Queue in this thread. It looks like I'll have to spend an afternoon puzzling all this out & making the various bits work together.

Thanks all,

Steve Brown

Sign In

extracting pieces of data from a text string

Recommended Posts

sgoethner

-Queue-

sgoethner

-Queue-

spb

Ugo DI LUCA

spb

Lee Smith

Kundinger

Ugo DI LUCA

sgoethner

spb

Ugo DI LUCA

spb

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information