Jump to content

extracting pieces of data from a text string


sgoethner

This topic is 7559 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Hi All,

A weakness that I have is with text functions. My situation that I'm stuggling with is phone company data that needs to be put into separate fields. The data includes field labels such as @TN for telephone # but it's one long text string. The first problem is getting only the data to the right of the field label. The second problem is that the fields in the text string can vary. Not all fields are always included (otherwise I would do a search & replace in Word on the field labels and change them all to tabs).

Sample data:

@SYC X @ACT I @CS B @TN 9145263333 @NME ALLEN; CARPET @HSE 3085 @STN E MAIN @LOST MOHGN LK @FUL ALLEN CARPET @CDN WESTCHESTER @ZIP 10547 @YPHC C5895 @YPH CARPET & RUG DLRS.-NEW

So, I need to extract the phone #, name, etc from this text string. I've been playing with Middlewords, position, etc. to no avail. Some of the important fields that could be part of a record and are not shown in the sample data are @LEV 1, @DTX.

Any insights are greatly appreciated.

Thanks

Steve

Link to comment
Share on other sites

If you have a global text field, gtext and your data field is string, then you can use the following function to determine the information, based on whatever you enter into gtext.

Left(Middle(" " & string & " @", Position(" " & string & " @", " " & gtext & " ", 1, 1) + Length(gtext) + 2, Position(" " & string & " @", " @", 1, PatternCount(Left(" " & string & " @", Position(" " & string & " @", " " & gtext & " ", 1, 1)), " @") + 2) - Position(" " & string & " @", " " & gtext & " ", 1, 1) - Length(gtext) - 2), 64000 * (0 or PatternCount(" " & string & " @", " " & gtext & " ")))

This could be pared down if you first set your string field to " " & string & " @" and your gtext field to " " & gtext & " ", in which case

Left(Middle(string, Position(string, gtext, 1, 1) + Length(gtext), Position(string, " @", 1, PatternCount(Left(string , Position(string, gtext, 1, 1)), " @") + 2) - Position(string, gtext, 1, 1) - Length(gtext)), 64000 * (0 or PatternCount(string, gtext)))

will work.

So, for example, if gtext contains '@TN' (' @TN ' for the second calc), then your result will be 9145263333.

Update: modified to ensure no result if there is no match for gtext in string and to account for possible partial matches.

Link to comment
Share on other sites

Hi Queue,

Thanks for the response. I'm looking at it, trying to figure out why it will work. I do want to Understand how to make it work and not just repeat it. A quick question that jumps out at me is the use of a global field. I will have about 170,000 records to process every 6 to 8 weeks. I know I can loop a script to copy each record into the global field but what happens to the previous calc?

I am in awe that what you replied back to me so quickly with makes sense to you. I'm frustrated because of my lack of skill with the text functions and because I think there should be an easy way to use them to search for the @fields and/or use the @fields to parse the data.

Let me know if my concern about the global field messing with the other records is correct.

Thanks

Steve

Link to comment
Share on other sites

In that case, you can either make all your fields calculations of the data field you're importing or you can hardcode them into a script.

If you choose the former option, use the first calculation and substitute each field name for every occurrence of gtext. If you opt for the latter, create a script similar to:

Go to Record/Request/Page [First]

Loop

Set Field ["syc", "Left(Middle(" " & string & " @", Position(" " & string & " @", " @syc ", 1, 1) + Length(" @syc "), Position(" " & string & " @", " @", 1, PatternCount(Left(" " & string & " @", Position(" " & string & " @", " @syc ", 1, 1)), " @") + 2) - Position(" " & string & " @", " @syc ", 1, 1) - Length(" @syc ")), 64000 * (0 or PatternCount(" " & string & " @", " @syc ")))"]

Set Field ["act", "Left(Middle(" " & string & " @", Position(" " & string & " @", " @act ", 1, 1) + Length(" @act "), Position(" " & string & " @", " @", 1, PatternCount(Left(" " & string & " @", Position(" " & string & " @", " @act ", 1, 1)), " @") + 2) - Position(" " & string & " @", " @act ", 1, 1) - Length(" @act ")), 64000 * (0 or PatternCount(" " & string & " @", " @act ")))"]

....

Go to Record/Request/Page [Exit after last, Next]

End Loop

You can simplify this by hardcoding the Length( ) portions, if you desire, e.g.

Length(" @syc ") + 2 can be changed to 6.

To understand the calculation, it may help to create temporary calculation fields that are portions of the whole calc. So, create a field to hold Position(" " & string & " @", " @syc ", 1, 1), one to hold PatternCount(Left(" " & string & " @", Position(" " & string & " @", " @syc ", 1, 1)), " @"), etc. All these really give you is a group of numbers that determine where you should start and stop, based on the information in the data sample.

In plain English, the 'syc' calculation does the following:

Take the middle portion from the first occurence of " @syc " in " string @" plus 6. This determines where to start, six characters after the start of " @syc ", which is where the 'X' appears in your sample. Now find the number of times " @" appears up to the position we already determined, and add 2. This is where the next data string begins. We then subtract from this the position already determined, and then further subtract the length of the search string, so that part of the next data string isn't included in the result. Now we know where to start and how far out to grab the data. Finally, the Left( ), 64000 * (0 or PatternCount(" " & string & " @", " @syc ")) tells the calculation to pull up to 64,000 charactes from the result, unless the search string does not exist in string, in which case we don't want it to return anything. The 'or' forces PatternCount( ) to act as a boolean, no matter how many occurrences of the search string there may be. So, if the search string doesn't exist, PatternCount( ) returns zero. 0 or 0 is 0, multiplied by 64000 is still 0, and no result is returned. If the search string does exist, then PatternCount( ) returns the number of occurrences. Each number is tested for truth, and since zero will never be true (boolean-wise) and PatternCount( ) is greater than zero, and therefore TRUE, it returns 1 (TRUE), which is multiplied by 64000 to extract all possible characters of the result.

If the Left( ), 64000....were not put into the calculation, then if the first Position( ) function returned zero (search string nonexistent), it would add 6 (in the case of 'syc') and start grabbing data from the sixth position, which would obviously be invalid.

I hope this helps explain the calculation more for you.

Link to comment
Share on other sites

This is a beautiful script/calc. Essentially, you are able to isolate a text string between two character clusters. I have a similar situation I'm puzzling over. I have a text field that holds multiple paragraphs. This field was designed for display on the web, so long ago we began carefully entering HTML italic tags where appropriate, as:

"Every student should have read <I>Moby Dick<I> by graduation."

Now I have a need to print some of this locally. So I need a calc field that will substitute out of existence the tag characters, and (using a version of your calc) isolate the text that needs to be italicized for printing.

Is there a calc function that will add italicization to the string?

Steve Brown

Link to comment
Share on other sites

God, I swear I simply cut and paste the adress with the Url thing grin.gif

It's not that known tip that can be found with Netscape wink.gif

Let me get you there with a few steps as I noticed a bug with cut/paste on Netscape.

So search for "Bob" within the elpased week, then find that answer where Vaughan said . "Very elegant Bob".

Some kind of puzzle you can rebuild for sure...If you find it, post back the url as I'm sure it might help others... wink.gif

Link to comment
Share on other sites

Not to mix up responses from two different Steves, on two different topics, in the same thread *too* badly, I still can't find the old thread where Bob Weaver came up with a geat solution, as per Ugo's attempts.

Bob Kundinger: I got your example, and it looks useful for several things. Unfortunately, embedding merge fields won't work for me, exactly. Thanks.

Steve Brown

Link to comment
Share on other sites

Got it, Ugo. Thanks. I saved Ray's demos & Bob's suggestion, along with the giant calculation from Queue in this thread. It looks like I'll have to spend an afternoon puzzling all this out & making the various bits work together.

Thanks all,

Steve Brown

Link to comment
Share on other sites

This topic is 7559 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.