Extract text to records

vierdewereld · September 19, 2003

Perhaps it's easy, but I'm not able to solve this one.

A chunk of text is in the global field "G_Source".

The global field "G_split" defines the start/end position of a new record. (a word within G_Source that is repeated)

"T_text" for placing the extracted text on a record.

Loop

1) Create a record and fill T_text with the text from the start of G_Source to the first G_split.

2) Create records for every time G_split occurs in G_Source. Fill T_text on each record with the text between the G_Split, and the next...

3) Create a record and fill T_text with the text from the last G_split to the end of G_Source.

End loop

Please help me to set this loop in action.

Fenton · September 19, 2003

There are two ways to do this. One is to use a global counter, to increment the "occurrence" of the separator, in a Position function. Works fine, once you get it going. But there's an easier way (for the rest of us :-)

I set the original into another global, which can then be parsed w/out losing the original (not that I ever make mistakes).

Then, after pulling out the 1st chunk, using Left, you can remove that chunk, using Right. Then repeat.
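For readers following along outside FileMaker, the chunk-removal idea can be sketched in Python. This is a minimal illustration and not the script in the attached file; the function name `split_by_consuming` and the sample text are made up for the example.

```python
def split_by_consuming(source: str, separator: str) -> list[str]:
    """Peel chunks off the front of a working copy until no separator is left."""
    if not separator:
        return [source]
    working = source  # working copy; `source` itself is left untouched
    chunks = []
    while True:
        pos = working.find(separator)
        if pos == -1:
            chunks.append(working)  # text after the last separator
            break
        chunks.append(working[:pos])               # the "Left" step: take the first chunk
        working = working[pos + len(separator):]   # the "Right" step: drop what was taken
    return chunks


records = split_by_consuming("alpha SPLIT beta SPLIT gamma", "SPLIT")
print(records)
```

Each list item corresponds to one new record's T_text. Here the separator is dropped from the chunks; keeping it would be a small change to the slicing.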

TextExtractLoop.zip

Ugo DI LUCA · September 19, 2003

Fenton,

I set the original into another global, which can then be parsed w/out losing the original.

Nice one... but why would you lose the original?

Lee Smith · September 19, 2003

Murphy's Law I believe.

Ugo DI LUCA · September 19, 2003

Lee

It hasn't come out in France yet....

Was Fenton having an unlucky day, or is he the lovely magician?

Lee Smith · September 19, 2003

Hi Ugo,

Go here for a little bit of enlightenment on Murphy's Law

http://www.murphys-laws.com/

Ugo DI LUCA · September 20, 2003

Thanks Lee

I was focused on the movie.

I tested the counter loop and it didn't fail.

Re-reading Fenton's post, I now understand better. "If anything can go wrong, it will", so it's better to guard against it with an additional global.

Jim McKee · September 20, 2003

Hi vierdewereld ...

Here's another way of doing it that can also handle situations where the separator word might also reside within a word in the source field. For example: "farm" is the separator word, and "farmboy" is a word in the source field. The script will ignore any instances of the characters comprising the separator word if they're not a whole word.
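The whole-word idea can be illustrated in Python. This is a minimal sketch assuming the separator is always surrounded by spaces (or sits at the very start or end of the text); it is not the script in the attached file, and `split_on_spaced_word` is a made-up name.

```python
def split_on_spaced_word(source: str, separator: str) -> list[str]:
    """Split only where the separator stands alone between spaces."""
    padded = " " + source + " "      # so a separator at either end still has spaces around it
    needle = " " + separator + " "   # "farm" becomes " farm ", which "farmboy" cannot match
    return [chunk.strip() for chunk in padded.split(needle)]


print(split_on_spaced_word("we sold the farm to a farmboy farm hand", "farm"))
```

Punctuation next to the separator (for example "farm,") is not handled by this space-only test.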

Fenton: I really liked your method of successively removing each text chunk from the global. A very kewl thinking-outside-the-box kind of approach.

Lee: thanks for the Murphy's Law pointer ... very funny

Extract Text to Records_3.zip

BobWeaver · September 20, 2003

He puts the text into another global because as he parses the text out of it, the leftmost chunk gets deleted. If you don't need the original text after you're done, then it doesn't matter. But, you might want to keep a copy of the original text for future reference.

Jim, you beat me to it.

Ugo DI LUCA · September 20, 2003

Hi Bob,

OK, he needs it because of the method he implemented...

Here's another one, just for fun. I haven't downloaded Jim's sample, so I hope it won't be redundant. But I'm sure you'll appreciate this one, if my memory doesn't fail me.

As vierdewereld said, "Fill T_text on each record with the text between the G_Split" and "...fill T_text with the text from the last G_split to the end of G_Source", so I assumed the separator wouldn't be included in T_text.

It doesn't make much difference; the calc can be adapted for this too.

ExtractEmo_s.fp5.zip

Jim McKee · September 21, 2003

Hi Ugo ...

I liked your solution

Fenton's demo gave the option to include or not include the separator word in the text field of the created records.

Among other changes, I tweaked my demo to do the same thing, via a checkbox, and re-uploaded it to my original message.
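The include-or-exclude option can be sketched in Python as a simple flag. This is only an illustration of the idea, not a copy of either demo; `split_chunks` is a hypothetical name, and putting the separator at the start of the chunk that follows it is an assumption.

```python
def split_chunks(source: str, separator: str, include_separator: bool = False) -> list[str]:
    """Split on the separator, optionally keeping it at the start of each later chunk."""
    parts = source.split(separator)
    if not include_separator:
        return parts
    # Assumption: the separator belongs to the chunk that follows it.
    return parts[:1] + [separator + part for part in parts[1:]]


print(split_chunks("intro|first|second", "|"))
print(split_chunks("intro|first|second", "|", include_separator=True))
```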

Cheers!

Ugo DI LUCA · September 21, 2003

Jim,

Glad you found it "pleasant".

I finally downloaded yours. What you tried to accomplish is hard, as you are running into FM's limits in how it interprets words.

In my first attempt to answer this post, I implemented Ray's tip for extracting a value from a value list, rather than using the 'Position' functions. By adding a "¶" to G_Source after every occurrence of G_Split, and substituting any existing "¶" beforehand, it should have been easy to use the counter to extract just the string needed (with or without G_Split in the result), using a nested substitution calc along the lines of:

Substitute(Substitute(MiddleWords(Substitute(Substitute(Substitute(G_Source, "¶", ".Par."), G_Split, G_Split & "¶"), " ", ".spc."), g_counter, 1), ".spc.", " "), ".Par.", "¶")

I was about to post it when I realized that this would turn out to be incorrect if there were non-alphanumeric characters in G_Source.

At least 30 nested substitution functions would have been needed to parse any non-alphanumeric characters....
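The placeholder trick itself can be tried outside FileMaker. The sketch below is a loose Python analogue, assuming the source contains only spaces and line breaks as whitespace; `nth_chunk` is a made-up name and the `.spc.` / `.Par.` tokens are used here purely for illustration. Python's `str.split()` breaks only on whitespace, so it does not reproduce FileMaker's word rules, which is where the calc approach runs into trouble with punctuation.

```python
def nth_chunk(source: str, separator: str, n: int) -> str:
    """Return the n-th chunk (1-based), with the separator kept at its end."""
    protected = source.replace("\n", ".Par.")                    # hide existing line breaks
    protected = protected.replace(separator, separator + "\n")   # mark the end of every chunk
    protected = protected.replace(" ", ".spc.")                  # hide spaces so chunks stay whole
    pieces = protected.split()   # the inserted line breaks are now the only whitespace
    piece = pieces[n - 1]
    return piece.replace(".spc.", " ").replace(".Par.", "\n")    # restore the hidden characters


print(repr(nth_chunk("a b SPLIT c d SPLIT e", "SPLIT", 2)))
```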

That's why I implemented the "classic" Middle/Position function here.
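For comparison, here is a rough Python sketch of the counter idea: find the n-th occurrence of the separator, then cut out the text between two neighbouring occurrences. It is not a reconstruction of the attached file; `position` only loosely mirrors a 1-based "find the n-th occurrence" lookup, and both function names are hypothetical.

```python
from typing import Optional


def position(text: str, search: str, occurrence: int) -> int:
    """1-based position of the n-th occurrence of `search`, or 0 if there is none."""
    index = -1
    for _ in range(occurrence):
        index = text.find(search, index + 1)
        if index == -1:
            return 0
    return index + 1


def chunk_by_counter(source: str, separator: str, counter: int) -> Optional[str]:
    """Return the counter-th chunk (1-based, separator excluded), or None when past the end."""
    if counter == 1:
        start = 1
    else:
        previous = position(source, separator, counter - 1)
        if previous == 0:
            return None                       # fewer separators than the counter asks for
        start = previous + len(separator)     # first character after the previous separator
    end = position(source, separator, counter)
    if end == 0:
        end = len(source) + 1                 # last chunk runs to the end of the text
    return source[start - 1:end - 1]


counter = 1
while (chunk := chunk_by_counter("alpha SPLIT beta SPLIT gamma", "SPLIT", counter)) is not None:
    print(counter, repr(chunk))   # one new record per pass
    counter += 1
```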

But, I'm afraid you'd need to add some more parameters too, to make it work effectively.

Checking only for a space before or after G_Split isn't sufficient to determine that it is a whole word, as any "," or "." or ";" sitting next to "farm" would change the result of your list.

This could become even more problematic if the string you were looking for was "Week-End", as this string isn't treated as a single word.

The way FM interprets non-alphanumeric characters (see the FMI TechInfo article) makes this really difficult, and you'd need to add some Substitute() functions to make it work correctly.

The main difficulty here is to determine which non-alphanumeric characters are actually used, and how many.
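To make the punctuation problem concrete, here is a small Python sketch that treats the separator as a whole token whenever it is not glued to a letter, digit, or underscore on either side. It uses Python's `re` module and its own notion of a "word character", which is not the same as FileMaker's word rules; `split_on_token` is a made-up name.

```python
import re


def split_on_token(source: str, separator: str) -> list[str]:
    """Split where the separator is not directly attached to another word character."""
    # (?<!\w) and (?!\w) reject matches that have a word character immediately before or after.
    pattern = r"(?<!\w)" + re.escape(separator) + r"(?!\w)"
    return re.split(pattern, source)


print(split_on_token("Back at the farm, the farmboy left the farm.", "farm"))
print(split_on_token("See you Week-End, not Week-Ends", "Week-End"))
```

With this test, "farm," and "farm." count as the separator while "farmboy" does not, and a hyphenated separator such as "Week-End" is matched as one unit.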

Ciao.

Jim McKee · September 21, 2003

Ugo ...

Thanks for your really helpful comments, and for your reference to FMI's Knowledgebase article on the WordCount function.

It can be difficult to answer posters' questions about how to accomplish something when we don't have all of the information. In this particular case, none of us knows the exact format of the data that vierdewereld needs to parse. So, we make assumptions, and we present "what if" solutions that the poster can adapt to his/her specific requirements.

For example, I assumed that the separator is a whole word with leading/trailing spaces, and that characters comprising the separator word might be contained within other words in g_source. Of course, vierdewereld's set-up may be completely different than this. It's impossible to write a solution that covers every contingency unless we know more about the parameters of the poster's situation.

One thing I am certain about: when I joined the Caf

Ugo DI LUCA · September 21, 2003

Yeah!

Agreed about the assumptions

I also made a lot of assumptions about his needs

Now I assume vierdewereld posted this before a long weekend, and will come back later and tell us some more hidden secrets.

As these Forums usually provide quick answers, it would really help if posters followed the threads they start.

Jim McKee · September 22, 2003

Ugo DI LUCA said:
Now I assume vierdewereld posted this before a long weekend, and will come back later and tell us some more hidden secrets.

As these Forums usually provide quick answers, it would really help if posters followed the threads they start.

Ugo ...

You are so right.

This is not a complaint, but a request:

I know it is difficult sometimes, especially for newcomers to FileMaker, to use the terminology of the development environment when formulating their questions. But, I wish posters would spend a bit more time communicating the details of their setup, and give fuller descriptions of the results they want to achieve. Frequently it's more difficult for me to understand the questions than it is to generate solutions.

From now on, I think I'm gonna lay back a bit and wait until I know more about what the user is trying to accomplish before I begin grinding away at solutions. Sometimes my assumptions about what they are dealing with and what they need are way off the mark.

vierdewereld · September 23, 2003

Whooohaa... fantastic reactions and solutions.

Many thanks all!!

I will give more detail about the environment in which this solution will be implemented.

But I think it is only good that many different options get discussed here so that not only my scenario gets explained. That's why I keep my questions as short as possible.

The solution will be used to extract email text. So far I have developed a method for extracting text from an email: given a starting sentence and an end sentence, whatever is in between is set into the selected field. I have this option for about 20 fields, and if the end-word field is empty it automatically uses the end of that line. A text calculation field immediately shows the first occurrence of what will be sent to a field once the start field is filled.
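As an illustration of that start/end extraction, a minimal Python sketch might look like the following. The function name `extract_between` and the sample email are invented for the example, and falling back to the end of the text when the end marker is missing is an assumption, not something stated above.

```python
def extract_between(text: str, start_marker: str, end_marker: str = "") -> str:
    """Return the text after the first start_marker, up to end_marker or the end of that line."""
    if not start_marker:
        return ""
    start = text.find(start_marker)
    if start == -1:
        return ""                            # start marker not present
    begin = start + len(start_marker)
    if end_marker:
        end = text.find(end_marker, begin)
    else:
        end = text.find("\n", begin)         # empty end marker: stop at the end of the line
    if end == -1:
        end = len(text)                      # assumption: otherwise run to the end of the text
    return text[begin:end]


email = "Name: Ann Lee\nOrder: 42 units. Thanks\n"
print(extract_between(email, "Name: "))
print(extract_between(email, "Order: ", " units"))
```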

As soon as it was finished I noticed it also needed to be able to create separate records for parts of the text. I also created a general start and end to select a portion of G_Source, leaving the G_split part to you guys. Thanks again for helping me complete this puzzle.

In my case it doesn't really matter if G_split is included, so I actually didn't even think about that possibility, good one tho.

I prefer being able to put a whole sentence in G_split rather than just a word or a series of letters. As soon as you type in this sentence, a calculation (next to the G_split field) shows how many records will be created, and thus whether it will work.

Hope this explains my situation better. I will have a good look at the attachments now, oh boy I'm thrilled...
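The record count described here follows from simple counting: every occurrence of the separator adds one more chunk. A tiny Python sketch, with the hypothetical name `records_to_create` and the assumption that an empty or absent separator yields a single record holding the whole text:

```python
def records_to_create(source: str, separator: str) -> int:
    """Number of chunks produced by splitting source on separator."""
    if not separator:
        return 1                           # assumption: nothing to split on, one record
    return source.count(separator) + 1     # non-overlapping occurrences, plus the first chunk


print(records_to_create("intro. Next item: a. Next item: b.", "Next item:"))
```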

vierdewereld · September 23, 2003

I really liked your solution Fenton because now I have just the part that was used to extract the text from in my Result layout. Simple and effective.. Thanks!

Jim, your solution is too complex for my needs, but definitely handy for the future and for others to see. Great help!

Ugo, once again, thanks.
