recursive clean

rivet · February 28, 2005

I have a string of CSS and I would like to remove all characters in between, and including, the opening and closing tags '<code>' (code could be any length of characters).

I figured a recursive text substitution would do it. Any ideas?

-Queue- · February 28, 2005

How about

Set Field [yourfield; Replace( yourfield; Position( yourfield; "<code>"; 0; 1 ); Position( yourfield; "</code>"; 0; 1 ) + 7 - Position( yourfield; "<code>"; 0; 1 ); "" )]

or

Set Field [yourfield; Let( B = Position( yourfield; "<code>"; 0; 1 ); Replace( yourfield; B; Position( yourfield; "</code>"; 0; 1 ) + 7 - B; "" ) )]

rivet · February 28, 2005

What I would like is to have it all done within the calc. I know FMP7 can now do recursive calcs, but I have not been able to figure them out yet.

see attached

CleanTest.fp7.zip

comment · February 28, 2005

If I understand this correctly, you have a text with <some code> in it, and you would like to remove that as well as <another code>, and perhaps <yet another one> to get:

"If I understand this correctly, you have a text with in it, and you would like to remove that as well as, and perhaps to get:"

If so, yes - you need either a custom function or a looping script.
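To show what such a loop has to do, here is a minimal sketch in Python rather than FileMaker (so it can be run outside the database). The function name and parameters are made up for illustration; it just keeps cutting out the first tagged span until none is left.

```python
def strip_tagged_spans(text: str, open_tag: str = "<code>", close_tag: str = "</code>") -> str:
    """Remove every open_tag ... close_tag span, delimiters included."""
    while True:
        start = text.find(open_tag)
        if start == -1:
            return text  # nothing left to remove
        end = text.find(close_tag, start + len(open_tag))
        if end == -1:
            return text  # unmatched opening delimiter: leave the rest untouched
        # Cut out the span, then loop again on the shortened text.
        text = text[:start] + text[end + len(close_tag):]


print(strip_tagged_spans("keep <code>drop this</code>me"))
print(strip_tagged_spans("a text with <some code> in it", "<", ">"))
```

A recursive custom function would follow the same shape: find the first span, remove it, and call itself on the result until no opening tag is found.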

-Queue- · February 28, 2005

You could make yourfield an auto-enter calc with 'do not replace' deselected.

Let( B = Position( yourfield; "<code>"; 0; 1 ); Case( B; Replace( yourfield; B; Position( yourfield; "</code>"; 0; 1 ) + 7 - B; "" ); yourfield ) )

If you only want to remove the code tags, a simple Substitute( ) function would work also.

BobWeaver · March 1, 2005

I'm assuming Rivet wants to remove the code tags along with all the stuff between the opening and closing tags.

So, how 'bout this:

Evaluate ( "\"" & Substitute(YourField;["<code>";"\"&Left(\""];["</code>";"\";0)&\""])&"\"" )

This will remove multiple occurrences all at once. Example:

Input:

"Here is some sample text.<code>Oh oh, here is some nasty code that needs to be removed</code> Now what was I talking about? Oh yes, I was saying<code>Ha ha more evil code stuff</code> that this text is code-free."

Output:

"Here is some sample text. Now what was I talking about? Oh yes, I was saying that this text is code-free."

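For anyone wondering why this works: the substitution turns the text into a calculation in which each tagged span sits inside a Left( ...; 0 ) call, and that call returns an empty string when evaluated. The Python sketch below only builds that intermediate expression so you can look at it; it does not evaluate it, and the function name is hypothetical.

```python
def build_expression(text: str,
                     open_tag: str = "<code>",
                     close_tag: str = "</code>") -> str:
    """Build the expression string that Evaluate() would be handed."""
    # Each opening tag closes the running string literal and starts Left("...
    # Each closing tag finishes it as ...";0)& and reopens a string literal.
    body = text.replace(open_tag, '"&Left("').replace(close_tag, '";0)&"')
    # Wrap the whole thing in quotes so it begins and ends as a string literal.
    return '"' + body + '"'


print(build_expression("A<code>X</code>B"))
# prints "A"&Left("X";0)&"B"
```

Note that this naive version assumes the text itself contains no quotation marks, since those would end a string literal early.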
rivet · March 1, 2005

wow...that is it - thanks

(in the words of Morpheus 'You are the one')

comment · March 1, 2005

Nice one, Bob! You managed to guess what the question was AND came up with a neat trick - double score!

BobWeaver · March 1, 2005

Thanks, Comment. I suspect we will be seeing a lot of neat stuff based on the Evaluate() function.

BTW, it occurred to me that if the original text contains quotation marks, the formula will fail. So here is an updated version to correct the problem:

Evaluate ( """ & Substitute(TextField;["";""];[""";"""];["

comment · March 1, 2005

I was trying to make a more readable version of your formula, and while doing that I inadvertently solved the other problem as well:

Evaluate (
Substitute (
Quote ( TextField ) ;
[ "<code>" ; Quote ( "& Left (" ) ] ;
[ "</code>" ; Quote ( " ; 0 ) & " ) ]
)
)

-Queue- · March 1, 2005

******* spiffy! I'll definitely be stealing that one in the future.

rivet · March 9, 2005

Part II

My ultimate goal is to be able to determine which words/characters in a field of text have been styled with bold, and tag them with my own tags.

So I figured I could convert the text with the GetAsCSS function and then create a recursive custom calc that cleans out all tagging except the bold, which would be retagged with something else (i.e. '{b}' and '{/b}').

This solution (from previous post) will clean all the CSS tagging:

Evaluate (
Substitute (
Quote ( TextField ) ;
[ "<" ; Quote ( "& Left (" ) ] ;
[ ">" ; Quote ( " ; 0 ) & " ) ]
)
)

But before that I need the recursive scan for tag substitution of all lines with a SPAN Style of Bold in it.

EXAMPLE:

The quick brown fox jumped [color:"red"]over the lazy dog

GetAsCss Function:

The quick

brown

fox jumped

over

the lazy dog

Desired Result:

The quick {b}brown{/b} fox jumped {b}over{/b} the lazy dog.
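To pin down the transformation being asked for, here is a rough sketch in Python (not FileMaker). It assumes the bold runs arrive as SPAN tags whose attributes mention "bold" and that spans are not nested; the sample markup in the usage line is made up, since the exact GetAsCSS output is not shown above. The names retag_bold, SPAN and ANY_TAG are hypothetical.

```python
import re

# Assumption: a styled run looks like <SPAN ...>text</SPAN>, with no nesting.
SPAN = re.compile(r"<span\b([^>]*)>(.*?)</span>", re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]*>")


def retag_bold(css_text: str) -> str:
    """Wrap bold spans in {b}...{/b}, then strip every remaining tag."""
    def replace_span(match):
        attributes, inner = match.group(1), match.group(2)
        if "bold" in attributes.lower():
            return "{b}" + inner + "{/b}"
        return inner  # non-bold span: keep only its text

    without_spans = SPAN.sub(replace_span, css_text)
    return ANY_TAG.sub("", without_spans)


sample = ('The quick <SPAN STYLE= "font-weight: bold;" >brown</SPAN> fox jumped '
          '<SPAN STYLE= "font-weight: bold;color: #FF0000;" >over</SPAN> the lazy dog.')
print(retag_bold(sample))
```

The order matters: the bold spans have to be retagged before the general tag stripping, otherwise the information about which span was bold is already gone.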

BobWeaver · March 10, 2005

Hmm, gets more complicated all the time. Well, you can use this:

Evaluate(

Substitute(Quote(TextField);

["<SPAN";""&Let(;

["";""];Case(b;"{b}"&T&"{/b}";T))&""];

[">";"";"BOLD");T=""]

))

It works with the sample text you provided, but I'm not sure how this will interact with other tags that are embedded in the text.

BobWeaver · March 10, 2005

Actually, I do know how it will interact. It won't work unless you remove all non-SPAN tags first, then run it through the function I gave. So, your general search for "<" will have to be replaced with individual functions which look for and strip each different type of tag.

BobWeaver · March 10, 2005

...and then again...

I think this should work without interference with other tags:

Let([

a=Substitute(Quote(TextField);

["<";""&Let([T="<"];

[">";">";

S=PatternCount(T;"SPAN");

X=PatternCount(T;"</");

B=PatternCount(T;"BOLD")];

Case(X;T;S AND B;"<SPANBOLD>";S;"";T))&""]);

b=Evaluate(a);

c=Substitute(Quote(:;

["<SPANBOLD>";""&Let(;

["";""&Let(;

["";""];T&:&""])

];

Evaluate(c)

)

So, you should be able to run it through this first, to apply the bold tags, and then you can run the result through the other function to strip off all the rest of the tags.

And they say you can't do regex in Filemaker. Ha!

<<Edit note: Corrected the formula; I missed a slash & had a reference to a missing field. >>

BobWeaver · March 21, 2005

Just one last followup. I was playing with a variation of the last formula for use in a current project, and I noticed a couple of things you need to watch out for:

1. When using the GetAsCSS() function to generate the original tagged text, it will 'escape' certain reserved characters such as <, >, &, ", and all non-ASCII characters (codes >127). So, you will need to use Substitute to convert them back. Example:

Substitute(CSSText;

["&quot;";"\""];

["’";"
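As a point of comparison, here is what the same "convert them back" step looks like in Python, whose standard library already knows the named and numeric entities. This is only a sketch of the idea, not a FileMaker formula, and unescape_css_text is a made-up name. It assumes the escapes are ordinary HTML entities such as &lt;, &amp;, &quot; and numeric references.

```python
import html


def unescape_css_text(text: str) -> str:
    """Turn HTML entities (named or numeric) back into plain characters."""
    return html.unescape(text)


print(unescape_css_text("&lt;b&gt; &amp; &quot;x&quot; &#8217;"))
```

It is probably safest to do this after the tags have been processed, so that an escaped < from the original text is not mistaken for the start of a tag.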

rivet · March 21, 2005

Bob, this has been a great help; I am still wrapping my head around it. Thanks for the update.

comment · March 21, 2005

There are more problems like that. For example, if the text contains a CR, the CSS code will convert it to <BR>. Now this is considered to be code, and is therefore removed.

I believe the code indeed needs to be pre- and post-processed to catch these cases. BTW, your formula returns the text broken into separate lines, following the FMP convention of coding CSS into separate lines. So, not only are the original breaks removed, I am getting a lot of new ones. This can also be solved by removing the code's CR's in preprocessing.

There is another problem for which I don't yet see a solution:

Filemaker ends each codeline with a space and CR. The spaces are not inside < > brackets, so they are considered a part of the original text. Now, if the code is known to originate in Filemaker, it can be dealt with (but then, why would anyone bother). If the source of the code is unknown, it is unpredictable. Someone might write (some spaces here) .

BobWeaver · March 21, 2005

As for the line breaks, yes that's something else created by the GetAsCSS function that needs to be processed. How these things are handled will depend on the circumstances of the specific application.

In my own particular project, I was able to account for the space-CR at the end of the code line by including it in the search/replace. So, it effectively gets deleted. And I had already pre-processed the linebreaks in the original text, so any tags that occurred later were spurious and could be deleted.

Finally, I was basing my formula on the assumption that the tagged text was well formed. If it's not, then there's no way of fixing that. Any other text processor would be equally unhappy finding a closing tag before an opening tag. But, if the source text is generated by the GetAsCSS() function, that should never happen.
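The pre-processing described above could be sketched like this, in Python rather than FileMaker. It rests on two assumptions taken from this discussion: every generated code line ends with a space plus a line break, and line breaks from the original text show up as <BR> tags. The function name preprocess is hypothetical.

```python
import re


def preprocess(css_text: str) -> str:
    """Normalise generated line endings before any tag stripping."""
    # Assumption: each generated code line ends with a space and a line break,
    # so that pair is code layout and not part of the original text.
    text = re.sub(r" (\r\n|\r|\n)", "", css_text)
    # Assumption: the original line breaks were encoded as <BR> tags,
    # so turn them back into real line breaks.
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    return text


print(repr(preprocess("<SPAN>one</SPAN> \r<BR> \r<SPAN>two</SPAN> \r")))
```

This only holds when the tagged text is known to come from GetAsCSS; for markup of unknown origin the trailing space cannot be told apart from real text.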

comment · March 21, 2005

"Any other text processor would be equally unhappy finding a closing tag before an opening tag"

I am afraid that got lost in translation. I meant that even in well-formed marked-up text, you can have a closing tag followed directly by an opening tag. A browser is supposed to ignore any spaces or CR's in between. For example:

<h1> my heading </h1>

(any number of spaces/CR's here)

My real text...

or:

This is a sample which will fail

BobWeaver · March 22, 2005

Sorry, yes, I misunderstood what you were getting at.

I have been doing a bunch of things lately with RTF files, where the rules are a bit different, and was thinking about two different things at the same time.
