recursive clean


rivet

This topic is 6973 days old. Please don't post here. Open a new topic instead.

I have a string of CSS and I would like to remove all characters between, and including, the opening and closing tags '<code>' and '</code>' (the code in between could be any length).

I figured a recursive text substitution would do it. Any ideas?

How about

Set Field [yourfield; Replace( yourfield; Position( yourfield; "<code>"; 0; 1 ); Position( yourfield; "</code>"; 0; 1 ) + 7 - Position( yourfield; "<code>"; 0; 1 ); "" )]

or

Set Field [yourfield; Let( B = Position( yourfield; "<code>"; 0; 1 ); Replace( yourfield; B; Position( yourfield; "</code>"; 0; 1 ) + 7 - B; "" ) )]
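For anyone who wants to check the position arithmetic outside FileMaker, here is a minimal Python sketch of the same idea: find the opening tag, find the closing tag, and cut everything from the start of one through the end of the other. The function name is made up for illustration, and Python's find is 0-based where Position is 1-based. Unlike the formulas above, this sketch looks for the closing tag only after the opening tag and returns the text unchanged when either tag is missing.

```python
OPEN_TAG = "<code>"
CLOSE_TAG = "</code>"


def remove_first_code_block(text: str) -> str:
    """Remove the first <code>...</code> span, tags included (hypothetical helper)."""
    start = text.find(OPEN_TAG)
    if start == -1:
        return text  # no opening tag: nothing to remove
    end = text.find(CLOSE_TAG, start + len(OPEN_TAG))
    if end == -1:
        return text  # opening tag without a closing tag: leave the text alone
    # Keep everything before the opening tag and everything after the closing tag.
    return text[:start] + text[end + len(CLOSE_TAG):]
```

Only the first occurrence is removed, which matches the single Replace call in the formulas.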

If I understand this correctly, you have a text with <some code> in it, and you would like to remove that as well as <another code>, and perhaps <yet another one> to get:

"If I understand this correctly, you have a text with in it, and you would like to remove that as well as, and perhaps to get:"

If so, yes - you need either a custom function or a looping script.
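The looping idea can be sketched in Python as follows. This is an illustration only, not FileMaker code, and strip_delimited and its parameters are hypothetical names. It keeps cutting out the first delimited span until none is left, which is roughly what a recursive custom function or a looping script would do.

```python
def strip_delimited(text: str, open_tag: str, close_tag: str) -> str:
    """Remove every span from open_tag through close_tag, delimiters included."""
    while True:
        start = text.find(open_tag)
        if start == -1:
            return text  # no more opening delimiters
        end = text.find(close_tag, start + len(open_tag))
        if end == -1:
            return text  # unmatched opening delimiter: stop instead of guessing
        text = text[:start] + text[end + len(close_tag):]
```

Called with "<" and ">" it strips any angle-bracket tag; called with "<code>" and "</code>" it strips whole code blocks. Note that the spaces on either side of a removed tag are kept, so doubled spaces can remain in the result.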

You could make yourfield an auto-enter calc with 'do not replace' deselected.

Let( B = Position( yourfield; "<code>"; 0; 1 ); Case( B; Replace( yourfield; B; Position( yourfield; "</code>"; 0; 1 ) + 7 - B; "" ); yourfield ) )

If you only want to remove the code tags, a simple Substitute( ) function would work also.

I'm assuming Rivet wants to remove the code tags along with all the stuff between the opening and closing tags.

So, how 'bout this:

Evaluate ( "\"" & Substitute(YourField;["<code>";"\"&Left(\""];["</code>";"\";0)&\""])&"\"" )

This will remove multiple occurrences all at once. Example:

Input:

"Here is some sample text.<code>Oh oh, here is some nasty code that needs to be removed</code> Now what was I talking about? Oh yes, I was saying<code>Ha ha more evil code stuff</code> that this text is code-free."

Output:

"Here is some sample text. Now what was I talking about? Oh yes, I was saying that this text is code-free."

Thanks, Comment. I suspect we will be seeing a lot of neat stuff based on the Evaluate() function.

BTW, it occurred to me that if the original text contains quotation marks, the formula will fail. So here is an updated version to correct the problem:

Evaluate ( """ & Substitute(TextField;["";""];[""";"""];["

I was trying to make a more readable version of your formula, and while doing that I inadvertently solved the other problem as well:

Evaluate (
    Substitute (
        Quote ( TextField ) ;
        [ "<code>" ; Quote ( "& Left (" ) ] ;
        [ "</code>" ; Quote ( " ; 0 ) & " ) ]
    )
)
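To see why this works, here is a rough Python imitation of the trick (a sketch, not FileMaker code). The quoted text is turned into an expression in which every code block becomes a call that returns zero characters, and the expression is then evaluated. In this sketch json.dumps stands in for Quote, eval stands in for Evaluate, and left is a stand-in for FileMaker's Left; all three pairings are my assumptions for illustration.

```python
import json


def left(text: str, count: int) -> str:
    """Stand-in for FileMaker's Left(): the first `count` characters."""
    return text[:count]


def strip_code_via_eval(text: str) -> str:
    # Quote the whole text as one double-quoted string literal,
    # escaping any quotation marks it already contains.
    quoted = json.dumps(text, ensure_ascii=False)
    # Close the literal at each opening tag and reopen it after each closing
    # tag, so the code in between becomes the argument of left(..., 0).
    expression = quoted.replace("<code>", '" + left("').replace("</code>", '", 0) + "')
    # Evaluate with nothing available except left().
    return eval(expression, {"__builtins__": {}}, {"left": left})
```

For the sample text from earlier in the thread, the expression becomes three string literals joined around two left(..., 0) calls, each of which contributes an empty string. Unbalanced tags produce an invalid expression and raise an error rather than returning a result.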

Part II

My ultimate goal is to determine which words/characters in a field of text have been styled bold, and to tag them with my own tags.

So I figured I could convert the text with the GetAsCSS function and then create a recursive custom calc that cleans out all tagging except the bold, which would be retagged with something else (i.e. '{b}' and '{/b}').

This solution (from previous post) will clean all the CSS tagging:

Evaluate (
    Substitute (
        Quote ( TextField ) ;
        [ "<" ; Quote ( "& Left (" ) ] ;
        [ ">" ; Quote ( " ; 0 ) & " ) ]
    )
)

But before that, I need a recursive scan that substitutes my own tags on every line whose SPAN style includes bold.

EXAMPLE:

The quick brown fox jumped over the lazy dog ("brown" is bold; "over" is bold, red, and underlined)

GetAsCSS function:

<SPAN STYLE= "" >The quick </SPAN>
<SPAN STYLE= "font-weight: bold;" >brown</SPAN>
<SPAN STYLE= "" > fox jumped </SPAN>
<SPAN STYLE= "color: #AA0000;font-weight: bold;text-decoration:underline;" >over</SPAN>
<SPAN STYLE= "" > the lazy dog</SPAN>

Desired Result:

The quick {b}brown{/b} fox jumped {b}over{/b} the lazy dog.
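To make the target behaviour concrete, here is a small Python sketch (not a FileMaker solution) that turns the sample GetAsCSS output into the desired text. It assumes every piece of text sits inside a SPAN with a STYLE attribute, as in the sample, and it ignores anything between the spans. The names tag_bold, SPAN_PATTERN and BOLD_PATTERN are made up for this illustration.

```python
import re

# One SPAN with its STYLE attribute and its text content.
SPAN_PATTERN = re.compile(
    r'<SPAN\s+STYLE=\s*"([^"]*)"\s*>(.*?)</SPAN>',
    re.DOTALL | re.IGNORECASE,
)
# A style counts as bold when it contains "font-weight: bold".
BOLD_PATTERN = re.compile(r"font-weight:\s*bold", re.IGNORECASE)


def tag_bold(css_text: str) -> str:
    """Keep the text of each SPAN, wrapping bold spans in {b}...{/b}."""
    pieces = []
    for match in SPAN_PATTERN.finditer(css_text):
        style, content = match.group(1), match.group(2)
        if BOLD_PATTERN.search(style):
            content = "{b}" + content + "{/b}"
        pieces.append(content)
    return "".join(pieces)
```

Run on the five sample lines, this gives the desired result apart from the final period, which is not present in the sample input.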

Hmm, gets more complicated all the time. Well, you can use this:

Evaluate(

Substitute(Quote(TextField);

["<SPAN";""&Let(;

["</SPAN>";""];Case(b;"{b}"&T&"{/b}";T))&""];

[">";"";"BOLD");T=""]

))

It works with the sample text you provided, but I'm not sure how this will interact with other tags that are embedded in the text.

Actually, I do know how it will interact. It won't work unless you remove all non-SPAN tags first, then run it through the function I gave. So, your general search for "<" will have to be replaced with individual functions which look for and strip each different type of tag.

...and then again...

I think this should work without interference with other tags:

Let([

a=Substitute(Quote(TextField);

["<";""&Let([T="<"];

[">";">";

S=PatternCount(T;"SPAN");

X=PatternCount(T;"</");

B=PatternCount(T;"BOLD")];

Case(X;T;S AND B;"<SPANBOLD>";S;"<SPAN>";T))&""]);

b=Evaluate(a);

c=Substitute(Quote(:;

["<SPANBOLD>";""&Let(;

["<SPAN>";""&Let(;

["</SPAN>";""];T&:&""])

];

Evaluate(c)

)

So, you should be able to run it through this first, to apply the bold tags, and then you can run the result through the other function to strip off all the rest of the tags.

And they say you can't do regex in Filemaker. Ha!

<<Edit note: Corrected the formula; I missed a slash & had a reference to a missing field. >>

  • 2 weeks later...

Just one last followup. I was playing with a variation of the last formula for use in a current project, and I noticed a couple of things you need to watch out for:

1. When using the GetAsCSS() function to generate the original tagged text, it will 'escape' certain reserved characters such as <, >, &, ", and all non-ASCII characters (codes >127). So, you will need to use Substitute to convert them back. Example:

Substitute(CSSText;

["&quot;";"""]B)

["&rsquo;";"
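As a rough illustration of the unescaping step in Python rather than FileMaker, the standard library's html.unescape converts named and numeric entities back to characters. The function name below is hypothetical. Doing this only after the tags have been stripped is my assumption about the safest order, since a decoded &lt; would otherwise look like the start of a tag.

```python
import html


def decode_entities(text: str) -> str:
    """Convert entities such as &quot; and &rsquo; back to plain characters."""
    return html.unescape(text)
```

In FileMaker the same job would need one Substitute pair per entity, as in the truncated example above.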

There are more problems like that. For example, if the text contains a CR, the CSS code will convert it to <BR>. Now this is considered to be code, and therefore is removed.

I believe the code indeed needs to be pre- and post-processed to catch these cases. BTW, your formula returns the text broken into separate lines, following the FMP convention of coding CSS into separate lines. So, not only are the original breaks removed, I am getting a lot of new ones. This can also be solved by removing the code's CR's in preprocessing.

There is another problem for which I don't yet see a solution:

Filemaker ends each codeline with a space and CR. The spaces are not inside < > brackets, so they are considered a part of the original text. Now, if the code is known to originate in Filemaker, it can be dealt with (but then, why would anyone bother). If the source of the code is unknown, it is unpredictable. Someone might write </SPAN> (some spaces here) <SPAN>.

As for the <BR> line breaks, yes that's something else created by the GetAsCSS function that needs to be processed. How these things are handled will depend on the circumstances of the specific application.

In my own particular project, I was able to account for the space-CR at the end of the code line by including it in the search/replace. So, it effectively gets deleted. And I had already pre-processed the linebreaks in the original text, so any <BR> tags that occurred later were spurious and could be deleted.

Finally, I was basing my formula on the assumption that the tagged text was well formed. If it's not, then there's no way of fixing that. Any other text processor would be equally unhappy finding a closing tag before an opening tag. But, if the source text is generated by the GetAsCSS() function, that should never happen.
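Here is a Python sketch of that kind of pre-processing, assuming the text really came from GetAsCSS() so that every generated line ends in a space plus a line break. Both that assumption and the idea that <BR> stands for a line break in the original text come from this thread, not from FileMaker documentation, and the names are hypothetical. This version turns each <BR> back into a line break; deleting them instead, as described above, would be a one-word change.

```python
import re

# The space plus line break that ends each generated line (CR, LF or CRLF).
LINE_END = re.compile(r" (?:\r\n|\r|\n)")
# A <BR> tag in any letter case, with an optional self-closing slash.
BR_TAG = re.compile(r"<BR\s*/?>", re.IGNORECASE)


def preprocess_css(css_text: str) -> str:
    """Join the generated lines, then restore original line breaks from <BR>."""
    joined = LINE_END.sub("", css_text)
    return BR_TAG.sub("\n", joined)
```

The order matters: the generated line endings are removed first, so the only line breaks left afterwards are the ones restored from <BR>. As pointed out earlier in the thread, this is only safe when the markup is known to come from FileMaker.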

"Any other text processor would be equally unhappy finding a closing tag before an opening tag"

I am afraid that got lost in translation. I meant that even in well-formed marked-up text, you can have a closing tag followed directly by an opening tag. A browser is supposed to ignore any spaces or CRs in between. For example:

<h1> my heading </h1>

(any number of spaces/CR's here)

<p>

My real text...

or:

<SPAN STYLE= "" >This is a sample which </SPAN> <SPAN STYLE= "font-weight: bold;" >will</SPAN> <SPAN STYLE= "" > fail</SPAN>
