
CF - to get a list of all the text items...


This topic is 6002 days old. Please don't post here. Open a new topic instead.


Hi,

I have a record that contains a field of "body" text. I need a custom function that will extract all the text items enclosed in square brackets into a separate list, e.g.:

Body_text = "Cats and dogs and [s1t1] make [s1t2] pets though they need [s2y1] handling."

The result would be a paragraph delimited list containing

"s1t1"¶"s1t2"¶"s2y1"

Does anybody know of such a function (or a similar one), or have any pointers on how to create such a cf?

Many thanks


Hello,

You can do this with a recursive function, or with CustomList () (http://www.briandunning.com/cf/747) and this calculation :)

// Quotes inside the expression passed to CustomList are escaped as \"
Let ( [
  $BodyText = "Cats and dogs and [s1t1] make [s1t2] pets though they need [s2y1] handling"
] ;
  CustomList ( 1 ; PatternCount ( $BodyText ; "[" ) ;
    "Let ( [
      PosL = Position ( $BodyText ; \"[\" ; 1 ; [n] ) + 1 ;
      PosR = Position ( $BodyText ; \"]\" ; PosL ; 1 )
    ] ;
      Middle ( $BodyText ; PosL ; PosR - PosL )
    )"
  )
)

Agnès
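
For readers who want to follow the logic outside FileMaker, here is a minimal Python sketch of the same idea: find each "[" in turn, find the first "]" after it, and take the text in between. The function name `extract_bracketed` is hypothetical, and this is an illustration of the approach rather than a translation of CustomList (). One assumed difference: the sketch stops at an unclosed "[" instead of returning an empty value for it.

```python
def extract_bracketed(body_text):
    """Collect the text between each "[" and the first "]" that follows it."""
    items = []
    search_from = 0
    # One pass per "[", like iterating from 1 to PatternCount ( text ; "[" )
    for _ in range(body_text.count("[")):
        pos_l = body_text.find("[", search_from) + 1  # first character after "["
        pos_r = body_text.find("]", pos_l)            # first "]" after that
        if pos_r == -1:
            break  # assumption: stop when a "[" has no closing "]"
        items.append(body_text[pos_l:pos_r])
        search_from = pos_l
    return items
```

Because the loop is driven by the number of "[" rather than by the number of words, the work grows with the number of bracketed items.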

Hmm... or, without a cf, perhaps this calculation:

Let ( [
  BodyText = Substitute ( "Cats and dogs and [s1t1] make [s1t2] pets though they need [s2y1] handling" ; [ " " ; ¶ ] ) ;
  Result = FilterValues ( BodyText ; Substitute ( ¶ & BodyText & ¶ ; [ ¶ ; "¶##" ] ; [ "¶##[" ; "¶[" ] ) )
] ;
  Substitute ( Result ; [ "[" ; "" ] ; [ "]" ; "" ] )
)
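
A rough Python sketch of what this calculation does, under the assumption that every bracketed item is a single word separated from its neighbours by spaces: split the text into words, keep the words that start with "[", then strip the brackets. The name `extract_by_words` is hypothetical, and the sketch does not reproduce every detail of FilterValues.

```python
def extract_by_words(body_text):
    """Keep the space-separated words that start with "[" and strip the brackets."""
    words = body_text.split(" ")
    kept = [w for w in words if w.startswith("[")]
    return [w.replace("[", "").replace("]", "") for w in kept]
```

The second test case below shows the limitation of working word by word: punctuation that touches a closing bracket stays attached to the extracted item.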


Or:


Let (
  temp = Evaluate (
    Substitute (
      Quote ( "]" & text & "[" ) ;
      [ "]" ; Quote ( " & /*" ) ] ;
      [ "[" ; Quote ( " */" ) & "¶" ]
    )
  ) ;
  MiddleValues ( temp ; 2 ; ValueCount ( temp ) - 1 )
)

Adapted from here:

http://www.fmforums.com/forum/showtopic.php?tid/149507/post/149690/#149690


Yes, I like the */ trick. Thanks a lot for the calculation and the link.

(I took the liberty of editing mine to remove the final "".)

Agnès

Hmm... I like Evaluate, Substitute and... tests...

I tested the three calculations (Intel Mac running Tiger), comparing the time taken against the number of "[":

If the text has more than 4524 "[", the calculation with Evaluate does not return a result (with 4523 it works, in about 4 seconds).

The calculation with FilterValues takes 24 seconds to return a result! (The text contains 18000 words, which is perhaps the reason for this timing.)

Finally, CustomList () handles 5000 "[" in 5 seconds and still works up to 18700 "[", because the calculation is driven by the number of "[" and not by the word count.

I hope my tests don't annoy you; I am wary of the limits of Evaluate.

OK, I know, 4525 "[" is a lot :)

Agnès


No, it's interesting, but you are missing a result for a recursive function. I am aware of some of the limits - I believe Substitute() is the limiting factor.

You might find this interesting too:

http://www.fmforums.com/forum/showtopic.php?tid/187248/post/253049/#253049


Quote: "No it's interesting, but you are missing a result for a recursive function. I am aware of some of the limits - I believe Substitute() is the limiting factor."

I am not sure; Substitute is really fast, even when substituting 50000 values...

I changed the FilterValues calculation :)

Let ( [
  BodyText = Substitute ( "Cats and dogs and [s1t1] make [s1t2] pets though they need [s2y1] handling" ; [ ¶ ; " " ] ; [ "[" ; "##¶" ] ; [ "]" ; "¶##" ] ) ;
  Sub = Substitute ( BodyText ; "##" ; "" )
] ;
  FilterValues ( BodyText ; Sub )
)
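
The revised calculation can be read as: put each bracketed item on its own line, tag every line of outside text with ##, then keep only the lines that look the same once the tags are removed. A minimal Python sketch of that reading follows; `extract_by_markers` is a hypothetical name, and the sketch assumes the text itself never contains "##".

```python
def extract_by_markers(body_text):
    """Keep the lines that are unchanged when the ## markers are stripped."""
    marked = (
        body_text.replace("\n", " ")   # flatten existing returns
        .replace("[", "##\n")          # outside text before an item ends with ##
        .replace("]", "\n##")          # outside text after an item starts with ##
    )
    values = marked.split("\n")
    unmarked = set(marked.replace("##", "").split("\n"))
    # Outside text carries a ## tag, so it never matches its untagged form.
    return [v for v in values if v in unmarked]
```

Here the number of lines to filter depends on the number of brackets rather than on the number of words, which fits the shorter timing reported for this version.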

With 5000 "[" and 18000 words, the result now comes back in 5 seconds... Perhaps FilterValues doesn't like ## and other symbols, or too many values?

No, I didn't test a recursive function; I am sure it would take more time.

Thanks for the link to the timing tests. I am going to test my cf "SwitchValues()" now.


Hi guys - thanks for the responses. I am now breaking them down to understand how they work (a bit mystifying at the moment, but I will get there).

In the meantime I have noticed that (using Michael's formula) if one of the square brackets of a pair is missing, the calc doesn't return any values (even if there are other valid strings to extract)?

Anyway thanks again


Quote: "(even if there are other valid strings to extract)?"

Because Evaluate finds an error and cannot return a result. (If you put the calculation in the Data Viewer and remove just the Evaluate, you will see it.)

For all of the calculations, you can test whether the number of "[" equals the number of "]"; if they are not equal, stop the calculation and report the error.
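
The bracket-count test suggested above could be sketched like this in Python. `check_brackets` is a hypothetical name, and the wording of the error message is only an example.

```python
def check_brackets(body_text):
    """Return an empty string when the counts match, otherwise an error message."""
    opens = body_text.count("[")
    closes = body_text.count("]")
    if opens != closes:
        return f'Error: {opens} "[" but {closes} "]"'
    return ""
```

Equal counts only catch a missing bracket. Text such as "]x[" has matching counts and would still need a separate check on the order of the brackets.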

Quote: "I'm not talking about speed, but about a limit on the number of substitutions it will accept."

OK. I don't know either.


Hi Michael, Agnes,

I always feel as though I have to really "understand" whatever I put into my application, and now I do. It took a while, but the penny finally dropped, so thank you.

Just as some feed back and in case other people need to know some of the differences, I found the following:

The calc from Agnes feels simpler and works well when you are in control of/aware of all of the characters in the body_text.

In my particular case I am working with some large html text blocks and I have found that it does include some "nasty" non-printing characters that I couldn't quite get my hands on.

So in this case the pure "completeness" of Michael's calc works far better. I suspect it will always require different opening and closing delimiters, which, again, is no problem in my case because we are also in charge of the originating HTML.

Many thanks

Simon


I don't know whether I should start another thread - just in case anybody looks...

Is there a way of reducing this final list to just the unique occurrences of each value extracted from the main text?

thanks

S


I think it would be better to move to a recursive function. I don't have time to do it now, but roughly the function would look something like:

ExtractUnique ( text ; startCode ; endCode ; result )

and it would look for the first occurrence of startCode in text. If found, extract the string between it and the first occurrence of endCode in text. If the string is not already in result, append it to the result.

Then call itself again with the rest of the text and the new result. If not found, return the result.
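
The outline above can be sketched in Python as follows. This is an illustration of the described recursion, not a FileMaker custom function; the parameter names mirror the proposed signature, and two details are assumptions: the end code is searched for after the start code, and an unclosed start code ends the recursion.

```python
def extract_unique(text, start_code, end_code, result=()):
    """Recursively collect the unique strings found between start_code and end_code."""
    start = text.find(start_code)
    if start == -1:
        return list(result)  # no more start codes: return what was collected
    item_from = start + len(start_code)
    end = text.find(end_code, item_from)
    if end == -1:
        return list(result)  # assumption: an unclosed start code ends the search
    item = text[item_from:end]
    if item not in result:
        result = result + (item,)  # append only values not seen before
    # Call itself again with the rest of the text and the new result.
    rest = text[end + len(end_code):]
    return extract_unique(rest, start_code, end_code, result)
```

For example, `extract_unique("[a] x [b] y [a]", "[", "]")` gives the list with "a" and "b" once each, in order of first appearance. Python's default recursion limit is about 1000 calls, so for very long texts the same logic would need to be written as a loop.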
