
This topic is 4386 days old. Please don't post here. Open a new topic instead.


Posted

I have built the following calculation to compare free text with a library of keywords and count the occurrences of the library words in the free text. This works great, but...

 

Some of the library entries are stems, like "happi" to capture "happier" and "happiness". The solution below works fine for exact matches between target text words and whole (non-stem) library words. Is there a way I can capture the partial stem matches, either within the framework I have or in some other way? Text samples will run up to 1,000 words, and libraries may have as many as 500 words.

 

Let([
t = text1 ;
r = search_library::sad ;
adj_t = Substitute ( t ; ", " ; ¶ ) ;
adj_t = Substitute ( t ; "; " ; ¶ ) ;
adj_r = List ( r )
];
ValueCount ( FilterValues ( adj_t ; adj_r ) )
)

 

 

Posted

Not an answer to the question but why are there two different calcs for adj_t ?

Right now you only get the last result.

 

Perhaps you should be using this?

 

adj_t = Substitute ( t ; [ ", " ; ¶] ; [ "; " ; ¶] );
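For reference, here is a sketch of the original calculation with the multi-pair Substitute dropped in. The names text1 and search_library::sad come from the question. Nothing else is changed, so it still counts exact matches only and does not address the stem problem.

```filemaker
Let ( [
  t = text1 ;
  r = search_library::sad ;
  // Both separators are replaced in a single call, so neither result is overwritten.
  adj_t = Substitute ( t ; [ ", " ; ¶ ] ; [ "; " ; ¶ ] ) ;
  adj_r = List ( r )
] ;
  // FilterValues keeps every value of adj_t that also appears in adj_r,
  // so repeated words in the text are counted each time they occur.
  ValueCount ( FilterValues ( adj_t ; adj_r ) )
)
```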

Posted
The solution below works fine for exact matches between target text words and whole(non stem) library words. Is there a way I can capture the partial stem matches

 

I am not sure I understand your question. I am quite sure you cannot use FilterValues() with wild cards - though sometimes it can be used to pass "values that begin with ..." or "values that end with ...". Another option is to "explode" one or both sides of the comparison, so that "happiness" becomes "happiness¶happines¶happine¶happin¶happi¶happ¶hap", for instance. I think the task here needs to be defined better than by a single example.
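As a rough illustration of the "explode" idea, a recursive custom function along these lines could build the list of prefixes. The function name ExplodePrefixes and the minLength parameter are hypothetical, not built-in functions, and this is only a sketch under the assumption that a stem is always a leading part of the word.

```filemaker
// Custom function: ExplodePrefixes ( word ; minLength )
// Returns word followed by each shorter prefix, one per line,
// stopping at minLength characters (never below 1).
If ( Length ( word ) <= Max ( minLength ; 1 ) ;
  word ;
  List ( word ; ExplodePrefixes ( Left ( word ; Length ( word ) - 1 ) ; minLength ) )
)
```

With that in place, ExplodePrefixes ( "happiness" ; 3 ) should produce the list shown above, and FilterValues ( ExplodePrefixes ( "happiness" ; 3 ) ; libraryList ) would be non-empty whenever the library contains the word or one of its prefixes (libraryList being whatever return-delimited list holds the library).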

Posted

Sounds like you've got two blocks of text. One the source, with a bunch of words and the other a library with another bunch of words. You want to go through each word in the source and see if it, or a version of it, appears in the library?

 

I think you're going to need a recursive custom function using the PatternCount() function.

 

Of course, the English language is super-complex and it'll be hard to do real pattern matching. Happier and happiness both have the happi- stem, but happy does not. The common stem would be happ-, but that would then catch happenings and happenstance.
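A minimal sketch of such a recursive function follows. The name CountStemHits is hypothetical, and it assumes a convention that is not part of the original calculation: a library entry ending in * is treated as a stem, and any other entry as a whole word. It recurses once per library entry rather than once per word, so a 500-entry library should stay well within FileMaker's recursion limits.

```filemaker
// Custom function: CountStemHits ( wordList ; stemList )
// wordList: target words, one per line.
// stemList: library entries, one per line (assumed: trailing * marks a stem).
Let ( [
  entry = GetValue ( stemList ; 1 ) ;
  isStem = Right ( entry ; 1 ) = "*" ;
  term = If ( isStem ; Left ( entry ; Length ( entry ) - 1 ) ; entry ) ;
  // Double the returns so each word has its own leading and trailing ¶;
  // adjacent matches then never share a delimiter.
  haystack = ¶ & Substitute ( wordList ; ¶ ; "¶¶" ) & ¶ ;
  // A stem only has to match the start of a word; a whole word must match both ends.
  pattern = ¶ & term & If ( isStem ; "" ; ¶ ) ;
  hits = If ( IsEmpty ( term ) ; 0 ; PatternCount ( haystack ; pattern ) ) ;
  rest = RightValues ( stemList ; ValueCount ( stemList ) - 1 )
] ;
  If ( IsEmpty ( stemList ) ; 0 ; hits + CountStemHits ( wordList ; rest ) )
)
```

Note that a word is counted once for every entry it matches, so a library holding both happ* and happi* would count "happier" twice, and happ* alone would also count "happenings".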

Posted

Thanks to you all.

 

Bruce - I was unaware that the adj_t calculation would be performed only once. I assumed that the first one would deal with comma-delimited text and the second with semicolon-delimited text. That is not a big problem. We are comfortable in our ability to convert the target text into a list of words.

 

To the anonymous commentator - Not sure how to describe the task more clearly. I have text samples that I am converting into a long list of words, CR-separated, in a single FM text field. I wish to check the occurrence of each word in this target text list against a standardized library of words. The current library contains about 4500 words (these comprise about 80 different libraries). About 75% of the words are entered as stems with a wildcard, like accept* for accepted, accepting, and acceptable, or happi* for happier, happiest, and happiness (happy appears on the list as its own entry). I can solve the problem with brute force by simply adding all the words that are currently captured by each wildcard stem, but that's about 3500 stems that need to be addressed. An automated solution would be preferable. Thanks for the link too.

 

David - I was thinking along those lines as well. I do not have a lot of experience with writing custom functions, and even less with working with lists. As I note above, I have total confidence that my libraries are constructed to avoid omission of matches; that is, the list contains happy as its own word and happi* to cover the others.

 

If you happen to have an example of a function that reads, evaluates, and operates on list items, I am pretty sure I can use PatternCount to do the job.

 

Thanks to all.

Posted

Yes, the first instance of adj_t would do what you want. Briefly.

 

But since the second instance makes no reference to the first instance, it "re-declares" adj_t completely.

 

To get the results you intended you need to use the calculation I suggested.

The multi-operation Substitute method is something you should learn in any case.

 

If you are going to use your sequential method, you would need to properly refer to adj_t from the first operation in the second operation:

 

 

adj_t = Substitute ( t ; ", " ; ¶ ) ;

adj_t = Substitute ( adj_t ; "; " ; ¶ ) ;

Posted (edited)

Thanks to you all.

 

Not sure how to describe the task more clearly. I have text samples that I am converting into a long list of words, CR-separated, in a single FM text field. I wish to check the occurrence of each word in this target text list against a standardized library of words.

 

You need to refine the definition of a match. If the word "happier" is supposed to match the library entry "happi" AND you are sure there cannot be another library entry of "happ", then exploding "happier" as I explained above should do the job, I think?
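To make that concrete, here is a hedged sketch of two custom functions that walk a list of words and test each word, and its successively shorter prefixes, against the library. The names MatchesStem and CountMatches and the minLength parameter are hypothetical. The sketch assumes library entries are stored without a wildcard character (plain "happi"), and it treats every entry as a possible stem, so a whole-word entry such as "sad" would also match "saddle".

```filemaker
// Custom function: MatchesStem ( word ; stemList ; minLength )
// Returns 1 if word, or any prefix of it at least minLength characters long,
// appears as a value in stemList; otherwise 0.
Case (
  Length ( word ) < Max ( minLength ; 1 ) ; 0 ;
  not IsEmpty ( FilterValues ( word ; stemList ) ) ; 1 ;
  MatchesStem ( Left ( word ; Length ( word ) - 1 ) ; stemList ; minLength )
)

// Custom function: CountMatches ( wordList ; stemList ; minLength )
// Adds up MatchesStem over every value in wordList, one recursive call per word.
If ( IsEmpty ( wordList ) ;
  0 ;
  MatchesStem ( GetValue ( wordList ; 1 ) ; stemList ; minLength )
  + CountMatches ( RightValues ( wordList ; ValueCount ( wordList ) - 1 ) ; stemList ; minLength )
)
```

Because MatchesStem stops at the first hit, each word contributes at most 1 to the total, even if the library happens to contain both "happi" and "happ".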

 

 

---

BTW, I am no more anonymous than you or any other member; "comment" is a screen name, as is "Hammerton". :tongue:

Edited by comment
