Concordance counts

January 22, 2009

Hi all,

I have a linguistic data set that I manage in FM9. The mechanics of it are very simple and my database is not much more complicated than a stack of (non-relational) notecards.

I'd like to turn this thing into a more sophisticated linguistic analysis toolkit, but want to know if what I'm thinking would be possible. I've spent a lot of time digging through the documentation and forum posts here, but am still unclear about what scripts can and can't do.

I would like a script to go through the records in one of my tables, parse the text in some of that table's fields, and output a concordance tally. A concordance tally is a list of all the words that occur in a text paired with the number of times that word occurs.

Would such a thing be possible? Could I use AppleScript to accomplish this?

Thanks in advance!

peter

January 25, 2009

Yes, that sounds possible, but complicated.

Here are a few rough ideas off the top of my head...

Loop through every word in the field you want a concordance tally for, putting every word in that field into a new related record. You could then do a sub-summary of those related records to get a count of how many times they occurred in that field.

-or-

Loop through every word in the field, putting each unique word in another text field with the total occurrences next to it.

The heart of your script will look something like this:


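Set Variable [ $counter ; Value: 1 ]

Loop

    Exit Loop If [ $counter > WordCount ( YourTable::yourTextField ) ]

    Set Field [...]

    Set Variable [ $counter ; Value: $counter + 1 ]

End Loop

Note that the field reference passed to WordCount is not quoted; in quotes it would be treated as literal text. Testing the exit condition at the top of the loop also lets the script skip empty fields.

You will need to use the MiddleWords and PatternCount functions in the Set Field script step to extract and count each word from the field. Keep in mind that PatternCount matches substrings, so "eat" would also be counted inside "eating".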
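To make the logic of that loop concrete outside FileMaker, here is a minimal sketch in Python. It is only an illustration: the regular expression is an assumed stand-in for FileMaker's word-break rules (which differ in detail), and the function names are hypothetical.

```python
import re
from collections import Counter


def split_words(text):
    # Assumed approximation of word parsing: lowercase runs of
    # letters, digits and apostrophes. FileMaker's own rules differ.
    return re.findall(r"[a-z0-9']+", text.lower())


def concordance(text):
    # Walk the words one at a time with an explicit counter,
    # mirroring the $counter loop above, and tally each word.
    words = split_words(text)
    tally = Counter()
    counter = 0
    while counter < len(words):
        tally[words[counter]] += 1
        counter += 1
    return tally


tally = concordance("The cat sat. The dog sat too.")
```

Because the tally is keyed on whole words, it avoids the substring issue you can run into when counting with pattern matching.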

January 30, 2009

Author

Thanks so much for your help, Dan.

I've now got a script that iterates through my records and creates a new record in another table for each word.

What's the best way now to turn that table with each word representing a record into a table with each record representing a unique word, with a separate field indicating the number of times that word appears?

January 30, 2009

Peter,

I'm glad I could help.

Rather than creating a new table with a separate record for each unique word and the total occurrences of that word, you can use a sub-summary report on the table of data you already have. I did a quick search and found this resource that should be able to walk you through this process: http://filemakerreports.blogspot.com/2008/04/filemaker-creating-sub-summary-report.html

The downside of using a sub-summary report to view the totals is that it can only be viewed in Preview mode (I think this may have changed in version 10 - I'm not sure though).

January 30, 2009

You could also define a self-join relationship of the Words table, matching on the word field. A calculation field =

Count ( Words 2::Word )

will return the total number of occurrences of each word.

If you like, you can create a third table of UniqueWords in which the Word field is validated for 'Unique value, Validate always'. Then import the words and their counts into this table to create the final concordance.

Note that this entire process is suitable for a one-time effort. If you modify your data, you will need to repeat everything from the top.
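For anyone who finds it easier to see the idea in code, here is a rough Python sketch of what the self-join count and the UniqueWords import amount to. The sample words and variable names are made up for illustration; this is not FileMaker syntax.

```python
from collections import Counter

# Hypothetical Words table: one record per word occurrence.
words_table = [{"word": w} for w in
               ["eat", "apple", "eat", "pear", "apple", "eat"]]

# Self-join on the word field: every record sees all records
# holding the same word, so its count is the size of that group.
counts = Counter(record["word"] for record in words_table)
for record in words_table:
    record["count"] = counts[record["word"]]

# Import into UniqueWords: only the first record for each word is
# kept, much as a unique-value validation would reject the repeats.
unique_words = {}
for record in words_table:
    unique_words.setdefault(record["word"], record["count"])
```

As with the FileMaker version, the result is a snapshot: if the underlying records change, the counts have to be rebuilt.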

January 31, 2009

Author

Thanks again, guys.

I'm using the self-join approach right now because needing to re-run my script when my dataset changes is less of a hassle than using preview mode.

One further snag: I have another field, called Word_root, that is the product of a calculation (a comparison against a lemma dictionary, which maps every occurrence of "apples" to "apple"). Basically, Word_root is the lexical root of whatever is contained in Word.

The Count function works perfectly when I use it on my Word data field, but not on Word_root, I suspect because it is a calculation. Is that right? Is there anything to do about it?

January 31, 2009

What exactly does the calculation do?

January 31, 2009

Author

I've got a large table with two fields: root and variation. The word "eat", for example, might have several root, variation pairs:

eat, eats

eat, eating

eat, ate

etc.

The script checks the table to see if the word matches a variation with a root. If it does, it returns the root. If not, it returns the original word.

Does that make sense? Let me know if there's anything more I can add to make things clearer. This is all part of a very large project, so I'm simplifying things when I describe my database. Without much experience, though, I don't always know the difference between relevant and irrelevant.

January 31, 2009

Your lemmas table needs to be organized in such a way that a word can be related to its root. If "eat" and "eating" are stored in two fields (say Root and Variation), then you could simply look up the root into a local field in the Words table through a relationship matching:

Words::Word = Lemmas::Variation

Alternatively, you could make all "siblings" of a common root related:

Words --[Word=Variation]-- Lemmas --[Root=Root]-- Lemmas2 --[Variation=Word]-- Words 2
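As a rough illustration of the lookup (in Python rather than FileMaker, with made-up sample pairs and a hypothetical function name), the match on Variation with a fallback to the original word could be sketched like this:

```python
from collections import Counter

# Hypothetical Lemmas table as (Root, Variation) pairs.
lemmas = [
    ("eat", "eats"),
    ("eat", "eating"),
    ("eat", "ate"),
    ("apple", "apples"),
]

# Index by Variation, like the Words::Word = Lemmas::Variation match.
root_by_variation = {variation: root for root, variation in lemmas}


def word_root(word):
    # Return the root when the word is a known variation,
    # otherwise fall back to the word itself.
    return root_by_variation.get(word, word)


# Counting on the stored root groups all siblings together.
root_counts = Counter(word_root(w) for w in
                      ["eat", "ate", "eating", "apples", "table"])
```

The point of looking the root up into a stored field is the same as building `root_counts` from already-resolved values: the count then runs on plain stored data.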

Edited January 31, 2009 by Guest

February 1, 2009

Author

If I understand you, I think that's basically what I've done, except I use an intermediate field to hold the root temporarily.

I use a relationship with the lemma table to import a word's root (if it exists) into an intermediate field. The calculated field is then the contents of the intermediate field, if there is any, or the original word, if there isn't. Is there a more direct way of doing this?

Thanks again for all the help. It's been invaluable.

February 1, 2009

For the calculated Word_root field, what is "Calculation result is" set to? It should be "Text". If it is "Number", then that might be throwing off the Count function. (That's a simple little mistake that I often make.)

In your table of words created from the text you iterated through, do you want to store both the root and the variant in two separate fields? Or would it be better if there were only one field containing either the root, if it exists, or the original word?

February 1, 2009

except I use an intermediate field to hold the root temporarily.

A calculation field that references related/global fields is forced to unstored, and cannot be used as a matchfield on the "other" side of a relationship.