February 29, 2008
Out of a text field I need several results:
1. Total count of words (WordCount function, no problem)
2. Total unique words
3. Number of sentences
4. Average words per sentence
5. Breakdown of word length (how many 1-letter words, 2-letter words, 3-letter words, etc.)
Any hints on how to start doing this? TIA
February 29, 2008
Before you can count sentences, you will have to define "sentence" in terms a computer can 'understand'. You could start by counting sentence-ending punctuation marks, i.e. period, question mark and exclamation mark. But that assumes the text is correctly punctuated, and even then it is far from perfect. For example, this: "An increase of 10.5% percent(!) was noted." will count as 3 sentences (the decimal point, the exclamation mark and the final period each count once). Perhaps it could be improved by looking for a sentence-ending punctuation mark followed by a space, but again, this assumes that the text is formatted that way.
Regarding 2 and 5, you will need a custom function, or a looping script, to go over the text word by word.
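A minimal sketch of the kind of recursive custom functions this suggests for items 2 and 5, assuming FileMaker Pro Advanced is available to define them. The names UniqueWordList and WordLengths, their parameters, and the field name yourText are hypothetical; this is one possible approach, not a tested solution.

```
// Hypothetical custom function: UniqueWordList ( text ; i ; seen )
// Returns a return-separated list of the distinct words in text.
// Start it with i = 1 and seen = "".
Case (
  i > WordCount ( text ) ; seen ;
  Let (
    w = Lower ( MiddleWords ( text ; i ; 1 ) ) ;
    UniqueWordList ( text ; i + 1 ;
      // add the word only if it is not already in the list
      If ( IsEmpty ( FilterValues ( w ; seen ) ) ; List ( seen ; w ) ; seen )
    )
  )
)
```

```
// Hypothetical custom function: WordLengths ( text ; i ; result )
// Returns a return-separated list holding the length of each word.
// Start it with i = 1 and result = "".
Case (
  i > WordCount ( text ) ; result ;
  WordLengths ( text ; i + 1 ;
    List ( result ; Length ( MiddleWords ( text ; i ; 1 ) ) )
  )
)
```

With these in place, ValueCount ( UniqueWordList ( yourText ; 1 ; "" ) ) would give the total of unique words, and ValueCount ( FilterValues ( WordLengths ( yourText ; 1 ; "" ) ; "3" ) ) the number of 3-letter words. Word boundaries follow whatever WordCount and MiddleWords treat as a word, and FileMaker limits recursion depth, so a very long text may need a looping script instead.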
February 29, 2008
"3. Number of sentences"
... or better, you can count the number of paragraphs with a calculation like this:
ValueCount (
  Substitute (
    TrimAll ( Substitute ( yourText ; [ " " ; "§§§" ] ; [ ¶ ; " " ] ) ; 1 ; 1 ) ;
    [ " " ; ¶ ] ; [ "§§§" ; " " ]
  )
)
February 29, 2008 Author
Thanks, Comment. I was already looking for a way to 'normalize' the sentence endings. It's not only a period followed by a space; it could be a period followed by a ¶, or any of a list of other possibilities. I need something to start with and narrow down along the way. Any idea about the needed CFs, and what keywords to search on?
February 29, 2008
I would start with something like:
PatternCount (
  Substitute ( text & ¶ ;
    [ "Mr." ; "" ] ;
    [ "i.e." ; "" ] ;
    [ "e.g." ; "" ] ;
    // and lots more exceptions to follow
    [ ". " ; "§" ] ;
    [ "! " ; "§" ] ;
    [ "? " ; "§" ] ;
    [ ".¶" ; "§" ] ;
    [ "!¶" ; "§" ] ;
    [ "?¶" ; "§" ]
  ) ;
  "§"
)
"Any idea about the needed CFs, what are the key words to search on ?"
I think you need a custom function for this.
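Building on the calculation above, item 4 (average words per sentence) could be sketched by dividing WordCount by the sentence count. This assumes a text field with the hypothetical name yourText, keeps the same short list of exceptions, and assumes the text does not already contain the "§" marker.

```
Let ( [
  // strip known abbreviations, then mark every sentence ending with §
  marked = Substitute ( yourText & ¶ ;
    [ "Mr." ; "" ] ; [ "i.e." ; "" ] ; [ "e.g." ; "" ] ;
    [ ". " ; "§" ] ; [ "! " ; "§" ] ; [ "? " ; "§" ] ;
    [ ".¶" ; "§" ] ; [ "!¶" ; "§" ] ; [ "?¶" ; "§" ]
  ) ;
  sentences = PatternCount ( marked ; "§" ) ;
  // count words on the original text, not on the marked copy
  words = WordCount ( yourText )
] ;
  // avoid dividing by zero when no sentence ending is found
  If ( sentences > 0 ; Round ( words / sentences ; 1 ) ; 0 )
)
```

The result is only as good as the sentence count, so every abbreviation or decimal number not covered by the exception list will pull the average down.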
February 29, 2008
"Any idea about the needed CFs, what are the key words to search on ?"
Most of the CFs are collected on Brian Dunning's web site. I don't recall any one CF that would do this, but you might find a couple that you could combine. Using the start that comment has provided, you might want to just write your own.
HTH
Lee
Edited February 29, 2008 by Guest