jrRaid Posted February 29, 2008
Out of a text field I need several results:
1. Total count of words (WordCount function, no problem)
2. Total unique words
3. Number of sentences
4. Average words per sentence
5. Breakdown of word length (how many 1-letter words, 2-letter words, 3-letter words, etc.)
Any hints on how to start doing this? TIA
comment Posted February 29, 2008
Before you can count sentences, you will have to define "sentence" in terms a computer can 'understand'. You could start by counting sentence-ending punctuation marks, i.e. periods, question marks and exclamation marks. But that assumes the text is correctly punctuated, and even then it is far from perfect. For example, this: "An increase of 10.5% percent(!) was noted." will count as 3 sentences. It could perhaps be improved by counting only a sentence-ending punctuation mark that is followed by a space, but again, this assumes that the text is formatted that way. Regarding 2 and 5, you will need a custom function, or a looping script, to go over the text word by word.
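To make the difference between the two counting rules concrete, here is a minimal sketch. It is written in Python purely for illustration, not in FileMaker's calculation language, and the function names `count_marks` and `count_marks_before_space` are hypothetical. It also treats the end of the text as equivalent to a following space, which is an assumption beyond what the post states.

```python
import re

SENTENCE_ENDERS = ".!?"

def count_marks(text: str) -> int:
    """Count every sentence-ending punctuation mark, wherever it appears."""
    return sum(text.count(mark) for mark in SENTENCE_ENDERS)

def count_marks_before_space(text: str) -> int:
    """Count only marks followed by whitespace or by the end of the text."""
    return len(re.findall(r"[.!?](?=\s|$)", text))

sample = "An increase of 10.5% percent(!) was noted."
print(count_marks(sample))               # 3
print(count_marks_before_space(sample))  # 1
```

The second rule still miscounts abbreviations such as "Mr. Smith", where a period is followed by a space without ending a sentence.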
Raybaudi Posted February 29, 2008
3. Number of sentences
... or better, you can count the number of paragraphs with a calculation like this:
ValueCount (
    Substitute (
        TrimAll ( Substitute ( yourText ; [ " " ; "§§§" ] ; [ ¶ ; " " ] ) ; 1 ; 1 ) ;
        [ " " ; ¶ ] ; [ "§§§" ; " " ]
    )
)
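As far as I can read it, the calculation above collapses runs of empty paragraphs and then counts what is left, so the result is the number of non-empty paragraphs. A rough Python equivalent of that idea, for illustration only, is sketched below. The name `count_paragraphs` is hypothetical, and the newline character stands in for FileMaker's ¶.

```python
def count_paragraphs(text: str) -> int:
    """Count non-empty paragraphs, using the newline character as separator."""
    # A paragraph holding only spaces still counts here, which is how the
    # FileMaker calculation appears to behave (an assumption, not verified).
    return sum(1 for paragraph in text.split("\n") if paragraph != "")

sample = "First paragraph.\n\n\nSecond paragraph.\nThird.\n"
print(count_paragraphs(sample))  # 3
```

Note that this counts paragraphs, not sentences: a paragraph containing several sentences is still counted once.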
jrRaid Posted February 29, 2008 Author
Thanks, Comment. I was already looking for a way to 'normalize' the sentence endings. It's not only a period followed by a space; it could also be a period followed by a ¶, or any of a list of other possibilities. I need something to start with and narrow down along the way. Any idea about the needed CFs? What are the key words to search on?
comment Posted February 29, 2008
I would start with something like:
PatternCount (
    Substitute ( text & ¶ ;
        [ "Mr." ; "" ] ;
        [ "i.e." ; "" ] ;
        [ "e.g." ; "" ] ;
        // AND LOTS OF MORE EXCEPTIONS TO FOLLOW
        [ ". " ; "§" ] ;
        [ "! " ; "§" ] ;
        [ "? " ; "§" ] ;
        [ ".¶" ; "§" ] ;
        [ "!¶" ; "§" ] ;
        [ "?¶" ; "§" ]
    ) ;
    "§"
)
Any idea about the needed CFs, what are the key words to search on ?
I think you need a CUSTOM function for this.
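The logic of the calculation above can be traced step by step in a short sketch. It is written in Python for illustration only and is not a FileMaker custom function. The names `EXCEPTIONS`, `ENDINGS` and `count_sentences` are hypothetical, the newline character stands in for ¶, and the exception list contains only the three abbreviations from the post.

```python
EXCEPTIONS = ["Mr.", "i.e.", "e.g."]  # the post's three; a real list would be longer
ENDINGS = [". ", "! ", "? ", ".\n", "!\n", "?\n"]
MARKER = "§"  # assumed not to occur in the text itself

def count_sentences(text: str) -> int:
    """Strip known abbreviations, mark sentence endings, count the marks."""
    work = text + "\n"  # so a final sentence without a trailing break still counts
    for abbreviation in EXCEPTIONS:
        work = work.replace(abbreviation, "")
    for ending in ENDINGS:  # applied in order, like successive Substitute pairs
        work = work.replace(ending, MARKER)
    return work.count(MARKER)

sample = (
    "Mr. Smith arrived, i.e. he was late. Was it raining? Yes!\n"
    "An increase of 10.5% percent(!) was noted."
)
print(count_sentences(sample))  # 4
```

With this approach the "10.5% percent(!)" example counts as one sentence, because neither the decimal point nor the exclamation mark is followed by a space or a paragraph break. Text that already contains the marker character would inflate the count, so the marker should be something the field is known not to contain.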
Lee Smith Posted February 29, 2008
Any idea about the needed CFs, what are the key words to search on ?
Most of the CFs can be found on Brian Dunning's web site. I don't recall any single CF that would do this, but you might find a couple that you could combine. Using the start that comment has provided, you might want to just write your own. HTH Lee
Edited February 29, 2008 by Guest
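For items 2, 4 and 5 of the original question, the word-by-word pass that comment suggested could look roughly like the sketch below. It is in Python for illustration only; a FileMaker version would need a recursive custom function or a looping script. The name `word_stats` and the word pattern are assumptions: here a word is a run of letters or digits, optionally joined by apostrophes, which will not match FileMaker's own WordCount rules in every case. The sentence count is passed in, since it comes from whichever sentence rule is chosen.

```python
import re
from collections import Counter

# Assumed definition of a word: letters/digits, optionally joined by apostrophes.
WORD_PATTERN = re.compile(r"[^\W_]+(?:'[^\W_]+)*")

def word_stats(text: str, sentence_count: int) -> dict:
    """Visit every word once and collect the counts asked for in the thread."""
    words = WORD_PATTERN.findall(text)
    unique = {word.lower() for word in words}        # case-insensitive uniqueness
    lengths = Counter(len(word) for word in words)   # word length -> how many words
    average = len(words) / sentence_count if sentence_count else 0.0
    return {
        "total_words": len(words),
        "unique_words": len(unique),
        "words_per_sentence": average,
        "length_breakdown": dict(sorted(lengths.items())),
    }

stats = word_stats("The cat sat. The dog didn't sit!", sentence_count=2)
print(stats["total_words"])         # 7
print(stats["unique_words"])        # 6
print(stats["words_per_sentence"])  # 3.5
print(stats["length_breakdown"])    # {3: 6, 6: 1}
```

In this sketch an apostrophe counts toward a word's length, so "didn't" is a 6-character word; whether that is wanted depends on how the breakdown will be used.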