Stomp Posted August 7, 2008 Posted August 7, 2008 Sorry if this has been covered, but "mode" isn't the easiest thing to search for. Has anyone figured out a clever way to find the mode of related values, or a value list, in the context of a calculation? And by mode I mean Mode, like Mode, Median, Mean, etc. Most common value. I smell a custom function, but I'm somewhat baffled that FM doesn't include this function. Or am I blind?
comment Posted August 7, 2008 Posted August 7, 2008 Yes, I believe it requires a recursive calculation, hence a custom function. There's one here: http://www.fmforums.com/forum/showtopic.php?tid/168216/ though I'm not sure why it requires the list to be sorted - I might take a look at it later.
Raybaudi Posted August 8, 2008 Posted August 8, 2008 This seems to work... /* Mode ( ValueList ) custom function First Try... ( maybe buggy and sure can be nicer ) Returns a string with two words: first word is the number of times that a value is into the value list second word is the value If more than a value has the same number of presence into the list, new rows are added. Values can be numbers or strings */ Let([ adj = ¶ & Substitute ( ValueList ; ¶ ; "¶¶" ) & ¶; first = ¶ & GetValue ( ValueList ; 1) & ¶; count = PatternCount ( adj ; first ); next = Substitute ( TrimAll ( Substitute ( Substitute ( adj ; first ; ¶ ) ; [ " " ; "§" ] ; [ ¶ ; " " ]) ; 1 ; 1 ) ; [ " " ; ¶ ] ; [ "§" ; " " ] ) ]; Case( ValueCount ( ValueList ); Case( LeftWords ( Mode ( next ) ; 1 ) > count ; Mode ( next ) ; LeftWords ( Mode ( next ) ; 1 ) = count ; count & " " & GetValue ( ValueList ; 1) & ¶ & Mode ( next ) ; count & " " & GetValue ( ValueList ; 1) ) ) )
Raybaudi Posted August 8, 2008 Posted August 8, 2008 Bah... If too much text, it slows down until stop working with error... ... and there is a wrong test. Very buggy ! ( for comment ) Maybe I'll need the $
comment Posted August 8, 2008 Posted August 8, 2008 It could be done like this, I think: Mode ( listOfValues ; result ; freq ) Returns the most frequently occurring value/s in listOfValues, along with the number of occurrences. Let ( [ countValues = ValueCount ( listOfValues ) ; this = GetValue ( listOfValues ; 1 ) ; n = ValueCount ( FilterValues ( listOfValues ; this ) ) ; nextResult = Case ( n < freq ; result ; n = freq ; result & ¶ & this ; this ) ; nextFreq = Max ( n ; freq ) ; xcl = Substitute ( ¶ & listOfValues & ¶ ; [ ¶ ; "¶¶" ] ; [ ¶ & this & ¶ ; "" ] ; [ "¶¶" ; ¶ ] ) ; nextList = Middle ( xcl ; 2 ; Length ( xcl ) - 2 ) ] ; Case ( countValues ; Mode ( nextList ; nextResult ; nextFreq ) ; freq & " x " & Substitute ( result ; ¶ ; ¶ & freq & " x " ) ) ) The result and freq parameters should be empty on initial call. Examples: Mode ( "10¶11¶12¶11" ; "" ; "" ) = "2 x 11" ; Mode ( "a¶b¶c¶a¶c" ; "" ; "" ) = "2 x a¶2 x c" ;
Raybaudi Posted August 8, 2008 Posted August 8, 2008 Hi comment there is something wrong with your CF, too. First of all, if the List is empty, it gives: " x "... but real problems comes with a large text. Try to copy a generic text and to put it into a field that autoenters: Substitute ( generic text ; " " ; ¶ ) I triend with: "Mode ( ValueList ) custom function First Try... ( maybe buggy and sure can be nicer ) Returns a string with two words: first word is the number of times that a value is into the value list second word is the value If more than a value has the same number of presence into the list, new rows are added. Values can be numbers or strings" ... and the result was: 2 x ( 2 x ) BTW: isn't this a case to use $variables ?
comment Posted August 8, 2008 Posted August 8, 2008 I didn't bother with input validation. If your list is empty, or contains empty values, you will get unexpected results. It's not a matter of size - the function is tail-recursive, so it should handle lists of up to 50,000 DISTINCT values (the actual list could be much larger).
Stomp Posted August 8, 2008 Author Posted August 8, 2008 Lovely. Thanks, comment. Figuring out how that function works made my brain literally burst forth from my ears.
Oldfogey Posted August 10, 2008 Posted August 10, 2008 I played around with this and came up with a similar CF but I had few 'wonders'. Would a sort followed by a scan be quicker? Would it be an idea to include a check for the (latest) value count being greater than the count of the remaining values? I think I'd be trying to find out a bit more about the data characteristics if possible, because efficiency depends so much on these. Finally, I wondered where the values came from in the first place. Presumably, they are in individual records? In this case, a sort with summary counts should provide the required values with very little trouble. Even better(?) a self-join on Value'. Just curious but too lazy to satisfy my own curiosity.
comment Posted August 10, 2008 Posted August 10, 2008 I don't know. I am more lazy than you are, and what's more - I think some considerations are more important than raw speed (unless you need to do this often, on large sets). A script would require adding a summary count field, if one doesn't exists. A script needs to be initiated. A script would require the value field to be indexed. I believe the function is quite efficient (one iteration per distinct value). Adding a check for freq > remaining values is a double-edged sword: it would save evaluations in those cases where one or two values stand out significantly above the rest, but it would also add an evaluation at each iteration - most likely unnecessary with more typical data.
Recommended Posts
This topic is 6009 days old. Please don't post here. Open a new topic instead.
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now