Mode, mathematically speaking.

Stomp · August 7, 2008

Sorry if this has been covered, but "mode" isn't the easiest thing to search for.

Has anyone figured out a clever way to find the mode of related values, or a value list, in the context of a calculation? And by mode I mean Mode, like Mode, Median, Mean, etc. Most common value.

I smell a custom function, but I'm somewhat baffled that FM doesn't include this function. Or am I blind?

comment · August 7, 2008

Yes, I believe it requires a recursive calculation, hence a custom function. There's one here:

http://www.fmforums.com/forum/showtopic.php?tid/168216/

though I'm not sure why it requires the list to be sorted - I might take a look at it later.

Raybaudi · August 8, 2008

This seems to work...

/*

Mode ( ValueList ) custom function

First Try... ( maybe buggy and sure can be nicer )

Returns a string with two words:

first word is the number of times that a value is into the value list

second word is the value

If more than a value has the same number of presence into the list, new rows are added.

Values can be numbers or strings

*/

Let([

adj = ¶ & Substitute ( ValueList ; ¶ ; "¶¶" ) & ¶;

first = ¶ & GetValue ( ValueList ; 1) & ¶;

count = PatternCount ( adj ; first );

next = Substitute ( TrimAll ( Substitute ( Substitute ( adj ; first ; ¶ ) ; [ " " ; "§" ] ; [ ¶ ; " " ]) ; 1 ; 1 ) ; [ " " ; ¶ ] ; [ "§" ; " " ] )

];

Case(

ValueCount ( ValueList );

Case(

LeftWords ( Mode ( next ) ; 1 ) > count ; Mode ( next ) ;

LeftWords ( Mode ( next ) ; 1 ) = count ; count & " " & GetValue ( ValueList ; 1) & ¶ & Mode ( next ) ;

count & " " & GetValue ( ValueList ; 1)

)

Raybaudi · August 8, 2008

Bah...

If too much text, it slows down until stop working with error...

... and there is a wrong test.

Very buggy !

( for comment ) Maybe I'll need the $

comment · August 8, 2008

It could be done like this, I think:

Mode ( listOfValues ; result ; freq )

Returns the most frequently occurring value/s in listOfValues, along with the number of occurrences.


Let ( [

countValues = ValueCount ( listOfValues ) ; 

this = GetValue ( listOfValues ; 1 ) ;

n = ValueCount ( FilterValues ( listOfValues ; this ) ) ; 



nextResult = Case ( 

n < freq ; result ;

n = freq ; result & ¶ & this ;

this ) ;



nextFreq = Max ( n ; freq ) ;



xcl = Substitute ( ¶ & listOfValues & ¶ ; [ ¶ ; "¶¶" ] ; [ ¶ & this & ¶ ; "" ] ; [ "¶¶" ; ¶ ] ) ;

nextList = Middle ( xcl ; 2 ; Length ( xcl ) - 2 )

] ; 

Case ( 

countValues ; Mode ( nextList ; nextResult ; nextFreq ) ;



freq & " x " & Substitute ( result ; ¶ ; ¶ & freq & " x " )

)

)

The result and freq parameters should be empty on initial call.

Examples:

Mode ( "10¶11¶12¶11" ; "" ; "" ) = "2 x 11" ;

Mode ( "a¶b¶c¶a¶c" ; "" ; "" ) = "2 x a¶2 x c" ;

Raybaudi · August 8, 2008

Hi comment

there is something wrong with your CF, too.

First of all, if the List is empty, it gives: " x "...

but real problems comes with a large text.

Try to copy a generic text and to put it into a field that autoenters: Substitute ( generic text ; " " ; ¶ )

I triend with:

"Mode ( ValueList ) custom function

First Try... ( maybe buggy and sure can be nicer )

Returns a string with two words:

first word is the number of times that a value is into the value list

second word is the value

If more than a value has the same number of presence into the list, new rows are added.

Values can be numbers or strings"

... and the result was:

2 x (

2 x )

BTW: isn't this a case to use $variables ?

comment · August 8, 2008

I didn't bother with input validation. If your list is empty, or contains empty values, you will get unexpected results. It's not a matter of size - the function is tail-recursive, so it should handle lists of up to 50,000 DISTINCT values (the actual list could be much larger).

Stomp · August 8, 2008

Lovely. Thanks, comment. Figuring out how that function works made my brain literally burst forth from my ears.

Oldfogey · August 10, 2008

I played around with this and came up with a similar CF but I had few 'wonders'. Would a sort followed by a scan be quicker? Would it be an idea to include a check for the (latest) value count being greater than the count of the remaining values? I think I'd be trying to find out a bit more about the data characteristics if possible, because efficiency depends so much on these.

Finally, I wondered where the values came from in the first place. Presumably, they are in individual records? In this case, a sort with summary counts should provide the required values with very little trouble. Even better(?) a self-join on Value'.

Just curious but too lazy to satisfy my own curiosity.

comment · August 10, 2008

I don't know. I am more lazy than you are, and what's more - I think some considerations are more important than raw speed (unless you need to do this often, on large sets). A script would require adding a summary count field, if one doesn't exists. A script needs to be initiated. A script would require the value field to be indexed.

I believe the function is quite efficient (one iteration per distinct value). Adding a check for freq > remaining values is a double-edged sword: it would save evaluations in those cases where one or two values stand out significantly above the rest, but it would also add an evaluation at each iteration - most likely unnecessary with more typical data.

Sign In

Mode, mathematically speaking.

Recommended Posts

Stomp

comment

Raybaudi

Raybaudi

comment

Raybaudi

comment

Stomp

Oldfogey

comment

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information