Calculating on two dimensions

AndrewPurvis · May 8, 2016

I have a database (admittedly a frivolous one) that tracks my rating (1-5 stars) for Magic decks I have built. It also tracks how decks have performed. In one table, I have created threshold values for five groups, based on games won of those played: those ≤20%, those >20% and ≤40%, those >40% and ≤60%, those >60% and ≤80%, those >80%. I then want to look at the degree of correlation between these two values (star rating and winning percent), which I do based on a minimum number of games played.

First of all, what I have works fine, even when I drop my threshold to one game played and bring up 378 related records. The time to get everything pulled and calculated (the size of each star rating's population, the average number of games played for each population, their average winning percent, the StdDevP of each population as scaled winning percent relates to star rating, the number in each population that falls into each of the five groupings of winning percentages, and the percentage of each group by winning percent), as well as charting that last part is perhaps 1 second, or a hair over. However, I would like to speed this up and simplify the relationship graph as it relates here. This is how it looks now; I build my interface for just my own use, so it doesn't have to be particularly well labeled.

So far, everything is being done with calculation fields, and I am wondering if I am doing things the wrong way. Below is the portion of my relationships graph that relates to these two tables. Global fields in the Aggregator table hold values of 1 through 5. This and the threshold minimum number of games played form the relationships. Each of those relates to one of the TOs that has the data about the percentage of games won by each deck and the star rating of that deck. Each deck has its winning percent checked by five fields, returning a 1 if the value is true for the range conditions set (say, >.4 and ≤.6). Those fields then display across each row in the center grid above. Each row is a relationship, and each column is the same field as it is viewed by the different relationships.

Screen Shot 2016-05-08 at 4.03.21 PM.png

I figure there has GOT to be a smarter way of doing this, but given that this beast has evolved over a period of nearly a decade, I am still finding other efficiencies on old stuff. This part is relatively new, so I am almost certainly doing it poorly.

Edited May 8, 2016 by AndrewPurvis

jbante · May 9, 2016

I have some differences of opinion on how your data might be analyzed — e.g. using raw win rate (maybe with Laplacian adjustment) instead of binned win rate, and StdDev instead of StdDevP — but if everything takes about a second to evaluate, you're doing fine.

If you're really looking for something to do along those lines, though, applications working with more data see a lot of benefit from using stored fields and setting them with scripts. This doesn't make the calculations run faster, but it gives you more control about when they run.

Another thing to pursue might be to look for "stream" or "online" versions of your calculations where you can update a summary statistic as new data come in one-at-a-time rather than having to re-process all the historical data every time you want to update a statistic. Average, standard deviation, and linear correlation can be computed exactly this way. With other statistics you may have to settle for an approximation to do a stream calculation. Histograms and quantiles on a stream are only approximate, and the kinds of correlation appropriate for comparing star ratings with win rates probably also would have to be approximate.

AndrewPurvis · May 10, 2016

6 hours ago, jbante said:

I have some differences of opinion on how your data might be analyzed — e.g. using raw win rate (maybe with Laplacian adjustment) instead of binned win rate, and StdDev instead of StdDevP — but if everything takes about a second to evaluate, you're doing fine.

If you're really looking for something to do along those lines, though, applications working with more data see a lot of benefit from using stored fields and setting them with scripts. This doesn't make the calculations run faster, but it gives you more control about when they run.

Another thing to pursue might be to look for "stream" or "online" versions of your calculations where you can update a summary statistic as new data come in one-at-a-time rather than having to re-process all the historical data every time you want to update a statistic. Average, standard deviation, and linear correlation can be computed exactly this way. With other statistics you may have to settle for an approximation to do a stream calculation. Histograms and quantiles on a stream are only approximate, and the kinds of correlation appropriate for comparing star ratings with win rates probably also would have to be approximate.

I had initially used StdDev rather than StdDevP, before checking with one far more knowledgeable than I. His thinking was that it was a complete population in the sense that they represented the full number meeting the requirements of minimum games played and star rating, rather than a sampling of that population. As to things Laplacian, I would be at a loss (my degrees are in English).

The approximation of the correlation between expected performance and actual is achieved based on the idea that 5-star decks might be expected to hover around 90%, with a 20% drop off by star. I achieve this by taking the winning percentage and multiplying by 4, then adding 1 to the product. The result is anything from 1 ((0*4)+1) to 5 ((1*4)+1). I average the difference between the scaled percent and the star rating, then take the StdDevP of that to determine how closely grouped the figures are. The value of that is largely as a measure of how accurately the rating system reflects reality, as any number greater than 1 suggests I may have something off, a very small sample size, or both.

All that said (and probably too much off of the topic of databases for this forum), you have provided me a great deal of food for thought as to how I am producing this, and I will have to digest it and act accordingly before I put an executable out there for people toy around with.

jbante · May 11, 2016

My thinking for StdDev over StdDevP is my presumption that you're tracking past performance of decks with some interest in their future performance. If you're using past performances to judge how well decks might perform in the future, then the record of past performances (even a complete record) is a sample of all the performances the decks will ever have. However, if your interest is strictly historical, your friend is right to suggest StdDevP.

The Laplacian adjustment has the same motivation as StdDev vs. StdDevP. Accepting that the data we have may not be 100% representative of a whole population being sampled, the Laplacian adjustment would calculate win rates as if there were two more games, one win and one loss, than are actually on record. It nudges everything towards the middle a little bit; the more data you have, and therefore the more confident you should be in the win rate from your data, the smaller the nudge is. The Laplacian adjustment would calculate the win rate as ( numberOfWins + 1 ) / ( numberOfGames + 2 ) instead of numberOfWins / numberOfGames. But again if you think of your data as the whole population instead of the sample, there's no reason to do this.

For the correlation, I would have suggested that you look at Kendall's tau. It's the most classically appropriate correlation from statistics for your star rating vs. win rate relationship. (It's also one of the statistics that can't be computed exactly on a stream.) However, applied to your data, it wouldn't check for things like "5 stars means 90% or better", only that if the win rate is better, the star rating is also better, and vice versa. It sounds like you're looking for a goodness-of-fit calculation rather than a correlation. I'm a little confused by your description, though. The mean difference between actual and expected star ratings is by itself a reasonable measure of the star rating system's fit to your expectations. Adding a StdDev(P) to the mix strikes me as odd, though. What exactly are you taking the StdDev(P) of? The average difference between actual and expected star ratings should just be one number; taking the StdDev(P) of that would not be meaningful. Taking the StdDev(P) of the differences between actual and expected star ratings isn't as obvious a goodness-of-fit measure as the mean of those differences, but I suppose it could tell you whether the deviations are consistent across all decks or more due to some problematic subset.

Does your win-loss-star rating system do anything to account for the strength of the opposing deck in each match? A 5-star deck might be expected to win about 90% of matches against any random deck, but maybe only 50% against other 5-star decks.

Edited May 11, 2016 by jbante

AndrewPurvis · May 11, 2016

I see your point about StdDevP vs StdDev here: only when a population is fixed, including time, can I count it as the entire population. I had always considered that a purely backward look at the totality of data available represented the population.

I am, indeed, using past performance as an approximation of probable future performance. Thus far, though admittedly with only a modest sample since developing the system (32 total, played by three 5-star and one 4-star, getting an aggregate 25-7, with games leaning slightly to favor 5-star decks) has been almost exactly on target for predictions, facing a variety of opposing decks, some of them multiple times.

I am hesitant to use the Laplacian adjustment, as you describe it here. Due to the large reserve I have of decks that have never been used—58 rated 4 stars and another 12 rated 5 stars, in addition to a further 128 with lower ratings, though they will not see service unless I feel like throwing one in for further testing of predictions—most will not have a great deal of time in service. They will see anywhere from 4 to 8 games, producing a significant shift if I use the Laplacian adjustment. I am instead making predictions for individual decks based upon what I have from star ratings, and tracking those, as well as adding them to the pool of records being evaluated. Significant errors in individual cases will show up immediately and can be evaluated from there, as the conditions of each game are also tracked (6 recent games excepted).

The nature of the star rating system is also relevant here. It is based on my interest in having FMP shuffle up the deck and deal out new starts that I can then play out. If I find something more intriguing or compelling, I give it a higher rating. If I find it more frustrating, I give it a lower rating. In this sense it is not based on any objective foundation, but once I found a high correlation, I was intrigued enough to look at how it might work moving forward. It does not assume anything about the opposition other than the deck across the table will fall within the range of capabilities represented by the more than 2,000 games in the pool played by those decks (sometimes against one another). Obviously, two 5-star decks would be expected to run about 50% against one another, given a large enough sample, barring specific strength-weakness match-ups that might skew the results toward one or the other. Thus, it assumes a fairly normal distribution of opposing decks. However, given that, I would still expect better performance from a 5-star deck against any random opposing deck than I would from, say, a 3-star deck, making it a better bet to win over time.

I had actually encountered Kendall tau calculations in my research into what might be useful in this use case, but I had not run my data through the formula. Based on what you have provided here, I would need to use Tau-b, as there are significant numbers of ties in the rankings on both sides. My expectation is not actually 90% or better, but rather a mean of 90% (in the range of 80% to 100%) for 5-star, and so on. My reason for taking the difference between the scaled percent and the star rating is to see how far the two differ in individual cases, to which I apply, as a population, StdDevP (currently) to see how closely they correlate. My thinking is that there will be those that fall outside of the expected ranges, and the standard deviation would give me a sense of how closely to the expected distribution these values came out, where a value of <1 would show a distribution clustered closer around the mean, and one >1 would indicate a population that was not as accurately reflective in performance of the expectations (as expressed by star rating). It seems, however, that the Kendall tau might be more appropriate here.

Ultimately, the star rating is a measure of how much I drool over the thought of going after an opponent with the deck. If I dread pulling it out, that will a ) be less fun for me and b ) likely be less successful with regards to performance. (Note I am not concerning myself with the likelihood my opponent will enjoy the experience here, as I prefer 4- and 5-star decks to those of lesser ratings.) In its first incarnation, back in November of 2006, this database was envisioned as a means of improving my deck design habits and those of a friend of mine, as he was my primary opponent (and indeed, the only one tracked here at this time, skewing the data somewhat). That said, my ultimate goal is to provide a free tool others can download and use to design decks and track their performance, ideally seeing what concepts work better or less well and leading to improvements. With that in mind, your feedback on all of this, both at the design level and the statistical level, is fantastic, and I greatly appreciate it.

Sign In

Calculating on two dimensions

Recommended Posts

AndrewPurvis

jbante

AndrewPurvis

jbante

AndrewPurvis

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information