Straightforward Dice Probability
January 20, 2011 11:04 AM

Please refresh my memory with regard to a straightforward question of fair dice and probability.

I'm a little embarrassed to be asking this question... for uh a friend... because my friend should be able to do this easily and until a couple of years ago I know he could. He's trying to remember the method for finding the probability of certain outcomes when a set of results is compared to a known probability distribution.

We obviously know the mean, variance, etc. of rolling 2d6. If you roll 2d6 n times and the result has a mean of y, it should be easy to calculate the probability of the obtained result occurring by chance. Trivial example: Two fair six-sided dice are rolled 100 times and a mean of 7.5 is the result. What are the odds of a result that far or more off the mean? Assume we know the sum of the two dice but not the individual die results, i.e. we know how many times 7 was rolled but not whether those rolls were 6+1 or 5+2 or 4+3.

Do we use the central limit thingamabob? Or what?

To be clear I'm looking for a general case for N rolls of 2d6 which result in a mean of Y so that I can examine as many sets of rolls as I care to with variable numbers of rolls and means. Uh, so my friend can do that. Easy, right? We know the number of times we rolled 2d6. We know the mean of the probability distribution for rolling 2d6. We know the mean of our results. It should be simple to come up with something like "We would expect a result this far off the mean approximately 1 time in 100" or whatever. Please hope me.
posted by Justinian to Science & Nature (14 answers total) 5 users marked this as a favorite

Standard Deviation?
posted by empath at 11:26 AM on January 20, 2011

The mean result of 2 six-sided dice is 7, not 7.5.

The question you're asking is about statistics, not about probability.
posted by Chocolate Pickle at 11:31 AM on January 20, 2011

In the case you asked about: if you roll a die once, the mean is (1+2+3+4+5+6)/6 = 7/2. The mean of the square is (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6 = 91/6. The variance is the mean of the square minus the square of the mean, or (91/6)-(7/2)^2 = 35/12, and the standard deviation is sqrt(35/12).

So if you roll two hundred dice (you say it's one hundred 2d6, but of course that's the same thing) the mean of the sum of the two hundred dice is (7/2)*200 = 700, and the standard deviation is sqrt(35/12) * sqrt(200), which is approximately 24. Furthermore, the distribution of that sum is approximately normal, with that mean and standard deviation; this is the central limit theorem.

So you want the probability that a normally distributed random variable is at least (750-700)/24 = 2.1 standard deviations from the mean; from a standard normal table this is about two percent for one tail, or about four percent if deviations in either direction count. (I've left out some details here because they're kind of annoying to write up; hopefully this is enough to refresh your friend's memory.)

The same procedure will of course work with different numbers. I find it easier to think about the sum of the dice rolls instead of the average.
posted by madcaptenor at 11:35 AM on January 20, 2011 [1 favorite]
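
The calculation above can be sketched in a few lines of Python. This is only a minimal sketch using the standard library; the names `normal_upper_tail` and `observed_sum` are made up for illustration, and it relies on the same normal approximation, so the result is approximate.

```python
import math

def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

faces = range(1, 7)
die_mean = sum(faces) / 6                                # 7/2
die_var = sum(f * f for f in faces) / 6 - die_mean ** 2  # 35/12

n_dice = 200                            # 100 rolls of 2d6 treated as 200 single dice
sum_mean = n_dice * die_mean            # 700
sum_sd = math.sqrt(n_dice * die_var)    # roughly 24

observed_sum = 750                      # 100 rolls of 2d6 averaging 7.5
z = (observed_sum - sum_mean) / sum_sd  # standard deviations above the mean
one_tail = normal_upper_tail(z)         # chance of a sum this high or higher
two_tail = 2 * one_tail                 # chance of being this far off in either direction

print(f"z = {z:.2f}, one tail = {one_tail:.4f}, two tails = {two_tail:.4f}")
```

Changing `n_dice` and `observed_sum` covers other numbers of rolls and other observed totals.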

Take a look at the Chi-squared test.
posted by Blazecock Pileon at 11:37 AM on January 20, 2011 [1 favorite]

Yeah, I know what a standard deviation is. It's the square root of the variance. For a single roll of two dice I believe the variance is like 5.8 and sigma is 2.4. But I don't know the standard deviation for X number of rolls.

That's what my question is. The standard deviation, more or less.
posted by Justinian at 11:39 AM on January 20, 2011

The mean result of 2 six-sided dice is 7, not 7.5. The question you're asking is about statistics, not about probability.

I didn't say the mean result of 2 six sided dice is 7.5, I said (for example) the mean result of 100 rolls of 2 six sided dice could be 7.5. Calculating the odds of that happening is what my question is. That's probability, not statistics.

(preview)

madcaptenor: Are you certain you can treat it as the sum of the individual dice rolls? Something is nagging at me but I can't quite pinpoint it. Wait, I think I sort of have it... the variance is not the same for rolling 1 die a bunch of times as it is for rolling 2 dice a bunch of times. I don't think you can treat the dice individually.
posted by Justinian at 11:45 AM on January 20, 2011

Aha! Chi-squared! I believe that's it, thanks Blazecock. I couldn't remember the name of the test I was looking for. I'll mark you as best answer after I make sure that's the one.
posted by Justinian at 11:47 AM on January 20, 2011

I apologize for another comment in a row but it doesn't look like the χ² test is quite what I'm looking for. χ² would let me test a hypothesis like "these results were obtained with fair dice" for various levels of significance. Which is kind of what I want but not exactly.

To make it work I'd have to keep applying the test over and over until I found the first level at which the test failed. And if the results are a 1 in a million chance I could have to run through it a whole bunch of times and even then it would be approximate. That's sort of the Newton's Method for what I'm actually looking for; keep iterating until you get close enough.

I know there's a way to calculate the answer directly.
posted by Justinian at 11:58 AM on January 20, 2011
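
For what it's worth, a χ² statistic can be turned into a p-value directly, so stepping through significance levels should not be necessary. Below is a rough sketch for totals of 2d6, assuming the observed counts per total are available. The function names are hypothetical, the p-value uses the closed form that holds for an even number of degrees of freedom (10 here, from 11 possible totals), and the test looks at the whole shape of the distribution rather than only the mean.

```python
import math

# Number of ways (out of 36) to roll each total from 2 to 12 with two fair d6.
WAYS = {total: 6 - abs(total - 7) for total in range(2, 13)}

def chi_squared_statistic(counts: dict[int, int]) -> float:
    """Goodness-of-fit statistic for observed counts of each 2d6 total."""
    n = sum(counts.get(total, 0) for total in WAYS)
    stat = 0.0
    for total, ways in WAYS.items():
        expected = n * ways / 36
        observed = counts.get(total, 0)
        stat += (observed - expected) ** 2 / expected
    return stat

def chi_squared_p_value(stat: float, dof: int = 10) -> float:
    """P(chi-squared with `dof` degrees of freedom >= stat), even `dof` only."""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form only covers even, positive dof")
    half = stat / 2
    term, total = 1.0, 1.0          # k = 0 term of the series
    for k in range(1, dof // 2):
        term *= half / k            # (stat/2)^k / k!
        total += term
    return math.exp(-half) * total
```

Calling `chi_squared_p_value(chi_squared_statistic(counts))` with a dictionary mapping each total to the number of times it came up gives the probability of a fit at least that poor from fair dice.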

I'm saying that if you just care about the sum, rolling 1 die 200 times is the same as rolling 2 dice 100 times.
posted by madcaptenor at 12:25 PM on January 20, 2011
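
One way to check this without trusting intuition is to build the exact distribution of a 2d6 total and compare its mean and variance with those of a single die. The sketch below uses exact fractions; the helper `mean_and_variance` is a name invented for this example.

```python
from fractions import Fraction
from itertools import product

def mean_and_variance(pmf):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(value * p for value, p in pmf.items())
    var = sum(value * value * p for value, p in pmf.items()) - mean ** 2
    return mean, var

# One fair die: each face has probability 1/6.
d6 = {face: Fraction(1, 6) for face in range(1, 7)}

# Total of two fair dice: enumerate all 36 equally likely pairs.
two_d6 = {}
for a, b in product(range(1, 7), repeat=2):
    two_d6[a + b] = two_d6.get(a + b, Fraction(0)) + Fraction(1, 36)

d6_mean, d6_var = mean_and_variance(d6)
pair_mean, pair_var = mean_and_variance(two_d6)

print(d6_mean, d6_var)      # 7/2 35/12
print(pair_mean, pair_var)  # 7 35/6
```

The variance of a 2d6 total is exactly twice the variance of one die, so 100 rolls of 2d6 and 200 rolls of 1d6 give sums with the same mean and variance.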

You may also be thinking of Chebyshev's inequality.
posted by mhoye at 12:40 PM on January 20, 2011

What you want to do is go from the distribution of one trial (known) to the distribution of the sum. For simplicity's sake, I'll reformulate your example: given n=200 fair die rolls X1 ... Xn, what's the probability that the mean > 3.75?

Each roll has a discrete uniform distribution with mean 7/2 and variance 35/12. If we add together 200 independent trials, the sum has mean 700 and variance 1750/3. Then X̄ = ΣX/n has mean 7/2 and variance 7/480. (The mean scales proportionally to 1/n, while the variance is proportional to 1/n².)

At this point, the central limit theorem lets you wave your hands and say that, with a sufficiently large number of trials, the sum will be roughly normally distributed. Quantifying this is tricky, but 100 trials should be plenty. So we plug these parameters into the normal cumulative distribution function, and get P(X̄ < 3.75) = 0.980783, or about a 1.9% chance of being above the threshold. If you want to also include the probability that the mean is less than 3.25, double that to 3.8%.

Unless I've made a mistake somewhere, which is certainly possible.

Check out Introduction to Probability for a good reference on this sort of stuff.
posted by teraflop at 12:49 PM on January 20, 2011 [1 favorite]
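
As a cross-check on the normal approximation, the exact distribution of the sum can be built by repeated convolution, since 200 dice is small enough for a computer to count every total. This is a different technique from the one described above, offered only as a sketch; `sum_distribution` and `exact_upper_tail` are hypothetical names, and the tail here counts sums at or above the threshold.

```python
from fractions import Fraction

def sum_distribution(n_dice: int) -> list[int]:
    """counts[s] = number of ways n_dice fair d6 can total exactly s."""
    counts = [1]  # zero dice: exactly one way to total 0
    for _ in range(n_dice):
        new = [0] * (len(counts) + 6)
        for total, ways in enumerate(counts):
            if ways:
                for face in range(1, 7):
                    new[total + face] += ways
        counts = new
    return counts

def exact_upper_tail(n_dice: int, threshold: int) -> Fraction:
    """Exact P(sum of n_dice fair d6 >= threshold)."""
    counts = sum_distribution(n_dice)
    return Fraction(sum(counts[threshold:]), 6 ** n_dice)

# 200 dice with a mean of at least 3.75 means a sum of at least 750.
p_exact = exact_upper_tail(200, 750)
print(float(p_exact))
```

The exact figure should land close to the roughly 2% one-sided estimate, with a small difference because the normal curve is continuous and the dice totals are not.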

Yes, you're right madcaptenor. Something is still nagging at me conceptually about breaking down the results into a series of single die rolls when I don't actually know what the single rolls were, only the sum of each pair of dice but I can't identify any reason for that feeling. So I'm running with it.
posted by Justinian at 12:51 PM on January 20, 2011

That's exactly it, going from the distribution of a known trial to the distribution of the sum.
posted by Justinian at 12:52 PM on January 20, 2011

I think everyone above has answered the question correctly as far as the math goes, but it might help you to look at the math in the following way:
Suppose 1d6_1 and 1d6_2 are the values of the 2 dice you roll in a single trial to get the value of 2d6 (the subscripts indicate the two dice rolled), i.e.
2d6 = 1d6_1 + 1d6_2
Probability theory says that since 1d6_1 and 1d6_2 are independent,
Mean(2d6)=Mean(1d6_1) + Mean(1d6_2) = 7/2+7/2 = 7
Variance(2d6)=Variance(1d6_1) + Variance(1d6_2) = 35/12 + 35/12 = 35/6

Now what you want is the distribution of X=(2d6_1+2d6_2+...+2d6_N)/N, the average of N independent rolls of 2d6. Independence gives mean(X)=mean(2d6)=7 and Var(X) = Var(2d6)/N=35/(6*N).
The central limit theorem then says that X is approximately Normal for large N, with mean and variance as calculated above. If you now see an average of Y, you can compute how many std deviations it is from the mean, using mean(X) and Var(X). Then you can use the cumulative probability distribution table for the normal distribution to find the probability of that happening.
posted by learning_machines at 10:02 PM on January 20, 2011
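
For the general case of N rolls of 2d6 with an observed mean of Y, the recipe above might be wrapped up as follows. This is a minimal sketch under the normal approximation, so it is only trustworthy for reasonably large N; the function name `chance_of_mean_this_extreme` is made up for illustration.

```python
import math

MEAN_2D6 = 7.0
VAR_2D6 = 35 / 6   # variance of a single roll of 2d6

def chance_of_mean_this_extreme(n_rolls: int, observed_mean: float) -> float:
    """Approximate probability that the mean of n_rolls rolls of 2d6 lands
    at least as far from 7 (in either direction) as observed_mean does."""
    sd_of_mean = math.sqrt(VAR_2D6 / n_rolls)
    z = abs(observed_mean - MEAN_2D6) / sd_of_mean
    # erfc(z / sqrt(2)) equals 2 * P(Z >= z) for a standard normal Z.
    return math.erfc(z / math.sqrt(2))

p = chance_of_mean_this_extreme(100, 7.5)
print(f"p = {p:.4f}, or about 1 time in {1 / p:.0f}")
```

For 100 rolls averaging 7.5 this comes out to a little under 4%, in line with the two-sided figure worked out earlier in the thread.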
