How do I calculate the probability of a specific sum of repeated die rolls?
September 28, 2012 10:16 PM   Subscribe

I'm looking to learn how to calculate probabilities for a multi-round dice game. I've researched this question some, and it looks like I might need to know how to use the multinomial distribution, but I can't find any good introductions. Please point me to the most layman-accessible educational material on this subject, and help me to help myself.

The game I'm playing can be abbreviated like this:

I roll two six sided dice and add them.
I then subtract a (fixed) penalty value, and score as many points as remain.
I cannot score less than 0. If I would score less than 0 points, I score 0 instead.
I repeat this process several times, without altering the penalty between rolls.

How do I predict the probability of any 1 particular cumulative score?
I am not interested in simulating this event and taking a random sample; I want to calculate all possible outcomes and their precise probabilities. Examples follow:


Example #1:
My penalty value is 0, and I repeat the die roll 36 times.
Each roll will produce a number between 2 and 12 inclusive, and my final score will be between 72 and 432. What is the chance that I will score 7 points exactly 6 times.

Example #2:
My penalty value is 11, and I repeat the procedure 40 times.
Each roll will produce a value between 0 and 1 inclusive, and my final total score will be between 0 and 40, and likely closer to 0 than 40.
I wish to predict the chance that I will score a cumulative total of exactly 2 points across the sum of all 40 trials.

Example #3:
My penalty value is -5, and I repeat the procedure 5 times.
Each roll will produce a value between 7 and 17 inclusive (negative penalty). I wish to plot the probability of a arriving at each possible total: 35 through 85.

Again, I think maybe this is a case of the muiltinomial distribution, but everything I can find about it (wikipedia) is confusing and not written as educational but as reference material.

I'm familiar with MS Excel and have a basic Liberal Arts understanding of calculus and probability/statistics, though I'm quite rusty. I'm willing to re-learn what I have forgotten, and also to learn new skills - within reason. If this procedure requires me to learn lots of new math, I would appreciate links or book titles to pursue as well as a general outline of what the fields of study are called, and what I'll need to know to pursue them intelligently (i.e. help me ask questions without sounding ignorant/clueless).

I gather that the program R might be helpful in this, so please feel free to recommend a good introduction to both R and Statistics.

Thanks for any help you're able to give!
posted by Richard Daly to Education (6 answers total) 4 users marked this as a favorite
 
Example #1:
My penalty value is 0, and I repeat the die roll 36 times.
Each roll will produce a number between 2 and 12 inclusive, and my final score will be between 72 and 432. What is the chance that I will score 7 points exactly 6 times.


Ugh, dice.

The probability of getting a sum of 7 on two dice is the probability of getting any of (1,6), (2,5), (3,4), (4,3), (5,2), or (6,1). So, 6 out of 36 combinations of two dice lead to a sum of 7, for a probability of 1/6.

You want to score 7 (each time at 1/6 probability) exactly 6 times and not 7 (each time at 5/6 probability) the remaining 30 times.

How many ways of arranging 6 of one thing and 30 of another thing are there? It's 36!/(6! * 30!) = 34*33*31*8*7 ways, which is a rather big number.

So the answer is the probability of getting a sum score of 7 six times, multiplied by the probability of getting another score 30 times, multiplied by the number of ways this combination of events can occur:

(1/6)^6 * (5/6)^30 * (34*33*31*8*7) ≈ 0.176

A program like R will come in handy for calculating these large numbers.
posted by Nomyte at 11:15 PM on September 28, 2012


Example #2:
My penalty value is 11, and I repeat the procedure 40 times.
Each roll will produce a value between 0 and 1 inclusive, and my final total score will be between 0 and 40, and likely closer to 0 than 40.
I wish to predict the chance that I will score a cumulative total of exactly 2 points across the sum of all 40 trials.


Why am I doing this?

Anyway, the probability of getting a score of 1 in this case is the same as getting (6,6) on two dice, or 1/36. The probability of getting a score of 0 is 35/36.

You want two instances of the 1/36-chance event and thirty-eight instances of the 35/36-chance event.

Again, how many ways are there of arranging 2 of one thing and 38 of another? It's 40!/(2! * 38!).

So your answer in this case is (1/36)^2 * (35/36)^38 * 40!/(2! * 38!) ≈ 0.206.
posted by Nomyte at 11:29 PM on September 28, 2012


Example #3 is left as an exercise for the reader.
posted by Nomyte at 11:29 PM on September 28, 2012


Nomyte has answered already, and it's a fine answer that I don't disagree with. But I worked all of the following out already, and am posting in the spirit of "sometimes it takes two different explanations before understanding sets in". Apologies if I end up over explaining some of the steps. In retrospect, I hope this isn't homework.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Okay. All of your examples are essentially the same calculation, with different values and number of repetitions.

Your problem is indeed a multinomial distribution problem. There's a fairly accessible description/example of this formula on this page.

The multinomial formula is: P = [N!/(n1!*n2!...nk!)] * [(p1^n1)*(p2^n2)...*(pk^nk)], as you can see from that web page. You just have to figure out what n's and p's to plug in for each of your questions.

Your standard trial is going to be a roll of two dice. You know that the result will be between 2 and 12, with the following probability:
	n1 = 2 === 1/36 for p1
	n2 = 3 === 2/36 for p2
	n3 = 4 === 3/36 for p3
	n4 = 5 === 4/36 for p4
	n5 = 6 === 5/36 for p5
	n6 = 7 === 6/36 for p6
	n7 = 8 === 5/36 for p7
	n8 = 9 === 4/36 for p8
	n9 = 10=== 3/36 for p9
	n10 =11=== 2/36 for p10
	n11 =12=== 1/36 for p11
	  / Total 11 different possible outcomes
Above is the probability distribution function (aka pdf) of a double-roll before considering any penalty.
The penalty simply changes the score from [2,12] to some other range. Start your answers by calculating the probability distribution for the penalty scenario of your question. If your penalty is going to combine several outcomes, figure that out at this point.

For example, a penalty of -3:
		    = 2-3 = 0 === 1/36
		    = 3-3 = 0 === 2/36  Total for outcome "0" = 1/36+2/36 = 3/36
	  So,          n0 = 0 === 3/36 for p0
		n1  = 4-3 = 1 === 3/36 for p1
		n2  = 5-3 = 2 === 4/36 for p2
		n3  = 6-3 = 3 === 5/36 for p3
		n4  = 7-3 = 4 === 6/36 for p4
		n5  = 8-3 = 5 === 5/36 for p5
		n6  = 9-3 = 6 === 4/36 for p6 
		n7  = 10-3= 7 === 3/36 for p7
		n8  = 11-3= 8 === 2/36 for p8
		n9  = 12-3= 9 === 1/36 for p9
		/ Total 10 different possible outcomes
Note that I've changed the numbering so that n4 and p4 are for the case where the score is "4", etc. It's easier to remember that way.

Now, let's pick the n's and p's to plug into the multinomial probability formula.
The value to use for N (just N by itself, not n1,n2,etc) in the formula is easy; that's the number of trials.

What about the n1, n2,...nk for the formula? In order to figure that out, you have to determine which outcome cases you're interested in. There is no easy way to automate it; you have to figure these out based on what question you are asking. Tip: the n1...nk you use should add up to the value you use for N, or else you're doing it wrong.

Example 1:
"What is the chance I will score 7 points six times" --- means you want outcome "7" to happen six times.
So, the cases you are interested in are case 7, and case "anything other than 7".
That first case means n7. We want n7 to happen six times, so we set n7=6 in the formula. Now we need p7.
From your probability distribution, you see that p7 (for outcome "7") has a probability of 6/36.
"Everything else" should happen the rest of the rolls, so n_other = 30.
For the probability of "anything else", p_other, you can either add up n1,n2,n3... leaving out n7, or just recognize that because they all sum to 1.00, the probability of "every other outcome" is 1-p7, or 30/36.

Plugging these into the formula, we get:
P = [N!/(n7!*n_other!)] * [(p7^n7)*(p_other^n_other)]
P = [36!/(6!*30!)] * [(6/36^6)*(30/36^30)]
which I trust will work out to ~0.176, as Nomyte said.
Example 2:
"The chance of a cumulative total of exactly 2 points across the sum of all 40 trials"

You've already figured out that on each roll (trial), you'll either get 0 or 1 points, so there is only outcome n0 and n1 to consider. You can figure out p0 and p1 fairly simply, too.
N is obviously 40.
Next, list all possible ways of getting exactly 2 points from a sum of 40 trials.
That's pretty easy, yes? It only happens if we get the single point outcome twice.
So, this gives the n0, n1 for the desired number of instances: n0=38, n1=2.
Putting this into the formula, we get:
P = [40!/(38!+2!)] * [ (p0^38)*(p1^2)]
Which again, simplifies to the same formula and should work out to the same answer as when using Nomyte's method above.

Example 3:
Five rolls, so N = 5
Each roll gives a value from 7 to 17. Go ahead and prepare the pdf table giving n7...n17 and p7...p17.
You want to know the probability of getting 35 through 85? That's actually 51 different questions, to be answered independently, one at a time.
I'll walk you through a couple of them.

  • For the probability of getting 35, figure out what combinations of results for the five rolls will sum to 35:
  • 7,7,7,7,7 is the only possibility.
    That means five results of 7, and 0 results of anything else.
    So you only have to use n7=5, and p7 in the multinomial distribution formula.
  • For the probability of getting 40, figure out what combination of outcomes sum to 40:
  • It turns out there are seven possible ways to get sums of 40:
    12,7,7,7,7 === one n12, four n7's
    11,8,7,7,7 === one n11, one n8, three n7's
    10,9,7,7,7 etc
    10,8,8,7,7
    9,9,8,7,7
    9,8,8,8,7
    8,8,8,8,8
    For each of these seven ways above, figure out the N's and corresponding P's, plug them into the multinomial formula to get the probability of getting 40 that way, and then sum those seven results to get the total probability of getting a sum of 40 from any of the possible ways.
  • etc, etc for the other sums. You can probably find a creative use of excel to help you enumerate all possible ways of forming the sums.


  • So, that is the method; no further probability theory necessary. The calculation is full of tedious mechanical repetition, making it a prime candidate for automating the steps in excel or a programming tool of your choice.
    posted by ceribus peribus at 3:11 AM on September 29, 2012 [3 favorites]


    Example 3 is just rolling 10 dice and adding 25, so you really just want the chances for various sums on 10 dice. There's formulas for that. Or you can kind of dirtily/wonderfully use Excel like this. It's actually useful to understand how the Excel method works, and how you might change it for 4-sided dice, or dice with sides 0, 0, 0, 1, 2, 3 for example.

    Your examples skirt around what you'd have to do if your penalty was some interesting value like 4 so you roll 2 dice and subtract 4 each round. That would make you have to consider each round as a variable that can have different values with a non-flat distribution. With penalty 4 specifically your possible results would be (0, 1, 2, 3, 4, 5, 6, 7, 8) and the probabilities would be (6/36, 4/36, 5/36, 6/36, 5/36, ... , 1/36).

    In that case, calculating it can be pretty effectively done by a program using the recursive formula based on what the Excel trick above is doing. What is your chance of getting X over N rounds? Think of the final round. You've done N-1 rounds so far and have one more roll. For a given total, express the chances of getting that as what you got on the previous N-1 rounds along with the chance of what you got this round. For example to get to 22 on 10 dice, you could roll (1, 2, 3, 4, 5, 6) on the last roll and over the N-1 previous rolls you had (21, 20, 19, 18, 17, 16). you had 18 on N-1 trials and then you rolled a 4. Or maybe it was 17 and 5. Etc.

    Or just run a few million simulations to get an approximation and call it close enough.

    If you want (cougharesupposedtocough) use multinomials, well you could brute force out all possibilities of the N rounds (on a computer). Then for each possibility you calculate the sum and the probability of rolling that sum using the multinomial distribution formula, and add it to whichever pile as you go along. This is an inefficient way to do things but it's straightforward.
    posted by fleacircus at 4:01 AM on September 29, 2012


    Fwiw, you say that you aren't interested in simulation, but
    1) You should use simulation to check your answer as you learn probability arguments. Given the comparative complexity, you are much more likely to make a mistake in implementation of the exact calculation.
    2) as an applied problem, the calculation that you have to do to use the above for penalty greater than 2 will be microscopically faster (if at all) and perhaps less accurate (numerical issues when dealing with small and large numbers) than just simulating to get accurate probability estimates for all but rare events.

    For example, in R (which is a slowish but easy to write way to do this)
    set.seed(1)
    number_rolls = seq(from=36, to=43)
    penalty = 10
    number_simulations = 100000
    simulation_output <-list()
    for( i in number_rolls) {
    temp<-pmax(ceiling(runif(number_simulations*i)*6) + ceiling(runif(number_simulations*i)*6) - penalty, 0)
    simulation_output[[i]]<-table( colSums( matrix(temp, nrow=i) ) ) / number_simulations
    }
    cbind(simulation_output[[36]])

    0 0.04304
    1 0.09660
    2 0.14861
    3 0.16905
    4 0.16361
    5 0.13632
    6 0.09689
    7 0.06506
    8 0.03934
    9 0.02134
    10 0.01106
    11 0.00507
    12 0.00240
    13 0.00090
    14 0.00044
    15 0.00018
    16 0.00005
    17 0.00004

    cbind(simulation_output[[38]])
    [,1]
    0 0.03559
    1 0.08540
    2 0.13710
    3 0.16313
    4 0.16485
    5 0.13944
    6 0.10672
    7 0.07103
    8 0.04465
    9 0.02583
    10 0.01357
    11 0.00691
    12 0.00333
    13 0.00148
    14 0.00060
    15 0.00024
    16 0.00011
    17 0.00001
    18 0.00001
    posted by a robot made out of meat at 10:19 AM on September 30, 2012


    « Older So because I have this I've got to be how old?   |   Should I maintain this burgeoning friendship? If... Newer »
    This thread is closed to new comments.