# Need help with probability and statistics

June 5, 2005 9:04 AM Subscribe

In my exploration of poker odds and strategies, I've developed more programs for examining situations. I need help evaluating some of the probability and statistics involved.

When I first started making programs to evaluate poker, I took a brute force approach, generating all possible combinations and tallying the results. With a few exceptions, this is a far too time consuming method of calculation, since there are often around 10^12 or more combinations to examine, and right now I'm peaking at about 2000-4000 simulations per second. I've increased this a large deal with distributed computing but that still generally only brings me to about 8000/s with my available resources.

Anyway, there are two kinds of situations I'm generally interested in. One is determining the odds of hitting a given hand, and the other is the odds of winning against the players still in the table. Both of these take into account the "known cards". In some games, like 7 card stud, this knowledge can be considerable (it's not uncommon to know half the deck or more).

So, to do this, over and over again I deal out the remaining cards and see how it comes out. My question revolves around, how many times do I need to do this to get accurate numbers?

On the one hand, there are soooo many combinations. millions of billions usually. On the other hand, there is a relatively small number of outcomes. There are really only 10 possible hands, for example. The ranking among each hand only matters when 2 or more players get the same hand. So, among these millions of billions of situations I'm guessing there are less than a few hundred thousand actual discernible outcomes.

Also, I'm relatively uninterested in exact numbers. To the nearest 1% would probably be fine. If you're playing with pot odds for example, you want to know if your probability of winning is greater than the percent of the pot you'd have to put in (that is, is the expected payoff better than the cost). In general if my winning odds are not considerably better than the pot odds, I'm not going to use pot odds calculations to determine whether or not to bet, I'll use some other estimator (opponents play history, my holding size, his holding size, etc, how he's bet so far in the hand, etc)

OK. So that turned out to be kind of a rambling question. But I guess to boil it down: how many simulations should I expect to have to do? Given a certain situation, and a given number of simulations, how can I express the certainty of the percentages, as in, you have a 28.4% chance of winning, plus or minus 2%?

When I first started making programs to evaluate poker, I took a brute force approach, generating all possible combinations and tallying the results. With a few exceptions, this is a far too time consuming method of calculation, since there are often around 10^12 or more combinations to examine, and right now I'm peaking at about 2000-4000 simulations per second. I've increased this a large deal with distributed computing but that still generally only brings me to about 8000/s with my available resources.

Anyway, there are two kinds of situations I'm generally interested in. One is determining the odds of hitting a given hand, and the other is the odds of winning against the players still in the table. Both of these take into account the "known cards". In some games, like 7 card stud, this knowledge can be considerable (it's not uncommon to know half the deck or more).

So, to do this, over and over again I deal out the remaining cards and see how it comes out. My question revolves around, how many times do I need to do this to get accurate numbers?

On the one hand, there are soooo many combinations. millions of billions usually. On the other hand, there is a relatively small number of outcomes. There are really only 10 possible hands, for example. The ranking among each hand only matters when 2 or more players get the same hand. So, among these millions of billions of situations I'm guessing there are less than a few hundred thousand actual discernible outcomes.

Also, I'm relatively uninterested in exact numbers. To the nearest 1% would probably be fine. If you're playing with pot odds for example, you want to know if your probability of winning is greater than the percent of the pot you'd have to put in (that is, is the expected payoff better than the cost). In general if my winning odds are not considerably better than the pot odds, I'm not going to use pot odds calculations to determine whether or not to bet, I'll use some other estimator (opponents play history, my holding size, his holding size, etc, how he's bet so far in the hand, etc)

OK. So that turned out to be kind of a rambling question. But I guess to boil it down: how many simulations should I expect to have to do? Given a certain situation, and a given number of simulations, how can I express the certainty of the percentages, as in, you have a 28.4% chance of winning, plus or minus 2%?

Response by poster: Looks interesting, I'll have to do some reading and try that out.

posted by RustyBrooks at 11:24 AM on June 5, 2005

posted by RustyBrooks at 11:24 AM on June 5, 2005

To take your last question first, your standard error is inversely proportional to the square root of the number of trials. So more trials means less error, which is obvious, but to cut the error in half, you need four times as many trials.

Lets say you run 1000 trials and win on 90 of them. Your standard error is 90/sqrt(1000) = 2.8. To put things in percentages, this 9% +/- 0.28%. It has been awhile but I'm pretty sure this is the direction to go in.

I'm not quite sure what you mean here. If you want to find out the odds of a given hand you should be looking at your probability and combinatorics. Simulation is probably (ha!) not the way to go.

For the second situation I assume you start with given set of cards in your hands and then deal out the rest of the hand assuming everyone stays in. Determing the odds of winning is just a matter of playing out the game. If your question is how many hands do I need to play before I get a feel for what my odds are with a given set of cards. The answer is not many and you don't need to go through every possibility.

posted by euphorb at 12:20 PM on June 5, 2005

Lets say you run 1000 trials and win on 90 of them. Your standard error is 90/sqrt(1000) = 2.8. To put things in percentages, this 9% +/- 0.28%. It has been awhile but I'm pretty sure this is the direction to go in.

Anyway, there are two kinds of situation I'm generally interested in. One is determining the odds of hitting a given hand, and the other is the odds of winning against the players still in the table.

Anyway, there are two kinds of situation I'm generally interested in. One is determining the odds of hitting a given hand, and the other is the odds of winning against the players still in the table.

I'm not quite sure what you mean here. If you want to find out the odds of a given hand you should be looking at your probability and combinatorics. Simulation is probably (ha!) not the way to go.

For the second situation I assume you start with given set of cards in your hands and then deal out the rest of the hand assuming everyone stays in. Determing the odds of winning is just a matter of playing out the game. If your question is how many hands do I need to play before I get a feel for what my odds are with a given set of cards. The answer is not many and you don't need to go through every possibility.

posted by euphorb at 12:20 PM on June 5, 2005

Response by poster: To clarify, I want to know the odds of hitting a given hand, given my current situation and all the other known cards. So say I'm playing 7 card stud. I've been dealt 4 of the 7 cards I'll eventually get. This means I know all 4 of my cards, and most likely 2 cards from each other player (the other players will have 2 down). If there are 8 players and everyone called the first round of betting, then I'll know 20 out of 52 cards. Treating this combinatorially is somewhat difficult. I have 3 cards to go, and I might need 1, 2 or 3 to make a hand. In addition, some of the cards I might need to make a straight or a flush will not be available.

When playing poker, to estimate the odds, typically you count the number of "outs" you have and divide by (52-number of cards you know about). So, in the above case, if I had 4 of a straight and also 4 of a flush, I might have 5 or 6 cards that could complete my straight, and perhaps 9 cards that could complete my flush (assuming none of those cards is face up in someone elses' hand). With 32 cards to go I'd have a 15/32 chance of making a straight or a flush. This is a really good way of doing it on the fly, and people can get pretty good at it. But... this process is sort of difficult to model.

Pretty much the only poker game I know of so far where you can combinatorially solve the probabilities for all possible hands is texas holdem. That's because the only place where you give a crap about those probabilities is after you have your 2 hole cards and the flop has been dealt. This means there are only 2 cards to go, and 48 cards to choose from, and 48 choose 2 is not too high to do the right calculations. But also it's extremely easy to do them for yourself, so... I really want a program like this for more computationally expensive games like 7 card stud (esp. hi/lo) and omaha (well, omaha isn't too bad, concerning generating probabilities of hitting hands).

Anyway, the point is, it's not like I'm looking at a way to generate *static* probabilities for given hands, but rather probabilities for this *given* hand.

posted by RustyBrooks at 5:43 PM on June 5, 2005

When playing poker, to estimate the odds, typically you count the number of "outs" you have and divide by (52-number of cards you know about). So, in the above case, if I had 4 of a straight and also 4 of a flush, I might have 5 or 6 cards that could complete my straight, and perhaps 9 cards that could complete my flush (assuming none of those cards is face up in someone elses' hand). With 32 cards to go I'd have a 15/32 chance of making a straight or a flush. This is a really good way of doing it on the fly, and people can get pretty good at it. But... this process is sort of difficult to model.

Pretty much the only poker game I know of so far where you can combinatorially solve the probabilities for all possible hands is texas holdem. That's because the only place where you give a crap about those probabilities is after you have your 2 hole cards and the flop has been dealt. This means there are only 2 cards to go, and 48 cards to choose from, and 48 choose 2 is not too high to do the right calculations. But also it's extremely easy to do them for yourself, so... I really want a program like this for more computationally expensive games like 7 card stud (esp. hi/lo) and omaha (well, omaha isn't too bad, concerning generating probabilities of hitting hands).

Anyway, the point is, it's not like I'm looking at a way to generate *static* probabilities for given hands, but rather probabilities for this *given* hand.

posted by RustyBrooks at 5:43 PM on June 5, 2005

Response by poster: I ran some numbers and I actually probably could go back to brute-forcing the hand-hit statistics. I've improved the program a lot since back when I was doing that. In most cases for 7 card stud I would have 10,000 possibilities or fewer to look through. Still -- if I can get a reasonably good answer by doing 2000 simulations that could be much much faster than going through 10,000 combinatoric hands.

posted by RustyBrooks at 5:47 PM on June 5, 2005

posted by RustyBrooks at 5:47 PM on June 5, 2005

there's a whole pile of different questions here and i didn't read very carefully when i first answered.

like euphorb, i suspect an analytic approach is best. that doesn't mean that you have to do it in your head - but if you could write a program to do the calculations it woul dbe much more efficient than running simulations.

but getting back to your idea that you get to some point, know the remaining cards, and then deal out hands at random. i would guess (and it's only a guess) that the sqrt(n) estimate would be more-or-less right for the number of times someone wins. so let's say that you have 3 players (A, B C) and after 1000 simulations, they have won as follows:

A 4, B 9, C 16

my intuition, and i think euphorb would agree with me, is that you can take the square root of those numbers to give the errors. years ago i might even have been able to explain why. anyway, if you do that, you get:

A 4+/-2, B 9+/-3, C 16+/-4

so C-B (asusming independent errors, which seems a bit unlikely) has an error of sqrt(3*3+4*4) (usual rule about combining variances), or 4.

so C-B = 16-9 +/- 5 = 7+/-5

so the probability that C beats B is the integral of a gaussian from -7/5 to +infinity, which i would guess is around 85%.

that's all very sketchy, but should give you rough probabilities. did it make any sense at all?

(and you can use that, plus some confidence limit, to see howgood your results are, and so decide when to stop doing more random hands).

posted by andrew cooke at 6:31 PM on June 5, 2005

like euphorb, i suspect an analytic approach is best. that doesn't mean that you have to do it in your head - but if you could write a program to do the calculations it woul dbe much more efficient than running simulations.

but getting back to your idea that you get to some point, know the remaining cards, and then deal out hands at random. i would guess (and it's only a guess) that the sqrt(n) estimate would be more-or-less right for the number of times someone wins. so let's say that you have 3 players (A, B C) and after 1000 simulations, they have won as follows:

A 4, B 9, C 16

my intuition, and i think euphorb would agree with me, is that you can take the square root of those numbers to give the errors. years ago i might even have been able to explain why. anyway, if you do that, you get:

A 4+/-2, B 9+/-3, C 16+/-4

so C-B (asusming independent errors, which seems a bit unlikely) has an error of sqrt(3*3+4*4) (usual rule about combining variances), or 4.

so C-B = 16-9 +/- 5 = 7+/-5

so the probability that C beats B is the integral of a gaussian from -7/5 to +infinity, which i would guess is around 85%.

that's all very sketchy, but should give you rough probabilities. did it make any sense at all?

(and you can use that, plus some confidence limit, to see howgood your results are, and so decide when to stop doing more random hands).

posted by andrew cooke at 6:31 PM on June 5, 2005

"variances), or 4" should be "5", as used later.

posted by andrew cooke at 6:32 PM on June 5, 2005

posted by andrew cooke at 6:32 PM on June 5, 2005

oh, that's stupid because 4+9+16 doesn't add up to 1000. and someone has to win. so they are clearly not independent. but even so, i think this gives you a rough guide (but change "1000" to "29" in the argument above!).

posted by andrew cooke at 6:34 PM on June 5, 2005

posted by andrew cooke at 6:34 PM on June 5, 2005

and the reason why, which i now vaguely remember, is connected to the assumption that these (A, B, or C winning) are independent and "random", which gives poisson stats, which is related to gaussian for large numbers in a way that makes the sqrt(n) work.

so this isn't really correct, since those assumptions don't hold. but even so, it's probably a good rule of thumb.

posted by andrew cooke at 6:37 PM on June 5, 2005

so this isn't really correct, since those assumptions don't hold. but even so, it's probably a good rule of thumb.

posted by andrew cooke at 6:37 PM on June 5, 2005

Response by poster: When I talk about simulations with respect to hand probabilities, all I'm really saying is that out of the n choose k cards, I'm taking a random subset of these and using those instead. I'm kind of drawing a blank as to an elegant analytical approach to determining the odds of drawing given hands. If you only need one card to complete a hand, that's actually fairly straightforward. You'd only have to examine about 50 cards or so. Each card would "lead" to a possible hand (like, 5h gives me a straight flush, 5c, 5d, 5s gives me a straight, 7c, 7d, 7s gives me a pair of 7s', etc), and each card has a given probability of coming up. Something similar occurs if you need 2 cards, although you could have up to 2500 pairs to look through. If you need 3 cards to complete a hand you really ought not be betting. It's worth looking into.

I think I see what you're saying with respect to the square root thing. It also confirms what I suspect which is that the larger the probability of an event is, the less simulation you need to do to be sure you're relatively close. And that could be very helpful because I'm pretty much uninterested in anything that happens less than about 10% of the time, except in very extreme circumstances. (a bet needs to pay of 10X what I put in to make it worth taking a 9 to 1 longshot). However... I am interested in relatively unlikely things that *add up to* more than 10%. The probability of getting 2 pair is nice to know but even better is the probability of getting 2 pair *or better* which can include several instances of pretty unlikely events adding up to something more likely.

I do apologize for the loosey-gooseyness of the problem. Part of the issue is that I just don't know how to properly formulate what it is I'm after. If I did I'd probably have a better idea of how to go about it.

So far my programs are not helping. I feel that they're accurate (in fact I spent a good deal of time today watching old WSOP events and comparing my stats to theirs) but the programs require too much babysitting, they don't do "enough" for me. I spend too much time trying to deduce from the data in my program and not enough time keeping track of the betting and folding. Still it's saved my bacon a few times.

posted by RustyBrooks at 8:01 PM on June 5, 2005

I think I see what you're saying with respect to the square root thing. It also confirms what I suspect which is that the larger the probability of an event is, the less simulation you need to do to be sure you're relatively close. And that could be very helpful because I'm pretty much uninterested in anything that happens less than about 10% of the time, except in very extreme circumstances. (a bet needs to pay of 10X what I put in to make it worth taking a 9 to 1 longshot). However... I am interested in relatively unlikely things that *add up to* more than 10%. The probability of getting 2 pair is nice to know but even better is the probability of getting 2 pair *or better* which can include several instances of pretty unlikely events adding up to something more likely.

I do apologize for the loosey-gooseyness of the problem. Part of the issue is that I just don't know how to properly formulate what it is I'm after. If I did I'd probably have a better idea of how to go about it.

So far my programs are not helping. I feel that they're accurate (in fact I spent a good deal of time today watching old WSOP events and comparing my stats to theirs) but the programs require too much babysitting, they don't do "enough" for me. I spend too much time trying to deduce from the data in my program and not enough time keeping track of the betting and folding. Still it's saved my bacon a few times.

posted by RustyBrooks at 8:01 PM on June 5, 2005

This thread is closed to new comments.

posted by andrew cooke at 9:10 AM on June 5, 2005