December 25, 2013 4:43 AM Subscribe

I have a question about probability math. I am essentially flipping a coin (except instead of a 50/50 chance, my odds are 50.5% heads, 49.5% tails).
I am concerned with the probability of me hitting heads (a 50.5% chance) several times in a row.

I understand the basic formula for determining the probability would be 0.505^x, where x is the number of coin flips. How do I translate this percentage chance into a number of flips I could make before I could expect to hit that result? Or the amount of time (at, say, one flip per second) that might pass before I reasonably hit a certain streak?

Say I'm wondering how long it will take me to hit a streak of 14 heads in a row, assuming one flip per second. 14 heads in a row has a 0.007% chance of occurring... so can I translate that into an amount of time (or number of flips) that might elapse before that happens?

I guess I'm confused by the idea that the probability essentially resets every time the streak is broken.

I'm pretty tired and this is pretty weird and I'm bad at math, so please feel free to call out the obvious thing I'm missing here.

Thanks!
posted by disillusioned to Science & Nature (42 answers total) 3 users marked this as a favorite

There are two things going on here: Bernoulli process (probability of heads with a biased coin) and a geometric distribution (number of trials until first success). I think the second one is the root of your question, since your target event is pretty simple.

posted by supercres at 5:14 AM on December 25, 2013

Wait, it's complicated because each 17 flips isn't an independent trial; it's not 100,000 people flipping a biased coin 17 times and expecting 7 to have them all heads. You stop once the streak is broken. This post is the same question (couched in stat terms), and seems plausible.

posted by supercres at 5:23 AM on December 25, 2013

No, each flip is the very definition of an independent event.

posted by dfriedman at 6:42 AM on December 25, 2013 [9 favorites]

On each flip, you have a 0.505^n chance of the next n flips giving the result "heads". Define this as a "success". You want the geometric distribution with probability of success 0.505^n.

posted by downing street memo at 7:01 AM on December 25, 2013
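
As a quick numeric sketch of the framing in the comment above (Python; variable names are mine): treating "the next 14 flips are all heads" as a single success of probability 0.505^14 gives the geometric-distribution numbers below. As later comments in the thread explain, overlapping windows are not independent trials, so this is only a rough first cut, not the exact waiting time.

```python
# Naive geometric framing: one "trial" = the next 14 flips all coming up heads.
p_heads = 0.505
n = 14

p_success = p_heads ** n         # probability a given window of 14 flips is all heads
expected_trials = 1 / p_success  # mean of a geometric distribution with that success rate

print(p_success)        # ~7.0e-5, the "0.007% chance" from the question
print(expected_trials)  # ~14,250 windows -- an underestimate of the wait, since windows overlap
```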

There is actually no guarantee that you will ever get 14 heads in a row, but the probability of getting the streak obviously approaches 100% the longer you flip. This is illustrated in the second graph here, which is a discussion of a similar problem; that graph illustrates the probability of 10 in a row when heads have 55% odds.

So the question is whether you are trying to find the average number of flips required to get the streak (which was 881 in that illustration but sometimes took over 5000 flips), or something else, like the number of flips that give a 95% certainty of achieving the streak.

This discussion, with your odds substituted, will probably answer your question either way.

posted by beagle at 7:32 AM on December 25, 2013

Depends on how you define "guarantee", yeah? The strong LLN implies almost sure convergence to 1, which is tantamount to a "guarantee" in the colloquial sense of the term, although I agree that it wouldn't break the laws of mathematics to have an infinite series of coin flips with no set of 14 heads in a row among them.

posted by downing street memo at 7:37 AM on December 25, 2013

"almost sure" equates to "no guarantee". IOW there is no calculable number of flips that will absolutely deliver any particular desired sequence.

posted by beagle at 7:54 AM on December 25, 2013

What you want is a streak calculator, actually. Using binomial probability will tell you the chance that starting now, the next N flips will be heads. But if what you're actually going to do is keep flipping forever until you hit a sequence of N heads, the streak calculator is what you want.

There IS a closed form for this, I think, but it's fairly complicated. Most calculators online use Markov chains, I think. Try this one out:

http://www.sbrforum.com/betting-tools/streak-calculator/

I guess it also depends on what you mean by "expect" to hit it. I'll interpret that for now as having a 50/50 chance of hitting it within X throws. According to that calculator you'd need to do about 26,000 flips to have a 50/50 chance of getting 14 in a row. It's pretty rare.

posted by RustyBrooks at 8:04 AM on December 25, 2013

200,000 flips would get you to about 99.5% chance of hitting it.

posted by RustyBrooks at 8:05 AM on December 25, 2013

Set up a Markov chain having fifteen states, one for each of the different numbers of heads you've gotten in a row, and the appropriate transition matrix. You then calculate the "expected hitting time" for each state, which tells you on average how long it takes you to reach your goal. You can do this calculation by dynamic programming.

posted by esprit de l'escalier at 8:40 AM on December 25, 2013 [3 favorites]
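
A sketch of the expected-hitting-time calculation described in the comment above (Python; function and variable names are mine): states 0..14 count the current run of heads; from state i a head (probability p) moves to state i+1 and a tail resets to state 0. Writing the expected hitting time from state i as E[i] = A[i] + B[i]·E[0] and recursing backward from E[14] = 0 is the dynamic-programming step; the final line solves E[0] = A[0] + B[0]·E[0].

```python
def expected_flips(p, n):
    # Backward recursion over E[i] = 1 + p*E[i+1] + (1-p)*E[0],
    # represented as E[i] = a + b*E[0] with E[n] = 0.
    a, b = 0.0, 0.0
    for _ in range(n):
        a, b = 1 + p * a, p * b + (1 - p)
    return a / (1 - b)  # solve E[0] = a + b*E[0]

print(expected_flips(0.5, 2))    # fair coin, run of 2: the textbook answer is 6
print(expected_flips(0.505, 14)) # the question's coin: close to the ~28,793 quoted later in the thread
```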

I understand the basic formula for determining the probability would be 0.505^x, where x is the number of coin flips. How do I translate this percentage chance into a number of flips I could make before I could expect to hit that result? Or the amount of time (at, say, one flip per second) that might pass before I reasonably hit a certain streak?

Say I'm wondering how long it will take me to hit a streak of 14 heads in a row, assuming one flip per second. 14 heads in a row has a 0.007% chance of occurring... so can I translate that into an amount of time (or number of flips) that might elapse before that happens?

I guess I'm confused by the idea that the probability essentially resets every time the streak is broken.

The probability doesn't "reset". It is a fixed value. Any given coin flip has a 50.5% chance of coming up heads. A streak of N heads is simply one particular permutation of N coin flips, and there just so happens to be a 0.505^N chance that this particular permutation will occur in a random sequence of N coin flips. Let's call your permutation S for convenience's sake.

S = H H H H H... (N times)

Your original question, how to come up with a figure for the amount of time we should "reasonably" expect it would take to generate S, is a bit fuzzy. It's possible given a series of M coin flips that S will never occur, even for very large M. But I get what you're saying. You want to know more or less how many coin flips you'll have to do before there is a, like, 99% chance you will have seen S by now.

We might be able to come up with a function P(x) which gives the odds that S will occur given x coin flips. Then we could try to calculate x for P(x) > 0.99.

posted by deathpanels at 8:43 AM on December 25, 2013 [1 favorite]
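
The P(x) described above can be computed exactly with a small state-tracking loop (a Python sketch; the function name and cutoffs are mine): at each flip, keep the probability distribution over the current run length, and accumulate the mass that has completed the 14-run.

```python
def flips_needed(p, run, target):
    """Smallest x such that P(a run of `run` heads has occurred within x flips) >= target."""
    dist = [0.0] * run   # dist[s] = probability the current run length is s and we're not done
    dist[0] = 1.0
    done = 0.0           # probability the run has already occurred
    x = 0
    while done < target:
        new = [0.0] * run
        for s, m in enumerate(dist):
            if m == 0.0:
                continue
            if s + 1 == run:
                done += m * p        # this head completes the run
            else:
                new[s + 1] += m * p  # the run grows by one head
            new[0] += m * (1 - p)    # a tail resets the run
        dist = new
        x += 1
    return x

print(flips_needed(0.505, 14, 0.50))  # the 50/50 point (other comments put this near 20,000)
print(flips_needed(0.505, 14, 0.99))  # deathpanels' "99% chance you will have seen S" point
```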

I meant more that the OP's request to know "reasonably" when something would happen is not precise enough.

posted by deathpanels at 8:53 AM on December 25, 2013 [1 favorite]

If the probability of any given flip coming up heads is 0.505, then in any sequence of more than x coin flips the probability that any given flip is the last of a length-x streak is 0.505^{x}. Which means that the probability that any given flip is *not* the last of a length-x streak is (1 - 0.505^{x}).

So now we need a formula for the probability of being able to perform a sequence of y coin flips (where y ≥ x) that *does not end* with a length-x streak. The xth flip has probability (1 - 0.505^{x}) of not ending such a streak; so does the x+1st flip, the x+2nd flip, the x+3rd flip and so on. The probability p that *all* the flips from the xth to the yth don't end a length-x streak is the product of the probabilities for *each* of those flips:

p = (1 - 0.505^{x})^{y-x+1}

To get an answer to your original question, set p to some arbitrary but small value and solve for y:

(1 - 0.505^{x})^{y-x+1} = p

y - x + 1 = log_{(1 - 0.505^{x})} p

y - x + 1 = log p / log (1 - 0.505^{x})

y = log p / log (1 - 0.505^{x}) + x - 1

For x = 14 and p = 0.5% = 0.005, y = 75530.

posted by flabdablet at 9:14 AM on December 25, 2013 [1 favorite]

I don't think your approach works flabdablet - are you claiming that after 75530 flips, you've definitely had 14 in a row? That's obviously not true.

posted by RustyBrooks at 9:27 AM on December 25, 2013

No, I'm claiming that flipping the coin 75530 times yields an 0.5% chance of *not* seeing a streak of 14 heads. Which means that there's a 99.5% chance that you *will* see a streak of 14 heads without needing to do that many flips.

You can set p to make your criterion for "reasonably certain" as tight as you like.

posted by flabdablet at 9:32 AM on December 25, 2013 [3 favorites]

Do you really need Markov chains for something like this? Seems like a relatively simple Bernoulli trial problem regardless of the approach.

posted by downing street memo at 9:34 AM on December 25, 2013

If "expect to hit that result" means no more than "am at least as likely to hit that result as not to hit it", then set p = 0.5 which makes the required number of flips come out to 9880.

posted by flabdablet at 9:39 AM on December 25, 2013 [1 favorite]

This is an example of something called a first passage problem, where you have some sort of random process and you are interested in how long it takes to achieve some state. The most obvious such kind is a gambler who has, say, $1000 dollars and bets a dollar on every coin flip. You might be interested in the average time until the gambler has no money. These type of problems are often easy to formulate but difficult to get a closed solution for.

In your case, we'd first want to think about what the final state (getting 14 heads in a row) would look like. Suppose there are N coin flips. For the first (N-15) flips, you'd have any mixture of heads and tails with no group of 14 heads in a row. Then you have one tails (T) followed by 14 heads (H). The one case where that isn't true is if you got 14 H in a row right at the start, in which case there's no initial T, but every other success will look like that. (I suspect the thing you felt you were missing is that the sequence must start with a T, not just be 14 H.)

How do we write this in terms of probabilities? Well, let's define P(n) as the probability that you have finally achieved your goal at the nth coin flip. If n is big, then you have:

P(n) = (the probability of not having achieved your goal by flip (n-15)) * (the probability of getting a T) * (the probability of getting 14 H in a row)

in your case, that is:

P(n) = (1-P(n-15)) * (0.495) * 0.505^14

So now you have a recursive function that lets you numerically calculate the probability at any number of flips in terms of probabilities of lower numbers of flips. Of course, if we do it this way, we're slightly messing up for all those cases where (n-15) is zero or negative and thus the probability of achieving the goal already was zero. For those situations we have stopping conditions that all look like:

P(n) = (any combination of n-15 flips) * (one T) * (14 H in a row)

Well, the probability of any combination of n-15 flips is 1, since we aren't limiting ourselves at all, so for n < 30 we have:

P(n) = (0.495) * 0.505^14

And obviously P(n) = 0 for n < 14.

So now we have a function, P(n), that gives you the probability of stopping at flip n. You can then get any calculation you'd want off this function, although it's not obvious to me staring at a computer if this reduces to something simpler or if you need to do it numerically. But if, for instance, you wanted to find the mean number of flips, that's just the sum from n = 1 to infinity of n*P(n).

posted by Schismatic at 10:11 AM on December 25, 2013 [2 favorites]

I wrote a little coin flip simulator in C, and it's giving me numbers of flips for p = 0.5 that are about twice what my formula predicts. But if I tell it to look for a streak of 13 instead of 14, it does come out to about what my formula says should be the result for 14.

*Then you have one tails (T) followed by 14 heads (H)*

Ahhh! so this is wrong:

*the probability that any given flip is the last of a length-x streak is 0.505^{x}*

because a length-x streak is actually a length x+1 pattern consisting of a probability-0.495 event followed by x probability-0.505 events, for an overall probability of 0.495 * 0.505^{x}. The only time this isn't true is right at the start of the exercise.

So let's do the derivation again.

The probability that any given flip will be the last of a length-x streak is 0 for flips 1..x-1, or 0.505^{x} for flip x, or 0.495 * 0.505^{x} for flips x+1..y where y is the total number of flips performed and y > x.

That means that the probability of any given flip *not* being the last of a length-x streak is 1 for flips 1..x-1, or (1 - 0.505^{x}) for flip x, or (1 - 0.495 * 0.505^{x}) for flips x+1..y.

The probability that *none* of flips x+1..y will be the last of a length-x streak is the probability that *all* of them will fail to be so, which is (1 - 0.495 * 0.505^{x})^{y-x}.

Call this p, and solve for y given p:

a^{y-x} = p where a = (1 - 0.495 * 0.505^{x})

y - x = log_{a} p

y - x = log p / log a

y = x + log p / log (1 - 0.495 * 0.505^{x})

If I now put x = 14 and p = 0.5, y comes out to 19973; for p = 0.005, y is 152576. These are consistent with the results I'm seeing from the simulator:

$ cat flipper.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define HEADS_PROB 0.505
#define LENGTH 14

int main() {
    srand(time(0));
    int tails_min = (RAND_MAX + 1.0) * HEADS_PROB;
    int trial;
    for (trial = 1; trial <= 1000000; ++trial) {
        int flips = 0;
        int streak = 0;
        while (streak < LENGTH) {
            ++flips;
            if (random() < tails_min)
                ++streak;
            else
                streak = 0;
        }
        printf("%d\n", flips);
    }
    return 0;
}
$ gcc -Wall -O3 -o flipper flipper.c
$ ./flipper | sort -n >trials.txt
$ tail -n +500000 trials.txt | head -n 1
19981
$ tail -n +995000 trials.txt | head -n 1
152563

posted by flabdablet at 11:53 AM on December 25, 2013 [2 favorites]
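
The corrected closed form above, y = x + log p / log (1 - 0.495 * 0.505^{x}), is easy to check numerically (a Python transcription; the function name is mine):

```python
from math import log

def flips_for(p_residual, x):
    # flabdablet's corrected formula: p_residual is the desired probability
    # of *not* yet having seen a streak of x heads after y flips.
    return x + log(p_residual) / log(1 - 0.495 * 0.505 ** x)

print(flips_for(0.5, 14))    # the comment quotes 19973
print(flips_for(0.005, 14))  # the comment quotes 152576
```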

Bah. srand(time(0)) should be srandom(time(0)). The stdlib I'm using apparently uses random() for rand() and srandom() for srand() anyway, but you can't rely on that.

posted by flabdablet at 12:59 PM on December 25, 2013

You need Markov chains because you want all of the last 17 trials to be successes, not 17 of the n trials to be successes. In other words, there's a state.

posted by esprit de l'escalier at 2:43 PM on December 25, 2013

My answer: You should expect to wait 28,779 coin flips before your run of 14 or more heads begins.

My tediously long explanation: Your raw data is a sequence of heads and tails, which might look like this

H H H T H T H T H T T T

Each of these 12 events is a single coin flip, each of which is an independent sample from a Bernoulli distribution. But that's not the best way to think about it. To calculate the answer, you need to carve the sequence up into a series of runs, each of which consists of some number of heads followed by a single tail:

HHHT HT HT HT T T

Now we have rewritten the data as six events. Each of these is an independent sample from a geometric distribution. A draw from a geometric distribution is defined to be a run of (say) heads followed by a single tail, and so we can rewrite this in terms of the run lengths:

4 2 2 2 1 1

In order to find out the thing we're really interested in (number of *flips* before the long streak), it helps to find the number of *streaks* you would expect to encounter before the first streak of length 15 or higher (14 heads + 1 tail = length 15).

The number of streaks itself turns out to be a sample from a geometric distribution. Let *p* denote the probability of the coin coming up **tails**. Then the probability that any particular streak of heads (plus the one tail at the end) will turn out to be of length *k* is given by the probability mass function for the geometric distribution

P(*k*) = (1-*p*)^{k-1} *p*

and the probability that any particular streak will be of length less than some desired run length *r* is (a tiny modification of) the cumulative distribution function for the geometric distribution:

P(*k* < *r*) = 1 - (1-*p*)^{r}

To "simplify" things slightly, let's let *s* refer to the probability that a particular streak turns out to be long enough to satisfy you (i.e., it's of length *r* or more). Using the formula above, this is pretty simple:

*s* = 1- P(*k* < *r*)

= 1- ( 1 - (1-*p*)^{r} )

= (1-*p*)^{r}

and so the probability that you encounter exactly *x* streaks before obtaining your first streak of length *r* or higher is

P(*x*) = (1-*s*)^{x} *s*

which, if we substitute the formula for *s* above, becomes

P(*x*) = (1-(1-*p*)^{r})^{x} (1-*p*)^{r}

Okay, that's nice and all, but it's not quite what you asked for. Fortunately, the connection with the geometric distribution gives us what you need. The *expected* number of streaks E[*x*] before getting one that is long enough is given by the mean of the geometric distribution, which is

E[*x*] = 1/*s*

which, if we again express this in terms of the original raw probability of getting a head, becomes

E[*x*] = (1-*p*)^{-r}

But that's only the expected number of (too short) streaks before obtaining a streak of length *r*. What you want is the expected number of flips that occur before the long streak arrives. Because all of those earlier streaks are known to be of length less than *r*, what you need for this is the expected value of a truncated geometric distribution. I'm sure there's a neater way of calculating it, but the expected length of a streak, where that streak is known to be shorter than *r*, is given by the "conditional expected value",

E[*k* | *k* < *r* ] = Σ_{k} (1-*p*)^{k-1} *p k*

where the sum is taken from *k* = 1 to *k* = *r*-1. Since we are expecting to observe E[*x*] streaks, each of which is expected to consist of E[*k* | *k* < *r*] flips, the expected number of flips prior to observing a streak of length *r* or higher is just the product of the two:

(1-*p*)^{-r} Σ_{k} (1-*p*)^{k-1} *p k*

Finally, we have an answer. Just substitute a tails probability of *p* = .495 and a run length of *r* = 15 into this equation... and it turns out you can expect to wait 28779 coin flips before the streak of 14 (or more) heads begins.

For anyone who wants to check my working (because obviously that's going to happen), here are the R scripts that I used to generate the answer and also to compare the analytic version above to some simulations:

expectedWaitTime <- function( p, r ) {
  (1-p)^-r * sum( (1-p)^(1:(r-1)) * p * (1:(r-1)) )
}

simulateOnce <- function( p, r ) {
  k <- 0
  l <- 0               # length of the sequence so far
  while( k < r ) {
    l <- l + k         # add the last one
    k <- rgeom(1, p)   # sample a new streak of heads
    k <- k + 1         # add the tail (because of how R implements geometric distribution)
  }
  return(l)
}

simulateMany <- function( p, r, n ) {
  L <- vector(length=n)
  for( i in 1:n ) L[i] <- simulateOnce( p, r )
  return(L)
}

compareVersions <- function( p, r, n=1000 ) {
  k1 <- expectedWaitTime( p, r )
  L <- simulateMany( p, r, n )
  k2 <- mean(L)
  cat( "analytic mean: ", k1, "\n" )
  cat( "simulated mean: ", k2, "\n" )
  hist( L )
}

I verified that the analytic and simulated versions match for small values of *r* and a few different values of *p*. They match pretty much exactly, but you need to use a very large value for *n* to make the simulations accurate, because the sampling distribution for the number of flips prior to the streak beginning is very skewed. For a tail probability of *p* = 0.5 and a required length of *r* = 9, my formula predicts an expected wait time of 502. I ran the *compareVersions* function above multiple times using *n* = 10,000 simulated sequences to numerically estimate the expected wait time. Across several repetitions of the same simulation I got answers of 500, 498, 509, 506, etc. Smaller values of *n* were pretty unreliable. Ramping it up to *n* = 100,000 gave an answer of 501.91.

Given that, I kind of trust my answer, but it's not quite in agreement with flabdablet's answer above. It's quite possible that our *simulations* are actually in agreement: the fact that flabdablet's simulations (which seem to be getting numbers around the 20,000 mark) are producing numbers that are below the expected value of 28,779 is to be expected. *Most* of the time you should see the long streak arrive well before the 28,779 mark. But very occasionally you might be forced to wait for 60,000 flips or something. These outlier events have a very strong effect on the *average* wait time. This is why I think that flabdablet's simulations are consistent with my answer, even though the numbers listed are all smaller than 28,779.

However, although our simulations seem consistent, we haven't arrived at quite the same analytic solution. I think the difference in our approaches comes from this part of flabdablet's answer:

*The probability that none of flips x+1..y will be the last of a length-x streak is the probability that all of them will fail to be so, which is (1 - 0.495 * 0.505^{x})^{y-x}.*

I don't think this is quite right, because it assumes that the probability-of-being-a-length-x-streak-terminator for any coin flip is independent of the preceding flips (hence raising it to the power of y-x), which isn't quite true. To put it crudely, if the last coin flip was a tail, the probability of the current flip terminating a streak of four heads is zero. Whereas if the last coin flip was a head, the probability is non-zero. More generally, even though the coin flips themselves are uncorrelated, the length-x-streak-termination event associated with flip i *is* correlated with the length-x-streak-termination event for flip i+1, because they both depend on some of the *same* preceding outcomes. This induces a correlation that needs to be taken into account when computing the expected wait time. Viewed in this way, flabdablet's formula gives an approximate answer that ignores this correlation.

Anyway... like I said, I think the answer is 28,779.

posted by mixing at 2:52 PM on December 25, 2013 [5 favorites]
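
mixing's closed-form expectation is compact enough to transcribe into Python (a sketch; variable names are mine, and as in the comment, p here is the probability of *tails* and r is the run length including the terminating tail):

```python
# Expected flips before a streak of 14 heads begins, per mixing's formula:
# (expected number of too-short streaks) * (expected length of a too-short streak).
p, r = 0.495, 15  # tails probability; 14 heads + 1 tail = run length 15

expected_streaks = (1 - p) ** -r                                   # E[x]
mean_short_streak = sum((1 - p) ** k * p * k for k in range(1, r)) # truncated-geometric term
wait = expected_streaks * mean_short_streak

print(wait)  # ~28779, the answer quoted in the comment
```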

Oh, and if you're interested in the *distribution* of wait lengths, here's a picture. Even though the *average* wait length is 28,779 flips in general (or 28,070 for the sample of 1000 wait lengths I used to generate it), the median wait length is only 19,324: there's a 50% chance that you'd see the streak in less than 19,324 flips. In fact, there's a 5% chance you'd encounter a streak that long after a mere 1,526 flips. But there's a 5% chance that you'd have to wait 83,134 flips or more. Of course, because this is based on only 1000 simulated sequences, there's a fair amount of inaccuracy in these numbers. But you get the idea.

posted by mixing at 3:23 PM on December 25, 2013

As you can tell by the widely varying answers given here, AskMeFi is not a great place to get an answer to this kind of question. I suggest asking on Cross Validated, the StackExchange site for statistics questions. You will have a much higher probability (haha!) of getting a correct answer there than here.

posted by number9dream at 4:07 PM on December 25, 2013

Yes, ask on crossvalidated. But, I think you will find that if you want the hitting time, the answer is 28792.9. I have uploaded the four lines of code that calculate that and the stochastic solution that proves it here.

posted by esprit de l'escalier at 4:14 PM on December 25, 2013 [1 favorite]


mixing, you were close, but I think you forgot to add the amount of time it takes to successfully go from the zero state to the success state, which explains why my answer is 14 higher than yours.

(Actually, I get 28794) with the following, where p = 0.505 is the heads probability and k = 14 is the streak length:

p, k = 0.505, 14

failure_p = sum(p ** i * (1 - p) for i in range(k))

cycle_time = sum(p ** i * (1 - p) * (i + 1) for i in range(k)) / failure_p

expected_cycles = 1 / (1 - failure_p)

print(cycle_time * expected_cycles + k)

posted by esprit de l'escalier at 4:18 PM on December 25, 2013


Yep. But it's 14 higher, right, not 13? My calculations (deliberately!) omitted the length of the successful streak itself, so mine should be 14 below yours. I get an answer of 28778.9. Adding 14 gives 28792.9, same as yours. (EDIT: ah, I see you edited as I was typing. Okay, all good)

posted by mixing at 4:31 PM on December 25, 2013


mixing, how did you generate your equations, like this one: E[ k | k < r ] = Σk (1-p)k-1 p k (which you can see loses all your nice formatting when being cut-and-pasted).

posted by benito.strauss at 5:01 PM on December 25, 2013


Most of them I took straight from the Wikipedia page on the geometric distribution. In most cases they're just a straightforward application of probability theory. For instance, the probability of seeing exactly *k-1* heads before a tail (streak length *k* in my notation) when all the coin flips are independent is calculated by multiplying all the independent probabilities, giving the geometric probability:

(1-*p*)^{k-1} *p*

And of course this multiplication is exactly how the probability mass function for the geometric distribution is derived.

The one you referred to is probably the most obscure one I used, but it's not complicated. E[*k* | *k*<*r* ] refers to the average length of a streak, where the streak cannot be of length *r* or longer (note that I defined the length of the streak of heads to include the tail that terminates it, so the minimum length is 1 tail + 0 heads = 1). The average is computed as

1 * (probability that the streak is length 1)

+ 2 * (probability that the streak is length 2)

+ ...

+ (*r*-1) * (probability that the streak is of length *r*-1)

or, equivalently,

Σ_{k} *k* * (probability that a streak is length *k*)

where we sum from *k* = 1 to *k* = *r*-1. But since the probability of a length-*k* streak is given by the geometric probability listed above, we can substitute that formula and obtain

Σ_{k} *k* (1-*p*)^{k-1} *p*

which is the formula I used.
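To put a number on that sum, here's a quick numeric check (my own sketch, with *p* = .495 as the tail probability; I normalise by P(*k* < *r*) so it's a proper conditional mean):

```python
# Conditional mean streak length E[k | k < r], where a "streak" of length k
# is (k - 1) heads followed by the terminating tail, and p is the tail
# probability (1 - 0.505 = 0.495 in the OP's problem).
p, r = 0.495, 15
probs = [(1 - p) ** (k - 1) * p for k in range(1, r)]   # geometric pmf, k = 1..r-1
mean = sum(k * q for k, q in zip(range(1, r), probs)) / sum(probs)
print(mean)
```

The result, a little over 2 flips, matches the per-cycle time used in the Markov-chain calculations elsewhere in this thread.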

Aside: while I'm here, I suppose I should note that my approach is slightly different to the one that esprit de l'escalier suggested. I've tried to think about the raw data in terms of a sequence of independent draws from a geometric distribution. By carving up the data that way you don't need to have a "memory" for previous states. The Markov chain approach described here doesn't explicitly carve up the raw data into geometric-distributed streaks, but by keeping a running tally of the current streak length via the Markov state, and then working out the solution by computing the hitting time for the "14-heads" state of that chain, it does the same thing. Arguably, esprit de l'escalier's approach is more elegant, since it can be generalised to a broader range of problems than my solution.

posted by mixing at 5:39 PM on December 25, 2013 [1 favorite]


Just occurred to me... did you just mean formatting? If so, it was just a bunch of sup and sub tags for superscripts and subscripts, and using Σ for the summation sign.

posted by mixing at 5:53 PM on December 25, 2013


My approach was to consider the fifteen states to be a Markov chain and evaluate the probability (failure_p) and hitting time (cycle_time) of returning to state 0 before hitting state 14. Then, I considered a geometric distribution with failure probability (failure_p) and found its expected value, which I multiplied by cycle_time. After all those failed cycles, we finally have a run of 14 successful flips to take us to the finish.
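That Markov-chain calculation can be sketched in a few lines (my own code, not esprit de l'escalier's upload; it solves the first-step equations for the same fifteen states):

```python
# E[i] = expected flips still needed when the current run is i heads, so
#   E[i] = 1 + p * E[i+1] + (1 - p) * E[0],  with E[k] = 0.
# Writing E[0] = x and unrolling the recursion gives x = a + b*x, solved below.
p, k = 0.505, 14
a = sum(p ** i for i in range(k))   # 1 + p + ... + p**(k-1)
b = (1 - p) * a                     # total coefficient the unrolling puts on E[0]
x = a / (1 - b)                     # equals (1 - p**k) / (p**k * (1 - p))
print(x)
```

This lands on roughly 28,793 flips, consistent with the answers in this thread.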

posted by esprit de l'escalier at 6:51 PM on December 25, 2013


(Yeah, I was curious about the formatting. I'll do the <sup> <sub> tags sometimes, but I get tired of it. You formatted so many formulas I assumed you had some better way. Turns out you're just not a lazy bastard like me. Thanks.)

posted by benito.strauss at 8:08 PM on December 25, 2013


I just ran another million trials, and by sorting the resulting flip counts into ascending order I see that a 14-heads streak arrived before 28779 flips in 632136 of those.

Plugging p = (1000000 - 632136) / 1000000 = 0.367864 into my formula above yields a flip count of 28810. Conversely, plugging 28779 into the inverse formula yields a probability of not seeing a 14-streak before then of 0.368255. These numbers agree with the simulation results to three significant figures, which is good enough for me.

Here's the code and here are the sorted flip counts for that set of trials.
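For anyone following along, the inverse calculation above is easy to re-run (a sketch with my own variable names; q is the per-flip terminator probability):

```python
import math

# flabdablet's formula: y = x + log(p) / log(1 - q), where x is the streak
# length, q = 0.495 * 0.505**x is the per-flip probability of being a streak
# terminator, and p is the probability of still not having seen the streak
# after y flips.
x = 14
q = 0.495 * 0.505 ** x

def flips_until_tail_prob(p_no_streak):
    return x + math.log(p_no_streak) / math.log(1 - q)

print(flips_until_tail_prob(0.367864))  # close to the 28810 quoted above
```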

posted by flabdablet at 10:55 PM on December 25, 2013

Actually it seems to me that it's perfectly true.

Consider any arbitrary series of 15 coin flip results. What is the probability, before any of them have been performed, that they will form the pattern THHHHHHHHHHHHHH?

The flips are perfectly independent, so we can get that result by multiplying the probabilities that

- the first flip will fall tails
- the second flip will fall heads
- the third flip will fall heads
- the fourth flip will fall heads
- ...
- the fifteenth flip will fall heads

In particular, it holds for

In a series of y flips, there are y-x sub-series of length x+1. Each of those ends with a single flip. So I can see nothing wrong with saying that the probability that each such flip is a length-x-streak terminator is 0.495 * 0.505^x.

posted by flabdablet at 12:43 AM on December 26, 2013 [1 favorite]

So apparently what I'm describing is essentially the Martingale betting system. Quite the spirited discussion here, and I appreciate everyone's answers for determining how "long" it may take before one could reasonably expect to hit 14 in a row.

I ran a simulation and essentially hit that somewhere around 22,000 flips, so not too far off, people-who-calculated 27,000.

posted by disillusioned at 2:54 AM on December 26, 2013


Ah, I've figured out the resolution of my "disagreement" with flabdablet's approach. The correlation does matter in principle, but in practice it's very, very weak.

Consider a sequence of 4 flips, and a very simple version of the OP's problem where we're thinking about runs of only 2 heads. I'll call the first flip A, the second B, the third C and the fourth D. We're completely in agreement that the value of any flip conveys no information about the value of any other flip. That's the definition of independence, and it's required in the OP's problem. So, the events C and D are unrelated.

Now consider the two events C* and D*,

C* = 1 if coin flip C is a tail that terminates a run of 2 or more heads

C* = 0 otherwise

D* = 1 if coin flip D is a tail that terminates a run of 2 or more heads

D* = 0 otherwise

It is obvious that C* and D* are identically distributed. I think we're in agreement on that, but I'll state the obvious anyway... if the probability of a tail is *p*, then the prior probability that C* = 1 is the same as the prior probability that D* = 1, since in both cases the prior probability is just *p*(1-*p*)^{2}. It is equally obvious, however, that they are correlated. If C terminates a run of 2 or more heads, then by definition flip C is a tail, and hence D cannot be a terminating flip. So

P( D*=1 | C*=1 ) = 0

which is clearly not the same thing as the prior probability P(D* = 1). Since the conditional probability distribution over D* given C*=1 is not the same as the marginal (or prior) distribution over D*, C* and D* are not independent (by the definition of independence).

The question of substance, however, is how much *does that matter for the OP's problem*? I think the answer here is "not really". Because we're only interested in the first terminating event, we're only ever interested in the conditional probability that D terminates the streak given that C *does not* terminate the streak. How much information does C* = 0 convey about the value of D*? I think the answer is "not much". The table below shows all possible values of A, B, C and D, as well as the corresponding values of C* and D*:
      A B C D  C* D*
 [1,] H H H H   0  0
 [2,] H H H T   0  1
 [3,] H H T H   1  0
 [4,] H H T T   1  0
 [5,] H T H H   0  0
 [6,] H T H T   0  0
 [7,] H T T H   0  0
 [8,] H T T T   0  0
 [9,] T H H H   0  0
[10,] T H H T   0  1
[11,] T H T H   0  0
[12,] T H T T   0  0
[13,] T T H H   0  0
[14,] T T H T   0  0
[15,] T T T H   0  0
[16,] T T T T   0  0

For simplicity, suppose *p* = .5, so all 16 sequences are equally likely (the OP's problem is close enough to this). There are only 2 of 16 sequences in which D* = 1, so the prior probability that D terminates is .125. Suppose we know that C is not a run-terminating flip. There are only two cases where C* = 1, and in both of those cases (of course) D does not terminate. That means there are 2 of 14 sequences in which D* = 1 given that C* = 0, so the conditional probability of D terminating given that C does not terminate is .143. Now, .125 isn't the same as .143: in this toy example, knowing that C* = 0 does have some influence on the probability that D* = 1. But this is an upper bound on how strong the correlation can get: when the required run length gets very long, the correlation starts to have a negligible effect.
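That table is easy to regenerate and check mechanically; here's a throwaway sketch (my own code) that enumerates the 16 sequences and recovers both probabilities:

```python
from itertools import product

def terminates(seq, i, run=2):
    # True if flip i is a tail ending a run of at least `run` heads
    return seq[i] == 'T' and i >= run and set(seq[i - run:i]) == {'H'}

seqs = list(product('HT', repeat=4))
c_star = [terminates(s, 2) for s in seqs]   # C is the third flip (index 2)
d_star = [terminates(s, 3) for s in seqs]   # D is the fourth flip (index 3)

prior = sum(d_star) / len(seqs)             # P(D* = 1) = 2/16
cond = (sum(d and not c for d, c in zip(d_star, c_star))
        / sum(not c for c in c_star))       # P(D* = 1 | C* = 0) = 2/14
print(prior, cond)                          # 0.125 vs about 0.143
```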

To see this, suppose the required run length is *k* heads, and consider a sequence of length *k*+2. Let X be the event that "flip *k*+1 is a tail that terminates a run of *k* or more heads", and let Y be the event that "flip *k*+2 is a tail that terminates a run of *k* or more heads".

There are 2^{k+2} possible outcomes for this set of *k*+2 flips. There are only 2 such sequences in which X = 1 (i.e., the second to last is a terminator), namely,

HHHH....HHHH TH, and

HHHH....HHHH TT

Similarly there are only two such sequences for which Y = 1, and they're both different to the ones in which X = 1:

HHHH....HHHH HT, and

THHH....HHHH HT

If we restrict ourselves to the case where *p* = .5, the prior probability that Y = 1 is 2 / 2^{k+2} = 1 / 2^{k+1}. Knowing that X = 0 tells us almost nothing: the conditional probability that Y = 1 given that X = 0 is 2 / (2^{k+2} - 2) = 1 / (2^{k+1} - 1). For larger values of *k*, this is a negligible difference. For *k* = 14, as per the OP's problem, the ratio of the two probabilities is 1.000031. That's not going to make any meaningful difference to the calculations.
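A two-line check of that ratio, using the sequence counts derived above (k = 14 as in the OP's problem):

```python
k = 14
prior = 2 / 2 ** (k + 2)        # P(Y = 1): 2 of the 2**(k+2) sequences
cond = 2 / (2 ** (k + 2) - 2)   # P(Y = 1 | X = 0): the two X = 1 sequences drop out
print(round(cond / prior, 6))   # 1.000031
```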

So we have a happy ending: we're both right. I am correct to say that the termination probabilities are not independent and that this does change the answer. But you are correct in saying that you can ignore this dependence. It doesn't make a big difference to the calculations, because successive "termination events" are very close to being independent, especially when we're talking about the case that matters to the OP.

posted by mixing at 3:12 AM on December 26, 2013


So if that's right, I should be able to run another million simulated trials for x = 2 and get much worse agreement between the results and what my formula predicts. Let's see:

For p = 0.5, formula says y = 2 + log p / log (1 - 0.495 * 0.505^{2}) = 7.13

Number of trials where y < 7: 680333; y < 8: 742541. Which is certainly not 500000 even to *one* significant figure.

If I plug y = 7 into the inverse formula, I get p = (1 - 0.495 * 0.505^{2})^{5} = 0.509 which is once again nowhere near 1 - 680333 / 1000000.

I can argue with you, but I can't argue with the facts :-)
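Those empirical x = 2 numbers are easy to re-check (a sketch with a trimmed trial count so it runs quickly; "y < 7" counts trials whose first 2-streak completes within 6 flips):

```python
import random

def first_streak(k, p, rng):
    """Flips until the first run of k heads completes."""
    flips = run = 0
    while run < k:
        flips += 1
        run = run + 1 if rng.random() < p else 0
    return flips

rng = random.Random(42)
n = 200_000
frac = sum(first_streak(2, 0.505, rng) < 7 for _ in range(n)) / n
print(frac)   # should land near 680333 / 1000000
```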

posted by flabdablet at 4:56 AM on December 26, 2013


disillusioned, the answer is 28794, not 22000. Run more simulations.

posted by esprit de l'escalier at 7:42 AM on December 26, 2013 [1 favorite]


mixing, what are you talking about? I'm sure that my analysis is correct and you are only confusing yourself with this…

posted by esprit de l'escalier at 7:50 AM on December 26, 2013


Actually, I made an error; after all the simplifications, I get

hitting time is: (1 - p**k) / (p**k * (1-p))

giving 28793

I verified the formula against simulations for different values of p and k.

posted by esprit de l'escalier at 11:16 AM on December 26, 2013


