Comments on: A variation on the birthday paradox
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox/
Question: A variation on the birthday paradox
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox
I have 232 Facebook friends, and six of them share a birthday (not 6 different pairs of people on the same birthday, but 6 out of the 232 were born on the 17th of July). Now, I know it only takes 23 random people to get a 50% chance of at least ONE birthday collision, but how the heck do I figure out the odds on this one? Is this a significant anomaly, or reasonably expected? My one stats class was entirely too long ago....post:ask.metafilter.com,2010:site.159519Wed, 14 Jul 2010 17:32:18 -0800um_maverickbirthdayparadoxstatisticsmathBy: ishotjr
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287542
This <a href="http://ask.metafilter.com/32404/Commonuncommon-birthdays">previous question</a> might be of some help as far as the links go.comment:ask.metafilter.com,2010:site.159519-2287542Wed, 14 Jul 2010 17:38:34 -0800ishotjrBy: ishotjr
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287543
Especially <a href="http://ask.metafilter.com/32404/Commonuncommon-birthdays#506667">this</a> comment. It seems that birthdays may be clustered around July, August, and September. I have a disproportionate number of FB friends with late June, early July, and March birthdays.comment:ask.metafilter.com,2010:site.159519-2287543Wed, 14 Jul 2010 17:40:07 -0800ishotjrBy: Kimberly
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287546
My husband put the wrong date in his profile because he "doesn't want Facebook to know the real one" and he put it in July.comment:ask.metafilter.com,2010:site.159519-2287546Wed, 14 Jul 2010 17:42:53 -0800KimberlyBy: roomthreeseventeen
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287547
Yup, I have this trend, too, in May/June/July. Six birthdays today (out of 820 friends). Usually it's one or two.comment:ask.metafilter.com,2010:site.159519-2287547Wed, 14 Jul 2010 17:43:13 -0800roomthreeseventeenBy: um_maverick
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287586
Hmm, maybe my question was misleading. I'm not looking for whether or not July birthdays are common, but for the formula for "given 232 friends, assuming random distribution (even though we now know this isn't the case), what are the odds of 6 of them sharing one birthday?"<br>
<br>
Thanks!comment:ask.metafilter.com,2010:site.159519-2287586Wed, 14 Jul 2010 18:08:53 -0800um_maverickBy: DevilsAdvocate
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287611
Well, if you want a theoretical answer, assuming each of 365 days of the year is equally likely as a birthday (pretending leap years don't exist), it's not a trivial question. But the analysis would go something like this.<br>
<br>
Let P(a<sub>0</sub>, a<sub>1</sub>, a<sub>2</sub>, a<sub>3</sub>...) represent the probability that, given a<sub>1</sub>+2*a<sub>2</sub>+3*a<sub>3</sub>... people, their birthdays are distributed such that there are a<sub>0</sub> days on which no people have birthdays, a<sub>1</sub> days on which exactly one person has a birthday, a<sub>2</sub> days on which exactly 2 people have birthdays, etc. We have the constraint that a<sub>0</sub>+a<sub>1</sub>+a<sub>2</sub>... = 365.<br>
<br>
For example, P(359, 4, 1) is the probability that given six people, two of them will share a birthday, and the other four all have different birthdays (from those two and from each other).<br>
<br>
Start with P(365) = 1. (The trivial case: given 0 people, the probability that there are 365 days on which no one has a birthday is 1.)<br>
<br>
Likewise, P(364, 1) = 1. (Given one person, the probability that there is one day which is that person's birthday, and 364 days which are no one's birthday, is 1. This can actually be derived from the previous case and the rules I'm about to lay down, but it may be easier to list separately since it's also a trivial case.)<br>
<br>
Now, given N people, the probabilities for each possible case can be derived from the probabilities for N-1 people as follows:<br>
<br>
P(a<sub>0</sub>, a<sub>1</sub>... a<sub>k</sub>-1, a<sub>k+1</sub>+1...) = P(a<sub>0</sub>, a<sub>1</sub>... a<sub>k</sub>, a<sub>k+1</sub>...)*a<sub>k</sub>/365.<br>
<br>
Except that a given set of probabilities for N people may be "generated" from a set of N-1 people in multiple ways, in which case you have to add up the probabilities of all those possible ways. Also, if a<sub>k+1</sub> didn't exist in the previous set, pretend it was there and was zero. (Essentially, any finite string of a<sub>n</sub>'s can actually be thought of as infinite, with all the unlisted a<sub>n</sub>'s as zero.)<br>
<br>
So proceeding, the possibilities for 2 people are:<br>
<br>
P(363, 2) [different birthdays] = P(364,1)*364/365 = 364/365<br>
<br>
P(364, 0, 1) [same birthday] = P(364,1)*1/365 = 1/365<br>
<br>
For 3 people:<br>
<br>
P(362,3) [all three different birthdays] = P(363,2)*363/365 = 132132/133225<br>
<br>
P(363,1,1) [two of the three share a birthday] = P(363,2)*2/365 + P(364, 0, 1)*364/365 = 1092/133225 [note this is the first case where we had to add multiple terms from the cases with N-1 people]<br>
<br>
P(364,0,0,1) [all three share the same birthday] = P(364,0,1)*1/365 = 1/133225<br>
<br>
Continue this procedure until you've analyzed the cases for N=232, and add up all the probabilities for which one of a<sub>6</sub>, a<sub>7</sub>, a<sub>8</sub>... is greater than zero, and that will give you the probability that given 232 people, at least six will share the same birthday.<br>
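A minimal sketch of this recursion in Python, under the same assumptions (365 equally likely days, no leap years). The names <code>step</code> and <code>distributions</code> are made up for illustration, and exact fractions are used so the small cases can be checked against the values worked out above. The number of distinct distributions grows quickly with N, so this is only meant for small N, not as a practical route to N=232.<br>

```python
from collections import defaultdict
from fractions import Fraction

DAYS = 365  # assumption: every day equally likely, no leap years


def step(dist):
    """Add one person. dist maps tuples (a0, a1, a2, ...) to probabilities."""
    new = defaultdict(Fraction)
    for a, p in dist.items():
        for k, count in enumerate(a):
            if count == 0:
                continue
            # The new person lands on one of the `count` days that currently
            # have exactly k birthdays, turning it into a day with k+1.
            b = list(a) + [0]          # make room for a_{k+1} if needed
            b[k] -= 1
            b[k + 1] += 1
            while b[-1] == 0:          # drop unused trailing zeros
                b.pop()
            # Several predecessors can lead to the same tuple, so accumulate.
            new[tuple(b)] += p * Fraction(count, DAYS)
    return dict(new)


def distributions(people):
    """All distributions for `people` people, starting from P(365) = 1."""
    dist = {(DAYS,): Fraction(1)}
    for _ in range(people):
        dist = step(dist)
    return dist
```

For example, <code>distributions(3)[(363, 1, 1)]</code> comes out as <code>Fraction(1092, 133225)</code>, matching the hand calculation for three people.<br>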
<br>
I might try to program this later, but if anyone else wants to take a shot at it first, that's fine with me. Alternatively, a Monte Carlo simulation would be a lot easier to program, although its accuracy will of course depend on how many trials you run.comment:ask.metafilter.com,2010:site.159519-2287611Wed, 14 Jul 2010 18:19:55 -0800DevilsAdvocateBy: DevilsAdvocate
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287669
Figured out a way to do it without that clumsy recursion:<br>
<br>
Think of the problem this way: imagine you have a one-dimensional board. The left-most space is 0, and to the right of it, the spaces are numbered 1, 2, 3, etc. The board extends infinitely to the right.<br>
<br>
Start with 365 tokens on space 0. Each "move" consists of selecting 1 token at random, and moving it one space to the right. If you make 232 moves, what is the probability that at least one token has reached space 6 or greater?<br>
<br>
The approach is to solve the complementary problem (what is the probability that no token has moved beyond space 5?) and then subtract that probability from 1. (This is akin to the way you solve the single collision problem: figure out what the probability is that no two people share a birthday, then subtract that from 1.)<br>
<br>
So, list all possible distributions of tokens such that no token is beyond space five. I.e., list all solutions, in non-negative integers, of a<sub>0</sub>, a<sub>1</sub>... a<sub>5</sub> given the restrictions:<br>
a<sub>0</sub>+a<sub>1</sub>+a<sub>2</sub>+a<sub>3</sub>+a<sub>4</sub>+a<sub>5</sub>=365<br>
a<sub>1</sub>+2*a<sub>2</sub>+3*a<sub>3</sub>+4*a<sub>4</sub>+5*a<sub>5</sub>=232<br>
<br>
For each of these, P(a<sub>0</sub>, a<sub>1</sub>... a<sub>5</sub>) = (number of ways of selecting specific tokens to fill those spaces) * (number of "orders" in which the 232 moves could have occurred) * probability of a specific sequence of moves occurring.<br>
<br>
Ways of selecting tokens to fill those spaces = 365!/(a<sub>0</sub>! a<sub>1</sub>! a<sub>2</sub>! a<sub>3</sub>! a<sub>4</sub>! a<sub>5</sub>!)<br>
<br>
Number of arrangements of moves, given a specific set of tokens in each space = 232!/(2!<sup>a<sub>2</sub></sup> * 3!<sup>a<sub>3</sub></sup> * 4!<sup>a<sub>4</sub></sup> * 5!<sup>a<sub>5</sub></sup>)<br>
<br>
Probability of a specific sequence of moves occurring = 1/365<sup>232</sup><br>
<br>
Thus, P(a<sub>0</sub>,a<sub>1</sub>,a<sub>2</sub>,a<sub>3</sub>,a<sub>4</sub>,a<sub>5</sub>) = (365!/(a<sub>0</sub>! a<sub>1</sub>! a<sub>2</sub>! a<sub>3</sub>! a<sub>4</sub>! a<sub>5</sub>!)) * (232!/(2!<sup>a<sub>2</sub></sup> * 3!<sup>a<sub>3</sub></sup> * 4!<sup>a<sub>4</sub></sup> * 5!<sup>a<sub>5</sub></sup>)) / 365<sup>232</sup><br>
<br>
Add up the P values for each of the possible solutions for a<sub>0</sub>...a<sub>5</sub>, and that gives you the probability that no token has moved beyond space 5, i.e., that no more than 5 people share any birthday. Subtract that from 1 to get the probability that at least 6 people share at least one birthday.<br>
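As a rough sketch, the formula for a single distribution could be evaluated like this in Python. The function names are hypothetical, and working with log-factorials through <code>math.lgamma</code> is my own choice here, made so that 365! never has to be formed explicitly; the number of people is derived from the a<sub>k</sub> rather than fixed at 232, which makes the small cases easy to check.<br>

```python
from math import exp, lgamma, log


def log_factorial(n):
    # ln(n!) via the log-gamma function: lgamma(n + 1) == ln(n!)
    return lgamma(n + 1)


def distribution_probability(a, days=365):
    """P(a0, a1, ...) where a[k] is the number of days shared by exactly k people."""
    if sum(a) != days:
        raise ValueError("the a_k must sum to the number of days")
    people = sum(k * count for k, count in enumerate(a))
    # ways of choosing which days end up with 0, 1, 2, ... people
    log_p = log_factorial(days) - sum(log_factorial(count) for count in a)
    # number of orders in which the `people` moves could have occurred
    log_p += log_factorial(people)
    log_p -= sum(count * log_factorial(k) for k, count in enumerate(a))
    # probability of one specific sequence of moves
    log_p -= people * log(days)
    return exp(log_p)
```

Summing <code>distribution_probability</code> over every solution of the two constraints and subtracting from 1 would then give the answer; as a quick check, <code>distribution_probability([363, 1, 1])</code> should agree with 1092/133225 from the three-person example.<br>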
<br>
I'll see if I can work that out...comment:ask.metafilter.com,2010:site.159519-2287669Wed, 14 Jul 2010 18:56:32 -0800DevilsAdvocateBy: DevilsAdvocate
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287674
Slight correction to my first comment:<br>
<br>
<i>For example, P(<s>359</s> </i><b>360</b><i>, 4, 1) is the probability that given six people, two of them will share a birthday, and the other four all have different birthdays (from those two and from each other).</i>comment:ask.metafilter.com,2010:site.159519-2287674Wed, 14 Jul 2010 18:59:04 -0800DevilsAdvocateBy: a robot made out of meat
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287713
Like lots of hard probability problems, it's easier to simulate.<br>
<br>
> x<-round(runif(n=232*10000)*364)<br>
> x<-matrix(x,nrow=10000)<br>
> y<-apply(x,1,sort)<br>
> z<-apply(y,2,function(x){max(rle(x)$lengths)})<br>
> table(z)<br>
z<br>
<code><br>
3 4 5 6 7 8 <br>
2156 6163 1485 185 10 1 <br>
</code><br>
<br>
So in 196/10000 cases (about 2%) you had 6 or more friends with the same birthday.comment:ask.metafilter.com,2010:site.159519-2287713Wed, 14 Jul 2010 19:21:36 -0800a robot made out of meatBy: DevilsAdvocate
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287718
<i>So, list all possible distributions of tokens such that no token is beyond space five. I.e., list all solutions, in non-negative integers, of a0, a1... a5 given the restrictions:<br>
a0+a1+a2+a3+a4+a5=365<br>
a1+2*a2+3*a3+4*a4+5*a5=232</i><br>
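One way to count the solutions, as a sketch: since a0 is fixed once a1..a5 are chosen (and stays non-negative as long as there are no more people than days), each solution corresponds to one way of writing 232 as a sum of parts of size 1 to 5, ignoring order. The function name and parameters below are made up for illustration.<br>

```python
def count_solutions(people=232, max_share=5, days=365):
    """Count non-negative integer solutions (a0..a_max_share) of
    a0 + a1 + ... = days and a1 + 2*a2 + ... = people."""
    if people > days:
        # Assumption of this sketch: with people <= days, a0 is never negative,
        # so every partition of `people` gives a valid solution.
        raise ValueError("sketch only handles people <= days")
    # ways[n] = number of ways to write n as an unordered sum of the part
    # sizes considered so far
    ways = [1] + [0] * people
    for size in range(1, max_share + 1):
        for n in range(size, people + 1):
            ways[n] += ways[n - size]
    return ways[people]
```

Calling <code>count_solutions()</code> with the defaults gives the count for 232 people with no birthday shared by more than five.<br>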
<br>
If I've done my work right, there are 1,141,886 solutions to these equations.comment:ask.metafilter.com,2010:site.159519-2287718Wed, 14 Jul 2010 19:25:01 -0800DevilsAdvocateBy: DevilsAdvocate
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287740
Oh, I forgot, 365! is on the order of 10<sup>778</sup>, so I need to either work with something that can handle numbers that big, or else use approximations such as <a href="http://en.wikipedia.org/wiki/Stirling%27s_approximation">Stirling's approximation</a>...comment:ask.metafilter.com,2010:site.159519-2287740Wed, 14 Jul 2010 19:42:24 -0800DevilsAdvocateBy: miyabo
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287748
This program does a simulation of the problem as I understand it.<br>
Written in Python, requires Numpy (took way too long to run without it).<br>
<br>
<pre><br>
import numpy<br>
runs=100000<br>
friends=232<br>
# Bin edges 1..366, so each of the 365 days gets its own histogram bin<br>
days=numpy.arange(1, 367)<br>
def bday_sim():<br>
    # Array of 232 random birthdays, each an integer from 1 to 365<br>
    bdays = numpy.random.randint(1, 366, friends)<br>
    # For each day of the year, # of friends with that birthday<br>
    counts = numpy.histogram(bdays, days)[0]<br>
    # 1.0 if at least six friends share some day, else 0.0<br>
    if counts.max() >= 6: return 1.0<br>
    else: return 0.0<br>
<br>
total=0<br>
for i in range(runs): total += bday_sim()<br>
<br>
print(total / float(runs))<br>
</pre><br>
<br>
Someone please check if I made any glaring mistakes in that code. I know it doesn't handle leap years.<br>
<br>
Gives me an answer of 0.02 (2% chance of this happening). By contrast, if you increase to 500 friends, there's a 67% chance of this happening.comment:ask.metafilter.com,2010:site.159519-2287748Wed, 14 Jul 2010 19:47:30 -0800miyaboBy: JumpW
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287752
I think you could estimate this with the Poisson distribution (at least, my answers match the simulation answers of 'a robot made out of meat' ! )...<br>
<br>
http://en.wikipedia.org/wiki/Poisson_distribution<br>
<br>
On any average day, you would expect 232/365 = 0.6356 birthdays (ie lambda = 0.6356).<br>
<br>
p(6 on the same day) = 365 * 0.6356^6 * exp( - 0.6356) / 6! = 0.01770461<br>
p(7 on the same day) = 365 * 0.6356^7 * exp( - 0.6356) / 7! = 0.001607620<br>
p(8 on the same day) = 365 * 0.6356^8 * exp( - 0.6356) / 8! = 0.0001277287<br>
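For anyone who wants to reproduce these terms, here is a small Python sketch of the same estimate. The function names are my own, lambda is taken as 232/365 without rounding (so the last digits differ slightly from the figures above), and the tail is simply cut off after a fixed number of terms, which is an arbitrary choice.<br>

```python
from math import exp, factorial


def poisson_term(k, people=232, days=365):
    """Expected number of days with exactly k birthdays, Poisson estimate."""
    lam = people / days  # average birthdays per day
    return days * lam**k * exp(-lam) / factorial(k)


def poisson_tail_estimate(at_least=6, people=232, days=365, terms=40):
    """Sum the terms for k = at_least, at_least + 1, ... (truncated)."""
    return sum(
        poisson_term(k, people, days)
        for k in range(at_least, at_least + terms)
    )
```

Note that the sum is really an expected number of days with six or more birthdays; it is close to the probability only because that number is small.<br>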
<br>
etc. If you sum up these probabilities, there is about a 2% chance that, of 232 friends, 6 or more would have the same birthday.comment:ask.metafilter.com,2010:site.159519-2287752Wed, 14 Jul 2010 19:48:32 -0800JumpWBy: JumpW
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287753
Oh good, three of us have estimated 2% so far! What could be the probability that we're all wrong?comment:ask.metafilter.com,2010:site.159519-2287753Wed, 14 Jul 2010 19:51:01 -0800JumpWBy: DevilsAdvocate
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287796
I worked through my "exact" solution as described (well, exact as you can get considering the possibility of roundoff error when adding over a million numbers) and got ~0.018470917, just under 2%.<br>
<br>
For the curious, the single most probable distribution was 1 date shared by 4 people, 8 dates shared by 3 people each, 40 dates shared by 2 people each, and 124 people with unique birthdays, with a probability of 0.004728677.comment:ask.metafilter.com,2010:site.159519-2287796Wed, 14 Jul 2010 20:22:43 -0800DevilsAdvocateBy: DevilsAdvocate
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2287836
Silly me. No need to use Stirling's approximation when I'm working in logarithms anyway up to the point where I get ln(P). It's easy enough to calculate ln(N!) directly for N up to 365.<br>
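Another way to sidestep both Stirling and logarithms, offered only as an independent cross-check and not as the method described above: Python integers have arbitrary precision, so the number of birthday assignments in which no day holds more than five people can be counted exactly, one day at a time. The function name and its parameters are hypothetical.<br>

```python
from fractions import Fraction
from math import comb


def prob_some_day_has_at_least(m, people=232, days=365):
    """Exact P(some day is shared by at least m people), uniform birthdays."""
    cap = m - 1  # the complement: no day holds more than `cap` people
    # ways[n] = number of ways to place n labelled people on the days
    # processed so far with at most `cap` people on each day
    ways = [1] + [0] * people
    for _ in range(days):
        # choose which k of the n people fall on the newly added day
        ways = [
            sum(comb(n, k) * ways[n - k] for k in range(min(cap, n) + 1))
            for n in range(people + 1)
        ]
    # divide by the days**people equally likely assignments
    return 1 - Fraction(ways[people], days**people)
```

<code>float(prob_some_day_has_at_least(6))</code> gives the probability for the original question; the result is an exact fraction until it is converted, so roundoff from adding many terms does not come into it.<br>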
<br>
So, a slight revision to my earlier answer: I now get 0.018318139 for the probability, and 0.004729403 for the most likely single distribution.comment:ask.metafilter.com,2010:site.159519-2287836Wed, 14 Jul 2010 21:03:34 -0800DevilsAdvocateBy: russm
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2288021
it's actually much easier than all that...<br>
<br>
for any given group of 6 people, they have a 1/(365^5) chance of all having the same birthday...<br>
<br>
there are 232C6 combinations of 6 people possible from a pool of 232, which is 232!/226!/6! or alternatively (232*231*...*228*227)/6!<br>
<br>
so the odds of there being at least one group of 6 people sharing a birthday from a pool of 232 is<br>
<br>
(232*231*...*228*227)/(365^5 * 6!) = 0.031320390532716<br>
<br>
so a touch over 3%comment:ask.metafilter.com,2010:site.159519-2288021Thu, 15 Jul 2010 02:48:50 -0800russmBy: DevilsAdvocate
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2288072
<i>so the odds of there being at least one group of 6 people sharing a birthday from a pool of 232 is<br>
<br>
(232*231*...*228*227)/(365^5 * 6!) = 0.031320390532716</i><br>
<br>
This would be true only if the possibilities of any two groups of six people (even overlapping groups) having the same birthday were mutually exclusive.<br>
<br>
In other words, yes, it's true that the probability of A, B, C, D, E, and F having the same birthday is 1/365<sup>5</sup>. It's also true that the probability of G, H, I, J, K, and L having the same birthday is 1/365<sup>5</sup>. It is <b>not</b> true that the probability of at least one of the two groups (A, B, C, D, E, and F) and (G, H, I, J, K, and L) all having the same birthday is 2/365<sup>5</sup>.<br>
<br>
<small>It is depressingly common in complex AskMes about probability for people to assume P(A or B)=P(A)+P(B) when A and B are not mutually exclusive events; or that P(A and B)=P(A)*P(B), or P(A or B)=1-(1-P(A))(1-P(B)), when A and B are not independent events.</small>comment:ask.metafilter.com,2010:site.159519-2288072Thu, 15 Jul 2010 05:17:45 -0800DevilsAdvocateBy: um_maverick
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2288083
Wow, thanks everybody - my 11-year-old memory of my stats class had me thinking it was far simpler than this. Apparently I was way, way off! Thanks!comment:ask.metafilter.com,2010:site.159519-2288083Thu, 15 Jul 2010 05:47:23 -0800um_maverickBy: King Bee
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2288194
DevilsAdvocate: What are you talking about? Those groups are completely different, so the events are independent. Did you mean to have the groups overlap?<br>
<br>
Also (not directed at DA, just at the general public), the words "<a href="http://en.wikipedia.org/wiki/Odds">odds</a>" and "probability" are not interchangeable. In a casual conversation, say what you will, but when discussing a problem like this, we should be more careful.comment:ask.metafilter.com,2010:site.159519-2288194Thu, 15 Jul 2010 07:44:31 -0800King BeeBy: DevilsAdvocate
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2288280
<i>What are you talking about? Those groups are completely different, so the events are independent.</i><br>
<br>
Yes, in the example I gave (two disjoint groups of six people each) they are independent. russm was <b>not</b> treating them as independent (i.e., P(A or B) = 1-(1-P(A))(1-P(B))); he was treating them as mutually exclusive (i.e., P(A or B) = P(A)+P(B)).<br>
<br>
russm's logic was: There are G=232!/(226! 6!)=202904412172 distinct groups of 6 people. The probability of any single group of six people having the same birthday is P=1/365<sup>5</sup>. Therefore, the probability of any six people sharing the same birthday is G*P.<br>
<br>
russm's conclusion only follows if the G events, each being one particular group of six people sharing a birthday, are <i>mutually exclusive</i> (not <i>independent</i>). The fallacy is more easily seen with a simpler situation: given 50 people, what is the probability of any two sharing a birthday? If we followed russm's logic, we would find G=50!/(48! 2!) = 1225 distinct pairs of people, each having a 1/365 chance of sharing a birthday, and the probability of any two people sharing a birthday would be 1225/365≈3.356, which is impossible, since no probability can exceed 1.<br>
<br>
Had russm assumed the events in question were <i>independent</i> rather than <i>mutually exclusive</i>, he would have used the calculation 1-[(1-1/365<sup>5</sup>)<sup>202904412172</sup>] ≈ 0.030834987966224, slightly lower than his earlier answer, but still too high by over 50%.<br>
<br>
As you correctly note, two groups of six people having all six birthdays in common <i>are</i> independent events if the two groups are entirely disjoint (or even if they have just one member in common), but they are not independent if the groups have two or more members in common. E.g., the probability that at least one of the groups (A, B, C, D, E, F) and (B, C, D, E, F, G) has all its members' birthdays in common is <i>not</i> 1-(1-1/365<sup>5</sup>)<sup>2</sup>. Thus it is incorrect to calculate the probability of at least 6 out of 232 people sharing a birthday as if we were considering 202904412172 independent events (not what russm did), and also wrong to calculate it as if we were considering 202904412172 mutually exclusive events (what russm did).<br>
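The numbers in this comparison can be reproduced with a few lines of Python. This is only an illustration of the two incorrect shortcuts next to each other; the variable names are my own, and <code>log1p</code>/<code>expm1</code> are used because 1/365<sup>5</sup> is too small to subtract from 1 accurately in floating point.<br>

```python
from math import comb, expm1, log1p

groups = comb(232, 6)      # distinct groups of six out of 232 people
p_group = 1 / 365**5       # one particular group all sharing a birthday

# Treating the events as mutually exclusive: just add the probabilities.
as_exclusive = groups * p_group

# Treating the events as independent: 1 - (1 - p)^G, computed stably.
as_independent = -expm1(groups * log1p(-p_group))

# The 50-person sanity check: adding pair probabilities overshoots 1,
# while the standard birthday calculation stays below 1.
pairs_sum = comb(50, 2) / 365
no_match = 1.0
for i in range(50):
    no_match *= (365 - i) / 365
birthday_50 = 1 - no_match
```

Both shortcuts land near 3%, well above the simulated and exact figures of about 1.8% quoted elsewhere in the thread.<br>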
<br>
Good point about "odds." Given that the OP asked for the odds, they are about 53.59:1 against.comment:ask.metafilter.com,2010:site.159519-2288280Thu, 15 Jul 2010 08:22:39 -0800DevilsAdvocateBy: russm
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2288375
good point... and having turned away from attempting an analytic solution using my 15 year old engineering stats and instead gone for a simulation I see the answer is ~0.0182, so I'll just shut up now...comment:ask.metafilter.com,2010:site.159519-2288375Thu, 15 Jul 2010 08:54:59 -0800russmBy: King Bee
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2288506
Ah, I knew I had to have misunderstood your example, DA. That's what I get for waking up and immediately trying to do mathematics.comment:ask.metafilter.com,2010:site.159519-2288506Thu, 15 Jul 2010 10:09:35 -0800King BeeBy: lex mercatoria
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2288644
I love these sorts of problems. I think russm was onto something, but didn't quite get all the way there. <br>
<br>
There are C(232, 6) ways to choose 6 people from 232. The remaining 226 people must then be distributed across the remaining 364 days, which can be done in 364<sup>226</sup> ways. The size of the sample space is 365<sup>232</sup>. So we get<br>
<br>
C(232, 6) * 364<sup>226</sup> * 365<sup>-232</sup> = 4.61593454425e-5<br>
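The product can be evaluated exactly with integer arithmetic; a minimal Python sketch follows, with a made-up function name. It computes only what the formula states, namely the chance that exactly k people fall on one particular day.<br>

```python
from fractions import Fraction
from math import comb


def exactly_k_on_one_day(k=6, people=232, days=365):
    """P(exactly k of the people have their birthday on one specific day)."""
    # choose the k people, then spread the rest over the other days
    favourable = comb(people, k) * (days - 1) ** (people - k)
    return Fraction(favourable, days**people)
```

<code>float(exactly_k_on_one_day())</code> gives the per-day figure, and multiplying it by 365 gives the combined figure for any day of the year.<br>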
<br>
This is the probability of all six having the same particular birthday (say July 17) in common. If we don't care about the day we multiply by 365 to get 0.01684, or about 1.7%.comment:ask.metafilter.com,2010:site.159519-2288644Thu, 15 Jul 2010 11:35:26 -0800lex mercatoriaBy: DevilsAdvocate
http://ask.metafilter.com/159519/A-variation-on-the-birthday-paradox#2288718
<i>There are C(232, 6) ways to choose 6 people from 232. The remaining 226 people must then be distributed across the remaining 364 days, which can be done in 364<sup>226</sup> ways.</i><br>
<br>
First, that would give you the ways in which <i>exactly</i> 6 people shared a specific birthdate, e.g., that exactly 6 people were born on July 7. I (and apparently most others) took the OP's question to be about the probability that <i>at least</i> 6 people shared a birthdate. Granted, the OP didn't make explicit which he meant, so if that's the question you're answering, that's fine as far as it goes.<br>
<br>
But:<br>
<br>
<i>This is the probability of all six having the same particular birthday (say July 17) in common. If we don't care about the day we multiply by 365 to get 0.01684, or about 1.7%.</i><br>
<br>
You have made the same mistake russm did: the events are not mutually exclusive. "Six people have a birthday on July 17" does not exclude the possibility "six people have a birthday on January 3," thus you cannot simply add the probabilities to get the probability of at least one happening, either. (They're also not independent events.)comment:ask.metafilter.com,2010:site.159519-2288718Thu, 15 Jul 2010 12:10:14 -0800DevilsAdvocate