MathIsHardFilter
March 9, 2025 9:59 AM

What are the odds of 3 people in a randomly selected group of 50 having the same birthday?
posted by Lemkin to Science & Nature (28 answers total) 5 users marked this as a favorite
 
exactly 3 or at least 3?
posted by supercres at 10:26 AM on March 9 [1 favorite]


About 22% if exactly 3 and 7% for 3 or more (meaning not 0, 1, or 2). Note this is probability and not odds.
posted by MisantropicPainforest at 10:32 AM on March 9 [1 favorite]


Not a simple question! See this Stack Exchange.

If you mean 3 or more having the same birthday (and ignoring the fact that birthdays are not equally distributed throughout the year, leap years, etc), then you can use the Poisson approximation in the first answer, which gives 14%. I imagine that would be close enough for most purposes.
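
For anyone who wants to check that figure, here is a rough sketch of one common Poisson approximation. It assumes 365 equally likely birthdays, and the function name is made up for illustration; the linked answer may set things up differently.

```python
from math import comb, exp

def poisson_triple_estimate(n=50, days=365):
    """Rough Poisson estimate of P(at least 3 of n people share a birthday)."""
    # Expected number of same-birthday triples: C(n, 3) possible triples,
    # each matching with probability 1/days^2 (uniform birthdays assumed).
    lam = comb(n, 3) / days**2
    # Treat the number of matching triples as Poisson(lam).
    return 1 - exp(-lam)

p_poisson = poisson_triple_estimate()
print(f"Poisson estimate: {p_poisson:.1%}")
```

This is only an approximation, since overlapping triples are not independent, so it should not be expected to match an exact calculation or a simulation to the decimal.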

There is an exact formula given in an answer below.

Unfortunately, I don't think there is a good way to display these kinds of equations here on MetaFilter.
posted by ssg at 10:47 AM on March 9 [1 favorite]


Did a quick simulation.
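import numpy as np

nsim = 1000000
res = np.empty(nsim)  # largest same-birthday group seen in each simulated room
rng = np.random.default_rng()

for i in range(nsim):
    # draw 50 birthdays as integers 0..365 (366 equally likely values)
    bdays = rng.integers(366, size=50)
    # bincount tallies people per day; the max is the biggest shared-birthday group
    largest_group = np.max(np.bincount(bdays))
    res[i] = largest_group

# tally how many simulations ended with each largest-group size
print(dict(zip(range(6), np.bincount(res.astype(int)))))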
Note that this doesn't include leap years, which would be an additional possible integer with 1/4 the probability of the rest.

Out of a million iterations, the number that had the largest group of size X:
{
    1: 29830,
    2: 844469,
    3: 121385,
    4: 4215,
    5: 99
}
So about 12.6%, adding up the 3, 4, and 5 groups. This agrees with the result from Wolfram Alpha.
posted by supercres at 11:09 AM on March 9 [3 favorites]


@supercres: Why are you using 366 in this statement, instead of 365?

> bdays = rng.integers(366, size=50)
posted by akk2014 at 11:17 AM on March 9 [1 favorite]


Ty, got ahead of myself, was already thinking of the leap year case
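import numpy as np

nsim = 1000000
res = np.empty(nsim)  # largest same-birthday group seen in each simulated room
rng = np.random.default_rng()

# 365 ordinary days with weight 1, plus Feb 29 with weight 1/4
p = np.append(np.ones(365), 0.25)
p_normed = p/sum(p)  # normalize the weights into probabilities

for i in range(nsim):
    # draw 50 birthdays from the weighted 366-day calendar
    bdays = rng.choice(366, size=50, p=p_normed)
    largest_group = np.max(np.bincount(bdays))
    res[i] = largest_group

# tally how many simulations ended with each largest-group size
print(dict(zip(range(6), np.bincount(res.astype(int)))))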
Similar result, 12.66% this time.
{1: 29753, 2: 843648, 3: 122356, 4: 4161, 5: 78}

posted by supercres at 11:23 AM on March 9 [1 favorite]


not a mathematician but I have two experiences (both back in my school years) where in a group of roughly thirty, at least three of us shared a birthday.
posted by philip-random at 11:50 AM on March 9 [1 favorite]


What about 0 and more than 6?
posted by MisantropicPainforest at 12:06 PM on March 9


The way to figure this out is to determine the probability of NOT sharing a birthday, then flipping it. You can find the formula for 2 people in most intro stats books (and of course online), and modify it for 3 people.
posted by nixxon at 12:07 PM on March 9 [1 favorite]


What about 0 and more than 6?

There's no zero, this is "max size of a birthday sharing group". 1 is "everyone has a different birthday", about 3% of the time. 6 hasn't happened for me even when I rerun 10x as large: {1: 297810, 2: 8439696, 3: 1219878, 4: 41547, 5: 1043} -> 12.62% at 3 or more.

You can find the formula for 2 people in most intro stats books (and of course online), and modify it for 3 people.

The last part is more complicated than one might initially think, since you also have to take into account the cases when two people share a birthday but not 3. See the links provided above.
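
To make that concrete, here is a sketch of an exact count, assuming 365 equally likely birthdays and no leap day. It sums over k, the number of days shared by exactly two people, with everyone else on a day of their own. The function name is mine, not from the linked answers.

```python
from fractions import Fraction
from math import comb, factorial

def prob_no_triple(n=50, days=365):
    """Exact P(no birthday is shared by 3 or more of n people), uniform days."""
    total = 0
    for k in range(n // 2 + 1):          # k = number of days with exactly 2 people
        singles = n - 2 * k              # remaining people each get their own day
        if k + singles > days:
            continue                     # not enough days for this pattern
        # choose which days are "pair" days and which are "single" days
        day_choices = comb(days, k) * comb(days - k, singles)
        # assign n labeled people to those slots; each pair is unordered
        # (n! is divisible by 2**k because k <= n // 2)
        people_arrangements = factorial(n) // 2**k
        total += day_choices * people_arrangements
    return Fraction(total, days**n)

p_three_or_more = 1 - float(prob_no_triple())
print(f"P(some birthday shared by 3+ of 50): {p_three_or_more:.2%}")
```

The k = 0 term on its own is the classic "everyone has a different birthday" count, which is why the two-person formula alone is not enough.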
posted by supercres at 12:17 PM on March 9 [1 favorite]


Response by poster: I should have posed the problem more precisely.

In an average working day, I ask roughly 50 people what their birthday is. If one of them shares my birthday, it feels like a once-per-month (~20 working days) fluke.

Recently I had a shift where two people shared my birthday. That is the first time this has happened in the year I’ve been working there (320 working days).

If it’s really an 11% probability, wouldn’t that make it strikingly overdue?
posted by Lemkin at 12:27 PM on March 9


You're asking a completely different question: not what are the odds that three people share a birthday, but that two people have the same specific birthday as you.

That one is simple to answer. The odds that one person shares your birthday is 50/365 = 14%. The odds that two people share your birthday is 50/365 * 49/365 = 1.8%.
posted by ssg at 12:38 PM on March 9 [14 favorites]


To put that in days as you have, you should get someone who shares your birthday about once every 7 days and two people who share your birthday about every 56 days. So maybe you're asking fewer than 50 people per day? Or maybe you have a birthday that is less common than the average?
posted by ssg at 12:58 PM on March 9


ssg wrote, "The odds that one person shares your birthday is 50/365 = 14%"

I'm not particularly adept at mathematics, but something doesn't seem right about that formula. Suppose you had a sample of 365 people. Your formula suggests that you'd have a 100% chance of having a match. But isn't it possible that nobody in the 365 people had the same birthday as you? So it should really be a little less than 100%.
posted by alex1965 at 1:12 PM on March 9 [1 favorite]


But it’s not a sample of 365 people. It’s you, and then some random person. Your birthday is fixed. So conditioning on it cannot change the other probability. So you have a birthday. Pick a random person. What’s the probability their birthday matches your birthday?
posted by MisantropicPainforest at 1:43 PM on March 9


I see: 50/365 is an approximation, and the real probability is lower because we’re sampling with replacement. We take 50 samples, and that’s probably something like 45 distinct birthdays, so the real probability is closer to 45/365. The probability doesn’t scale linearly with the number of samples. The approximation holds for a smaller number of samples or a larger sample space (distinct dates over a decade rather than 365 days).
posted by MisantropicPainforest at 1:55 PM on March 9


Suppose each person you ask has the same probability p of sharing your birthday, and you ask n people. The number of "successes" in the sample follows the binomial distribution.

In this case, n=50, and p≈1/(365.25) (assuming birthdays are evenly distributed in the population, which is not exactly true IRL but close enough). The probability of k successes is C(n,k) · p^k · (1–p)^(n–k). Calculating this out, we have

P(0 successes) ≈ 87.19%
P(exactly 1 success) ≈ 11.97%
P(2 or more successes) ≈ 0.84%

You should get 2 or more successes about once every 119 days. (Though that assumes you really ask 50 people every day. If you ask an average of 50 people every day, but there's some variability in the number of people you ask, I think that would tend to modestly increase the frequency of getting 2 or more successes on the same day.)
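
A minimal sketch of that calculation, using the same assumptions (n = 50 people per day, p = 1/365.25 per person); the helper name is just for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 50, 1 / 365.25
p0 = binom_pmf(0, n, p)        # nobody shares your birthday
p1 = binom_pmf(1, n, p)        # exactly one person does
p2_plus = 1 - p0 - p1          # two or more do

print(f"P(0) = {p0:.2%}, P(1) = {p1:.2%}, P(2+) = {p2_plus:.2%}")
print(f"expected days between 2+ days: {1 / p2_plus:.0f}")
```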
posted by aws17576 at 2:23 PM on March 9 [3 favorites]


Oh, and just for fun, assuming you've really asked 50 people a day every day, the chance that you wouldn't have had this happen during the first 319 days is only 6.7%, so in that sense it really was overdue! My hunch is that you're overestimating how many people you ask.
posted by aws17576 at 2:26 PM on March 9 [2 favorites]


Claude.ai says that if you have a group of 50 people, the probability that at least one person shares your birthday is 1 - (364/365)^50, which is about 12.8%.

Yes, I messed that up and Claude is right.
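
A quick check of that formula, assuming 365 equally likely birthdays. It also covers the 365-person case raised earlier, which comes out well short of 100%.

```python
def p_at_least_one_match(n, days=365):
    """P(at least one of n random people has one specific birthday)."""
    # Complement rule: each person misses your birthday with chance 1 - 1/days.
    return 1 - (1 - 1 / days) ** n

p_50 = p_at_least_one_match(50)
p_365 = p_at_least_one_match(365)
print(f"50 people: {p_50:.1%}")
print(f"365 people: {p_365:.1%}")
```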
posted by ssg at 2:45 PM on March 9 [4 favorites]


Continuing the "for fun" exploration of the problem, if you're actually asking 50 people a day, and therefore the probability of this happening on any given day is 0.84%, once you hit about 82-83 days it's more likely that it will have happened in that time vs not.
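
Here is a small sketch of the waiting-time arithmetic, assuming independent days, exactly 50 people per day, and p = 1/365.25 per person (variable names are mine).

```python
from math import log

n, p = 50, 1 / 365.25
# daily chance that 2 or more of the 50 people share your birthday
p_day = 1 - (1 - p)**n - n * p * (1 - p)**(n - 1)

# chance of going 319 straight days without it happening
p_none_319 = (1 - p_day)**319

# number of days after which it is more likely than not to have happened
days_to_half = log(0.5) / log(1 - p_day)

print(f"P(no 2+ day in 319 days) = {p_none_319:.1%}")
print(f"days until 50/50: {days_to_half:.1f}")
```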
posted by supercres at 3:27 PM on March 9 [1 favorite]


Birthdays aren’t uniformly distributed; the cover of Bayesian Data Analysis 3 shows the distribution of birthdays and their estimated probability of occurrence. There are some spikes and valleys in there.
posted by MisantropicPainforest at 3:36 PM on March 9 [5 favorites]


Here is an article on the uneven distributions of birthdays.
posted by Mid at 3:50 PM on March 9 [3 favorites]


Used the birth distribution from here to recalculate (using years 2001-2012 inclusive), and the probabilities for the *original* problem (likelihood of a "shared birthday group" for 50 people) are
{1: 2.9619%, 2: 84.3355%, 3: 12.2698%, 4: 0.4205%, 5: 0.0117%}
For the question you're asking, Lemkin, if we don't want to assume equal distribution, we would need to know when your birthday is! If you were born on September 14th, for example, the probability of a random stranger sharing your DOB wouldn't be 1/365, it would be 0.304%, or about 1.11/365. If it was Christmas, it would be about 0.56/365.
posted by supercres at 4:31 PM on March 9 [1 favorite]


Mod note: One AI-based answer deleted.
posted by travelingthyme (staff) at 4:07 AM on March 10 [6 favorites]


Just a reminder that when we calculate the probability of a certain event, we would only expect our real-life outcomes to match the calculated values when we do a huge number of experiments.

For instance, we expect to get heads 50% of the time when we flip a coin, but if we only flip a coin 10 times, we would get exactly 5 heads and 5 tails only about a quarter of the time. The theoretical outcome is not a guarantee of the actual outcome.

So, you will get birthday matches as frequently as folks are citing above if you are doing thousands and thousands of "50 birthday askings" a day.

But because of your small sample rate... you might get a weird string of birthday match days, too! Another event to look forward to.

Obviously... you need to start recording your data!
posted by Sauter Vaguely at 8:55 AM on March 10 [1 favorite]


Just a reminder that when we calculate the probability of a certain event, we would only expect our real-life outcomes to match the calculated values when we do a huge number of experiments.

This exactly. Small sets of data can lead to all sorts of anomalies statistically.
posted by The_Vegetables at 10:20 AM on March 10


I think that would tend to modestly increase the frequency of getting 2 or more successes on the same day.

A toy example: for 50 people the probability of at least two hits is 0.84%. For 40 people it's 0.54%; for 60 it's 1.19%; those average to 0.87%. So if instead of seeing 50 people every day you're equally likely to see 40 or 60, then the average is still 50 but the probability of a hit is slightly higher. Basically observe that the probability of at least two hits is convex, then wave your hands and mumble the magic words "Jensen's inequality".
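
The toy example can be reproduced with a few lines, assuming p = 1/365.25 per person as in the binomial answer above; the function name is made up.

```python
def p_two_or_more(n, p=1 / 365.25):
    """P(at least 2 of n people share one specific birthday)."""
    return 1 - (1 - p)**n - n * p * (1 - p)**(n - 1)

fixed = p_two_or_more(50)                               # always 50 people
mixed = 0.5 * (p_two_or_more(40) + p_two_or_more(60))   # 40 or 60, equally likely

print(f"always 50: {fixed:.2%}")
print(f"40 or 60:  {mixed:.2%}")
```

The gap is small, which fits the "modestly increase" wording: variability in the daily headcount helps a little, but not much.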
posted by madcaptenor at 10:40 AM on March 10 [2 favorites]


Also see the Birthday Paradox.
posted by oceano at 11:02 PM on March 10

