Chance of event with small sample size, based on larger related sample?
March 20, 2014 7:20 AM
How, if at all, can one improve an estimate of the chance of an event that has a small historical sample size by using the chance of a related event that has a large historical sample size? Example and half-assed guess inside.
posted by Flunkie to Science & Nature (16 answers total)
A baseball-related example (for the purposes of this question, please forget about complicating factors like lefty/righty splits, home/away splits, the fact that a particular player might be better or worse now than he was in the past, etc.):
Joe has had 1000 at bats. He has gotten a hit in 270 of those 1000 at bats.
Of those 1000 at bats, 10 were against the pitcher Fred. In those 10 at bats against Fred, Joe got six hits.
Clearly we can say "Joe is a .600 hitter against Fred". But also clearly, that doesn't really have any meaningful predictive power for Joe's future at bats against Fred. If we want to guess what the chance of Joe getting a hit off of Fred is, 27% is almost certainly a much better guess than 60%.
But can we use both pieces of information to get a guess that's better than "27%"?
I have a half-assed guess, which I'll describe momentarily, but it occurs to me that this is probably a problem which has been thought about rigorously by mathematicians. So does anyone know if there's a "real" answer to this problem?
My half-assed guess is something along these lines:
Joe has 10 at bats against Fred, and 6 hits in them. But Joe has 1000 at bats total (with 270 hits). Let's assume that if Joe had had 1000 at bats against Fred, 10 of them would have gone as they did, and the other 990 would have been as if against an average pitcher. So Joe would have gotten:
6 + 990 * 270 / 1000
= 6 + 267.3
= 273.3 hits in 1000 at bats
So we guess that in his upcoming at bat against Fred, Joe has a 273.3 / 1000 = 27.33% chance of getting a hit.
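For anyone who wants to play with the numbers, here is a minimal Python sketch of the guess above. The function name and parameter names are made up for illustration, and this only reproduces the "pad the small sample with average at bats" arithmetic; it is not claimed to be the rigorous answer.

```python
def blended_hit_chance(matchup_hits, matchup_at_bats, total_hits, total_at_bats):
    """Pad the matchup sample out to total_at_bats using the overall rate.

    Hypothetical helper: it assumes the "missing" at bats against this
    pitcher would have gone like at bats against an average pitcher.
    """
    overall_rate = total_hits / total_at_bats           # 270 / 1000 = 0.27
    padded_at_bats = total_at_bats - matchup_at_bats    # 1000 - 10 = 990
    expected_hits = matchup_hits + padded_at_bats * overall_rate
    return expected_hits / total_at_bats


# Joe vs. Fred: 6 hits in 10 at bats, 270 hits in 1000 at bats overall.
estimate = blended_hit_chance(6, 10, 270, 1000)
print(f"{estimate:.4f}")
```

Written this way, the guess is just a weighted average of the two rates: the .600 matchup rate gets weight 10/1000 and the .270 overall rate gets weight 990/1000.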