Given P outcomes, how many different P's are chosen when N choices are made?
December 11, 2005 2:06 AM
Subscribe
Basic statistics: given P possible outcomes, how many different P's are chosen on average when N choices are made?
My question assumes that each selection is made randomly and independently (there's no replacing/removing, i.e. the same option can be selected for each N), and "average" refers to the average of how many different outcomes are chosen. The order of choices is irrelevant -- foo + bar and bar + foo both count as 2 distinct choices total, and foo + foo or bar + bar are both 1 total. In the case of P=2 and N=2, for example, I would intuitively guess 1.5 on average (i.e. half the time selecting the same one both times and half the time selecting two different ones). Is my intuition correct?
Going beyond that though, if I had P=20 options and selected N=10 at random, how many different options should I expect to have actually wound up selecting, on average? (Surely less than 10 due to duplicate selections, but how much below 10?) What if I had P=100 options and I selected N=50 times at random? And so on. How would I express the answer as a formula?
Links are welcome! This is not for class or academia, so for your explanation, brownie points will be awarded for clarity over theory. Out of personal interest, I do appreciate reading and learning what's the appropriate jargon to use but only where relevant. Thanks in advance.
posted by DaShiv to education (23 comments total)
The probability that you pick an unpicked item (out of P options) in each of N rounds is found (at each round) by:
Round 1 : 1
Round 2 : (P-1)/P
Round 3 : (P-2)/P
...
Round N : (P-N)/P
The cumulative probability is the product of those round-wise probabilities. So, the probability of getting through N rounds without picking any duplicates is (P-N)!/P^(N-1).[1] Now, the average number of distinct choices is a little more tricky. Er, tricky enough that I can't make it up off the top of my head - mostly because, if you screw up one time and double-dip the same item, it sets your probabilities back a round.
But first a question or two: Are you replacing the samples once you've drawn them, or removing them. You suggest that you're doing neither, which is rather difficult to comprehend. If I draw a blue marble (from a bag of outmoded toys, naturally), do I have a chance of drawing that exact same marble again? (My guess here is yes - there is only one 'foo' in the bag, but I might be able to draw it twice in successive rounds.)
[1] (The exclamation point means factorial: 3! = 3*2*1 = 6 and 123! = 123*122*121 * ... * 2 * 1 = 1.2146*10^205, which is really rather bigger than I thought it would be..)
posted by metaculpa at 3:08 AM on December 11, 2005