Fruit salad for statisticians
January 4, 2015 7:45 PM

What formula do I need to determine the probability that a set of size N contains two elements, each appearing with a specific frequency?

Fruit boxes, let me show you them. Let's assume I have 5 apples, 10 bananas, 8 pears, which I will randomly throw into 4 boxes of various sizes to make Christmas presents.

Box 1 = can fit 3 pieces of fruit
Box 2 = can fit 5 pieces of fruit
Box 3 = can fit 10 pieces of fruit
Box 4 = can fit 5 pieces of fruit

So 23 pieces of fruit in total and 23 slots in boxes. Now, for each box, what is the probability that it will contain at least one apple and one pear, given their overall frequency?

So far I can tell this is going to involve combinations, which I know how to calculate. What I'm struggling with is the added frequency distribution of the fruit types, and the fact that I don't need the total number of possible combinations of size r (= box size) but how many of those combinations contain at least one A(pple) and one P(ear).

Y'all can have the biggest box if you help me solve this, or at least point me in the right direction!
posted by Ender's Friend to Science & Nature (16 answers total) 5 users marked this as a favorite
If you can program, the easiest thing to do is just simulate it, I think.
posted by empath at 8:03 PM on January 4, 2015

Also, I think you'll find that the size of the box is fairly irrelevant, assuming you're choosing randomly from both the pool of fruit and the boxes. But that's just intuition, not math.
posted by empath at 8:07 PM on January 4, 2015

Here's the answer for one box with N slots, considering apples only. The probability that a given fruit is an apple is p = 5/23 from the numbers you gave.

The probability of no apples is (1-p) to the power N [which I'll write (1-p)^N], so the probability of at least 1 apple is 1 - (1-p)^N. For N = 1, 2, 3, 4, the probability of at least 1 apple is 5/23 ~ 0.217, 205/529 ~ 0.388, 6335/12167 ~ 0.521, and 174865/279841 ~ 0.625.
posted by lukemeister at 8:08 PM on January 4, 2015 [1 favorite]
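For anyone who wants to check those fractions, here's a minimal Python sketch of that formula using exact rational arithmetic. The function name `p_at_least_one_approx` is just illustrative, and like the formula itself it treats every slot as an independent draw that is an apple with probability p.

```python
from fractions import Fraction

def p_at_least_one_approx(p: Fraction, n: int) -> Fraction:
    """1 - (1 - p)^n: each of the n slots is treated as an independent draw
    that is an apple with probability p (sampling with replacement)."""
    return 1 - (1 - p) ** n

p_apple = Fraction(5, 23)
for n in range(1, 5):
    prob = p_at_least_one_approx(p_apple, n)
    print(n, prob, round(float(prob), 3))
```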

I'd recommend looking up card combination functions: a deck of playing cards has 52 cards, grouped either as 4 suits of 13 or as 13 ranks of 4, and there are tons of math lessons for games like blackjack or hold 'em online.
posted by DigDoug at 8:10 PM on January 4, 2015

What I said can't be exactly right, because I essentially assumed one is picking from a vast number of fruits, 5/23 of which are apples. If the number of slots is 19 or more, there must be at least 1 apple (i.e., the probability of that happening is 1), but for 19 slots the formula I gave predicts a probability of at least 1 apple of only about 0.991.
posted by lukemeister at 8:27 PM on January 4, 2015
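The exact version counts fillings without replacement: the chance of no apples in n slots is C(18, n)/C(23, n). Here's a hedged Python sketch comparing it with the approximation above. The function names are hypothetical, and it assumes Python 3.8+ for `math.comb`, which returns 0 when n > 18.

```python
from fractions import Fraction
from math import comb

TOTAL, APPLES = 23, 5

def p_at_least_one_exact(n: int) -> Fraction:
    """Exact: n slots filled without replacement from 23 fruit, 18 of them not apples."""
    # comb(18, n) is 0 when n > 18, so the result is exactly 1 for 19+ slots.
    return 1 - Fraction(comb(TOTAL - APPLES, n), comb(TOTAL, n))

def p_at_least_one_approx(n: int) -> Fraction:
    """With-replacement approximation 1 - (1 - p)^n with p = 5/23."""
    return 1 - (1 - Fraction(APPLES, TOTAL)) ** n

for n in (1, 2, 3, 4, 19):
    print(n, float(p_at_least_one_exact(n)), float(p_at_least_one_approx(n)))
```

The two agree for a single slot and drift apart as the box grows, because each fruit that lands in the box changes the mix of what's left.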

There are two things you need to count. The first is how many ways you can put the fruit in the boxes in the first place. The second is how many of those ways have an apple and a pear in each box. The probability is the second count over the first count.

Unfortunately, off the top of my head, I don't know how to count the first thing. However, if you know how to count the first thing, you know how to count the second thing. Just put an apple and a pear in each box to begin with.

After putting an apple and a pear in each box, the new thing you need to count is how many ways you can put 1 apple, 10 bananas, and 4 pears in boxes that can hold 1, 3, 8, and 3 pieces of fruit. This is the same counting problem as finding the number of ways you can put all the fruit in the boxes, just with different numbers.

I'll post back if it occurs to me how to count this.
posted by AaRdVarK at 8:37 PM on January 4, 2015

Here's my attempt at your example:

First off: how many different ways are there to put fruit in the boxes? If you think about it, this is essentially the same question as how many ways there are to arrange a line of 23 fruit: there are 23 possibilities for the 1st slot, 22 for the second, etc. So there are 23! possible arrangements (treating every piece of fruit as distinct).

(I'm not sure how helpful this diagram is, but it was fun to type out so I'm leaving it in.)

So now we need to know how many combinations have at least one pear and one apple in a box.

For starters, let's consider the box of size 3. We can choose from 8 pears and 5 apples, so there are 40 possible apple-pear pairs; additionally, we must pick any one of the remaining 21 fruit. So there are 8*5*21 possible fruit combinations. (Note that this includes instances when there are 2 pears or 2 apples.)

We must also count the different arrangements of this fruit, so in total there are 8*5*21*3! instances in which the box of size 3 has an apple and a pear. (Suppose there is an apple in slot 1, a pear in slot 2, and a banana in slot 3. If we switch the apple and the banana we have a different arrangement; this step takes that into account.) Similarly, the boxes of size 5 have 8*5*21*20*19*5! ways to have an apple and a pear.

The problem is that this counting method double (also triple and quadruple) counts the instances when multiple boxes have an apple and a pear, and for that you need the inclusion-exclusion principle.

This is as far as I got. Sorry for the poor explanation. (Also, if my experience of probability is any indicator, this could all be wrong.) But thanks for this problem; it was fun to think about, and I hope this is helpful.
posted by pomomo at 8:57 PM on January 4, 2015

So my attempt is actually pretty misguided. Maybe you should try Stack Exchange?
posted by pomomo at 9:43 PM on January 4, 2015

If no one figures this out by the morning, I'll take a shot at simulating it at work; it seems like a pretty easy thing to do. Just create an array with the fruit in it, randomize the order, then check for an apple and a pear in the range of the array that corresponds to each box.
posted by empath at 10:15 PM on January 4, 2015
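That recipe can be sketched in Python roughly as follows. It assumes (hypothetically) that the boxes take consecutive runs of the shuffled list in the order 3, 5, 10, 5, and it uses a fixed seed so runs are repeatable; `simulate` is an illustrative name, and the output is an estimate, not an exact answer.

```python
import random

FRUIT = ["A"] * 5 + ["B"] * 10 + ["P"] * 8   # A = apple, B = banana, P = pear
BOX_SIZES = [3, 5, 10, 5]

def simulate(trials: int, seed: int = 0) -> list[float]:
    """Estimate, for each box, P(box holds at least one apple and one pear)."""
    rng = random.Random(seed)
    fruit = FRUIT[:]
    hits = [0] * len(BOX_SIZES)
    for _ in range(trials):
        rng.shuffle(fruit)                  # a random filling of the 23 slots
        start = 0
        for i, size in enumerate(BOX_SIZES):
            box = fruit[start:start + size] # consecutive slots form box i
            if "A" in box and "P" in box:
                hits[i] += 1
            start += size
    return [h / trials for h in hits]

estimates = simulate(100_000)
print(estimates)
```

With 100,000 trials the sampling error on each estimate is on the order of 0.1 to 0.2 percentage points, so more trials are needed to pin down further digits.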

There are 23! ways of arranging the fruit among the slots.

If there is an apple and pear in each box there are 15! ways of arranging the rest.
Also, the A & P can be in different slots in each box.
3 slots = 6 combos (3 for one fruit * 2 for the other)
5 slots = 20 combos
10 slots = 90 combos

(15! * 6 * 20 * 20 * 90) / 23!

(6 * 20 * 90) / (23 * 22 * 21 * 19 * 18 * 17 * 16)

So the probability of each box containing at least one A & P = 0.00001092594

I think.
posted by bashos_frog at 10:16 PM on January 4, 2015
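A quick exact check of the arithmetic above in Python, under the same counting assumptions (one apple and one pear in distinct slots of each box, the other 15 fruit arranged in 15! ways). It only confirms that the unsimplified and simplified fractions are equal; it doesn't judge whether that counting model answers the question.

```python
from fractions import Fraction
from math import factorial

# Ordered (apple slot, pear slot) choices in each box: k * (k - 1) for a box of k slots.
box_sizes = (3, 5, 10, 5)
placements = 1
for k in box_sizes:
    placements *= k * (k - 1)      # 6, 20, 90, 20

ratio = Fraction(factorial(15) * placements, factorial(23))
simplified = Fraction(6 * 20 * 90, 23 * 22 * 21 * 19 * 18 * 17 * 16)
print(placements, ratio == simplified, float(ratio))
```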

By the way, the number of distinct combinations of the fruit is:
23! / (5! * 10! * 8!) = 1472412942

Number of distinct ways of having one A & P in each box is
6 * 20 * 20 * 90 = 216000

216000 / 1472412942 = 0.00014669797

Which is probably closer to the intent of the question.
posted by bashos_frog at 10:45 PM on January 4, 2015
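That count of distinct arrangements is a multinomial coefficient. A small Python helper (the name `multinomial` is hypothetical) reproduces it exactly with integer arithmetic:

```python
from math import factorial

def multinomial(*counts: int) -> int:
    """Distinct orderings of sum(counts) items when items in each group are identical:
    (c1 + c2 + ...)! / (c1! * c2! * ...)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)    # each partial quotient is still an integer
    return result

arrangements = multinomial(5, 10, 8)   # apples, bananas, pears
print(arrangements)
print(216000 / arrangements)
```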

I think Ender's Friend is asking a) what is the probability of Box 1 having an apple and a pear, independently of the other boxes; b) what is the probability of Box 2 having an apple and a pear, independently of the other boxes; etc. Not what is the probability of all four boxes each having an apple and a pear, which is what bashos_frog is looking at.
posted by DevilsAdvocate at 10:49 PM on January 4, 2015

If box 1 has one of each fruit (apple, pear, banana), there are 3! = 6 ways it could happen.
There are 3 ways each for the permutations of AAP and APP (the odd fruit can be in any one of the three slots).

The distinct ways of arranging the other 20 fruits are:
20! / (4! * 9! * 7!) = 55426800
20! / (3! * 10! * 7!) = 22170720
20! / (4! * 10! * 6!) = 38798760
(The denominator changing based on the numbers of fruit left)

(6 * 55426800) + (3 * 22170720) + (3 * 38798760) = 515469240 ways of arranging the fruit that has at least one of each in the first box.

And we remember there are 23! / (5! * 10! * 8!) = 1472412942 distinct arrangements overall.
515469240 / 1472412942 = 0.35008469791

That's the probability of the first box containing at least one apple and one pear; I'll leave the others as an exercise for the reader.
posted by bashos_frog at 11:09 PM on January 4, 2015 [1 favorite]
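The same case-by-case count can be automated for the other boxes. This Python sketch (helper names are illustrative) loops over every mix of a apples, p pears and b bananas with a ≥ 1 and p ≥ 1 that fills a box of the given size, and multiplies the orderings inside the box by the orderings of the leftover fruit, as done for box 1 above. By symmetry, only the box's size matters, not where its slots sit among the 23.

```python
from fractions import Fraction
from math import factorial

APPLES, BANANAS, PEARS = 5, 10, 8

def multinomial(*counts: int) -> int:
    """(c1 + c2 + ...)! / (c1! * c2! * ...)"""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def p_apple_and_pear(box_size: int) -> Fraction:
    """Sum over box mixes with at least one apple and one pear."""
    favourable = 0
    for a in range(1, min(APPLES, box_size) + 1):
        for p in range(1, min(PEARS, box_size - a) + 1):
            b = box_size - a - p               # bananas fill the rest of the box
            if b > BANANAS:
                continue
            inside = multinomial(a, p, b)      # orderings within the box
            outside = multinomial(APPLES - a, PEARS - p, BANANAS - b)  # leftover fruit
            favourable += inside * outside
    return Fraction(favourable, multinomial(APPLES, PEARS, BANANAS))

for size in (3, 5, 10):
    print(size, float(p_apple_and_pear(size)))
```

Sizes 3, 5 and 10 cover box 1, boxes 2 and 4, and box 3 respectively.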

a very silly Python script randomly mixing the fruit and dividing the results into boxes a million times yields:

box one has about a 35.00% chance of containing at least one apple and one pear

boxes two and four have about a 66.43% chance

box three has about a 96.91% chance

boxes two and four differed by about .02%, so the margin of error seems okay

which is good enough agreement that I think bashos_frog has it right
posted by Nomiconic at 11:53 PM on January 4, 2015

Best answer: Let's look at the box that holds 5 and see what is the chance that it holds an apple and a pear. (Then repeat the argument for the other boxes). There are 23C5 ways to fill this box. 23C5 = 23!/(18! * 5!)
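Then there are 18C5 ways to fill it with no apple and 15C5 ways to fill it with no pear. This counts twice the fillings with neither, so we have to subtract those 10C5 ways. This gives the probability that the box is missing an apple or a pear: (18C5 + 15C5 - 10C5)/23C5. The chance that it holds both is 1 minus that.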
posted by Obscure Reference at 5:59 AM on January 5, 2015
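This inclusion-exclusion count is short to write with `math.comb` (Python 3.8+). A sketch with illustrative names, returning the probability of having both by taking 1 minus the "missing an apple or a pear" fraction:

```python
from fractions import Fraction
from math import comb

TOTAL, APPLES, PEARS = 23, 5, 8
BOX_SIZES = {1: 3, 2: 5, 3: 10, 4: 5}   # box number -> slots

def p_apple_and_pear(k: int) -> Fraction:
    """P(a box of k slots gets at least one apple and at least one pear)."""
    missing = (comb(TOTAL - APPLES, k)              # fillings with no apple
               + comb(TOTAL - PEARS, k)             # fillings with no pear
               - comb(TOTAL - APPLES - PEARS, k))   # neither: counted twice above
    return 1 - Fraction(missing, comb(TOTAL, k))

for box, k in BOX_SIZES.items():
    print(f"box {box}: {float(p_apple_and_pear(k)):.11f}")
```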

Using the complement of Obscure Reference's algorithm, I get:

box 1: 0.35008469792
box 2: 0.66361556064
box 3: 0.95912823211
box 4: 0.66361556064

which also agrees with the Monte Carlo sim, and my more complicated way of explaining it.
posted by bashos_frog at 7:03 AM on January 5, 2015
