Permutations: an extension of the odd sock problem
September 1, 2009 3:00 PM   Subscribe

I've annoyed myself by getting stuck attempting to work out a silly maths problem relating to permutations. It's an extension of the 'how many odd socks do I have to pull out of a drawer before I find a pair' problem. Only my theoretical sock-wearer has three legs. And a penchant for odd socks.

Let's say I'm Jake the Peg. I have s socks in a drawer. There are n different types of sock. On average, how many socks must I remove from the drawer before I get three odd ones?

I've worked this out from scratch using a drawer containing only 5 socks. Obviously, if all 5 socks are different (s=n=5), then there are 5! different ways of pulling the 5 socks from the drawer, and every single one of those leads to me having an odd triplet after only 3 socks.

If I only have 4 types of sock (i.e. in the drawer of 5, there's 1 pair and 3 odd ones), again, there are 120 ways of pulling all 5 socks out of the drawer, but this time there are only 60 unique sequences [I managed to work out from first principles that the number of unique ways of withdrawing the socks is s!/(s-n+1)!, not that that seems to help me at all ]. 84 of these sequences lead me to have an odd triplet after the first three socks. 36 of these lead me to have an odd triplet after the first four socks. At no stage do I need to withdraw all 5. So now the average number of socks I need to withdraw to get an odd triplet is (3x84 + 4*36)/120 = 3.3.

If I now only have 3 types of sock (i.e. in the drawer of 5, there are three black ones, a 'Worlds Best Dad' and a Homer Simpson), there are 20 unique ways of removing them all, and, if I've counted right, 37 ways of getting an odd triplet after 3 socks; 35 ways of getting an odd triplet after 4 socks; 48 ways of only getting an odd triplet after removing all 5 socks from the drawer. This leads to an average of 4.09.

My tiny brain is now too overheated to take this further. There must be a connection between these numbers, but I can't get a handle on a pattern, and am too daunted by the prospect of writing out all the variations of 720 sequences for a drawer of 6 socks to do it by brute force. Can anyone please put me out of my misery and point me towards a general formula for this, to solve the average number of socks necessary to achieve an odd triplet given s socks and n different types?
posted by Beautiful Screaming Lady to Science & Nature (7 answers total) 3 users marked this as a favorite
 
s and n are insufficient to describe the problem. The result when the distribution is [A A A B B B C C C] (s = 9, n = 3) is different than when the distribution is [A A A A A A A B C] (s = 9, n= 3).
posted by 0xFCAF at 3:13 PM on September 1, 2009


Response by poster: Hmmm, yes, you're absolutely right. Oh well - guess I can stop worrying about it now. Thanks!
posted by Beautiful Screaming Lady at 3:24 PM on September 1, 2009


This can be done, but it'll be a really ugly recurrence relation. You can define a function T(t, r, d), where:

t is the number of tries so far (number of socks drawn)
r is a vector containing the number of socks remaining in the drawer of each kind
d is a vector containing the number of socks already drawn of each kind
...and the value of T is the expected number of tries you still need to do before you "win"

Your example with 5 distinct socks would be expressed:
T(0, {1, 1, 1, 1, 1}, {0, 0, 0, 0, 0}) = 3

And the recurrence comes in like this:

if d has at least three nonzero entries, then T(t, r, d) = 0 (you've already won)

otherwise, you have to draw at least one more sock, so the expected number of tries until you win is 1 plus the average number of tries you'll still need afterwards (averaged over all the different cases for the different socks you might draw next):
let prob(r', d') be the probability, given your current r and d, of ending up with r' and d' when you draw one more sock; then T(t, r, d) = 1 + sum (over all possible r', d') of (prob(r', d') * T(t+1, r', d'))

Again with the 5-distinct-socks example:
T(0, {1, 1, 1, 1, 1}, {0, 0, 0, 0, 0}) =
0.2 * T(1, {0, 1, 1, 1, 1}, {1, 0, 0, 0, 0}) +
0.2 * T(1, {1, 0, 1, 1, 1}, {0, 1, 0, 0, 0}) +
0.2 * T(1, {1, 1, 0, 1, 1}, {0, 0, 1, 0, 0}) +
0.2 * T(1, {1, 1, 1, 0, 1}, {0, 0, 0, 1, 0}) +
0.2 * T(1, {1, 1, 1, 1, 0}, {0, 0, 0, 0, 1})

So you can see that all of the T(0, ...) values depend on the T(1, ...) values, and the T(1, ...) upon the T(2, ...), etc. As long as there are at least three kinds of socks, the chains of dependencies will all terminate eventually and everything will add up properly.
posted by equalpants at 3:55 PM on September 1, 2009


The minimum number of socks to pull to guarantee three odd ones is the sum of the two most numerous socks plus one. If there are, say, five colors, in the quantities 9, 8, 7, 6, 5, then you'll need to pull 17 socks to guarantee three different ones because, in the extreme case, the first 16 could all be of two colors.

But this doesn't answer your question, because you want the average number of socks. So let's say rephrase your premise as:

s = s1 + s2 + . . . + sn

The first sock you pul is one of sn. The chance of getting a different one on the next pull is (s - sx)/(s - 1), and the next pull is (s - sx - sy)/(s - 2). Thus, the probability of pulling three distinct socks in a row is:

(s2 - 2sxs - sys + sx2 + sxsy)/(s2 - 3s + 2)

If you get two of type sx in the first three tries, you need a fourth pull, the probability of which is (s - sx - sy)/(s-3). Fifth pull, (s - sx - sy)/(s-4), and to abstract that, it's (s - sx - sy)/(s-(p-1)) where p is the count of the pull. Hence, the first pull is s/(s-(1-1) = s/s = 1, which is right: we don't care what it is, we just get a sock.

What we want is the average number of pulls, which is where the probability of pulling three distinct socks is .5. Half the time it will be fewer pulls, half the time more pulls.* The minumum is three pulls, with a probability of (s2 - 2sxs - sys + sx2 + sxsy)/(s2 - 3s + 2). The probability of getting it on the fourth pull is (s2 - 2sxs - sys + sx2 + sxsy)/(s2 - 5s + 6). And so forth, where the numerator remains (s2 - 2sxs - sys + sx2 + sxsy) and the denominator is (s2 - (2p-3)s + (p2 - 3p + 2)).

The equation we want is:

.5 = (s2 - 2sxs - sys + sx2 + sxsy)/(s2 - (2p-3)s + (p2 - 3p + 2))

Solve for p and there's your answer. I think.

* I think is correct... this is where my stats knowledge is weakest.
posted by The Michael The at 6:21 PM on September 1, 2009


Oh, where:

s is the total number of socks

n is the number of distinct sock types

sx and sy are the respective quantities of the first two distinct socks you pull

p is the number of pulls
posted by The Michael The at 6:24 PM on September 1, 2009


The Michael The: I don't believe you can multiply the probabilities of dependent events like that. (s - sx) / (s - 1) is the conditional probability of choosing a sock not of type X, given that you have already chosen one of type X. (s - sx - sy) / (s - 2) is the conditional probability of choosing a sock not of type X or Y, given that you have already chosen one of X and one of Y. The second event is dependent on the first; if you don't choose a Y on the second draw, it's meaningless.

Note that your formula gives different answers if you swap X and Y: for example, s = 4, sx = 2, sy = 1 gives (16 - 16 - 4 + 4 + 2) / (16 - 12 + 2) = 1/3, while swapping X and Y gives (16 - 8 - 8 + 1 + 2) / (16 - 12 + 2) = 1/2. The overall probability of pulling three distinct socks in a row should not depend on which sock gets pulled first.
posted by equalpants at 7:09 PM on September 1, 2009


You're right. My math was very... impressionistic last night.

So maybe the formula we want is:

.5 = Σ(s - sx - sy)/(s - (p - 2)) for p = 3 to z

s is the total number of socks

sx and sy are the respective quantities of the first two distinct socks you pull

p is the number of pulls

z is the value of p at which the formula sums to .5
posted by The Michael The at 5:53 AM on September 2, 2009


« Older Are we related?   |   "Oh look, the sun went out." Newer »
This thread is closed to new comments.