September 16, 2012 8:28 PM Subscribe

I'm working through an explanation/derivation of the secretary problem that I've never seen before. I know the eventual answer, and I understand most of the steps, but explain this to me like I'm an idiot:

(This is for a class, but not a for-credit homework problem. I was intrigued enough to ask because it's a different way of working through a problem I already know pretty well.)

I'm giving a short setup (see this for the full formulation). n candidates, A1 through An. A1 is the best, An is the worst. They're interviewed in a random order. The first group of candidates (the ones that won't be selected) is G1; G1 has r randomly-selected candidates in it. M is the rank of the best candidate in G1.

Here's the step that I don't understand, and hasn't been in explanations of the problem I've seen before:**get an expression for the probability that M is equal to k (that is, that the overall kth best candidate is the best one in G1), in terms of k, n, and r.** I know that the probability that any given candidate is in G1 is r/n, but the probability that Ak is the best in G1 is giving me fits.

I know, based on the fact that I know the solution to the problem, that it has a summation in it. But thinking through it, I can only come up with it in terms of binomial coefficients. I'm sure I'm overlooking something dumb. Help!
posted by supercres to Grab Bag (8 answers total)

(This is for a class, but not a for-credit homework problem. I was intrigued enough to ask because it's a different way of working through a problem I already know pretty well.)

I'm giving a short setup (see this for the full formulation). n candidates, A1 through An. A1 is the best, An is the worst. They're interviewed in a random order. The first group of candidates (the ones that won't be selected) is G1; G1 has r randomly-selected candidates in it. M is the rank of the best candidate in G1.

Here's the step that I don't understand, and hasn't been in explanations of the problem I've seen before:

I know, based on the fact that I know the solution to the problem, that it has a summation in it. But thinking through it, I can only come up with it in terms of binomial coefficients. I'm sure I'm overlooking something dumb. Help!

Could you explain how you came by that expression? I can't quite follow it, and it doesn't look like anything I've come up with.

Sorry, I don't know the answer for P(m=k). I know the answer for P(choosing A1) (given in the Wiki article) and I know that the answer for P(choosing A1|m=k) has to be 1/(k-1) (the second part of selection is random between candidates better than Ak). So somehow P(choosing A1|m=k) times P(m=k) has to be the normal answer. I just can't interpret that in any normal sense.

posted by supercres at 9:15 PM on September 16, 2012

Sorry, I don't know the answer for P(m=k). I know the answer for P(choosing A1) (given in the Wiki article) and I know that the answer for P(choosing A1|m=k) has to be 1/(k-1) (the second part of selection is random between candidates better than Ak). So somehow P(choosing A1|m=k) times P(m=k) has to be the normal answer. I just can't interpret that in any normal sense.

posted by supercres at 9:15 PM on September 16, 2012

Sounds like what you want is the probability that k is in G1 and that all j (where j<k>

So, for any k, I'd think it is r/n * (1-r/n)^(k-1).

To try and verify that. For k=1, it's just the odds of the first-ranked person being in the group, or r/n (the second half is 1, because anything ^0 is 1). For k=2, it's the odds of the first-ranked person not being in the group (1-r/n) times the odds of the second-ranked person being in it, r/n. So it looks pretty reasonable as an answer to me.

Not quite sure where dragonfruit's number comes from either. It looks vaguely like the binomial distribution, but in this case since each individual is distinct I think you actually don't need that.</k>

posted by Lady Li at 11:13 PM on September 16, 2012

So, for any k, I'd think it is r/n * (1-r/n)^(k-1).

To try and verify that. For k=1, it's just the odds of the first-ranked person being in the group, or r/n (the second half is 1, because anything ^0 is 1). For k=2, it's the odds of the first-ranked person not being in the group (1-r/n) times the odds of the second-ranked person being in it, r/n. So it looks pretty reasonable as an answer to me.

Not quite sure where dragonfruit's number comes from either. It looks vaguely like the binomial distribution, but in this case since each individual is distinct I think you actually don't need that.</k>

posted by Lady Li at 11:13 PM on September 16, 2012

You're actually on the right track with binomial coefficients.

The number of groups G with r non-selected candidates that can be formed from a total pool of n candidates is n!/(r!*(n-r)!). Call that equation Z.

The number of groups g in G where the candidate with overall rank k is the highest ranked in the group is given by (n-k)!/ ((r-1)! * (n-k-(r-1))!). Call that equation Y.

The expression you're looking for is Y/Z. Note that in some instances (n-k-(r-1)) is a negative number, in which case the factorial is undefined, and in those cases the probability that Ak is the best in G1 is 0.

Hope that helps. . . .

posted by muhonnin at 12:29 AM on September 17, 2012

The number of groups G with r non-selected candidates that can be formed from a total pool of n candidates is n!/(r!*(n-r)!). Call that equation Z.

The number of groups g in G where the candidate with overall rank k is the highest ranked in the group is given by (n-k)!/ ((r-1)! * (n-k-(r-1))!). Call that equation Y.

The expression you're looking for is Y/Z. Note that in some instances (n-k-(r-1)) is a negative number, in which case the factorial is undefined, and in those cases the probability that Ak is the best in G1 is 0.

Hope that helps. . . .

posted by muhonnin at 12:29 AM on September 17, 2012

Lady Li, they're not independent probabilities, so they can't just be multiplied. It doesn't work in the limit as k increases or r decreases (but that was my first thought too).

muhonnin, that is very similar to the answer in terms of binomial coefficients that I came up with. I think I had (n-k choose r-1)/(n-1 choose r-1). But like I said, an answer in that form doesn't mesh with the rest of the problem, as least not that I can see.

Thanks, though. Hopefully this just means I'm not overlooking something simple.

posted by supercres at 4:10 AM on September 17, 2012

muhonnin, that is very similar to the answer in terms of binomial coefficients that I came up with. I think I had (n-k choose r-1)/(n-1 choose r-1). But like I said, an answer in that form doesn't mesh with the rest of the problem, as least not that I can see.

Thanks, though. Hopefully this just means I'm not overlooking something simple.

posted by supercres at 4:10 AM on September 17, 2012

Well, there are two separate problems here: the secretary problem, and the problem you posed of finding the probability that M=k for any k <>

In a simple case, let n= 100, r = 50, and k = 1. Then my formula indeed gives the probability that A1 is in G1 as 50%. But that doesn't mean that the probability of finding the optimal candidate is also 50%: if M = 5, for example, then it might be possible to interview and accept one of A2, A3, or A4 before meeting G1. The probability of finding the optimal candidate is with n=100 and r=50 is about 35.2%, as you can see from the formula on the page you linked.

So if you're overlooking something, it might be the need to also calculate the probability that the best candidate not in G is encountered before any other candidate who is also better than the best candidate in G.

posted by muhonnin at 7:23 AM on September 17, 2012

In a simple case, let n= 100, r = 50, and k = 1. Then my formula indeed gives the probability that A1 is in G1 as 50%. But that doesn't mean that the probability of finding the optimal candidate is also 50%: if M = 5, for example, then it might be possible to interview and accept one of A2, A3, or A4 before meeting G1. The probability of finding the optimal candidate is with n=100 and r=50 is about 35.2%, as you can see from the formula on the page you linked.

So if you're overlooking something, it might be the need to also calculate the probability that the best candidate not in G is encountered before any other candidate who is also better than the best candidate in G.

posted by muhonnin at 7:23 AM on September 17, 2012

Perhaps I misunderstand or am oversimplifying.

There are [n choose r] ways to pick r candidates. There are [(k-1) choose (r-1)] ways for k to be the best candidate (i.e. force candidate Ak to be in the set. Then randomly choose the remaining r-1 from the worse k-1 candidates).

So I think the odds that Ak is the best in the set are [(k-1) choose (r-1)] / [n choose r], right?

posted by bessel functions seem unnecessarily complicated at 10:40 AM on September 17, 2012

I went back and re-read the wikipedia page for the problem, and I'm not sure why you are calculating P[M=k] as (I don't think?) this expression will be useful in the final calculation of the probability. (I think what you are trying to do is find P[M=k], then multiply that with P[r+1, r+2, ... , etc. is worse than M |M=k], but that would require you to account for all r people chosen first, and that will be an incredibly complicated expression.)

And after thinking about it some more, I realized that expression I gave before was wrong.

I think bessel functions seem unnecessarily complicated is on the right track, except [k-1 choose r-1] would undercount as in this situation permutations matter. (This is the same reason why my expression before was wrong.)

posted by dragonfruit at 6:04 PM on September 17, 2012

And after thinking about it some more, I realized that expression I gave before was wrong.

I think bessel functions seem unnecessarily complicated is on the right track, except [k-1 choose r-1] would undercount as in this situation permutations matter. (This is the same reason why my expression before was wrong.)

posted by dragonfruit at 6:04 PM on September 17, 2012

This thread is closed to new comments.

P[M=k] = r*(r-2)*[(n-k)!/n!]

You said you know the correct expression? If you post it, it might help people get their thoughts going, as it's often easier to work backwards and much easier to explain why an expression is true than to come up with it.

posted by dragonfruit at 9:04 PM on September 16, 2012