Need help with a probability question
June 25, 2011 8:18 PM   Subscribe

Can someone help with a probability question?

At least I think that's what it is. I'm not a math person, so I may not even be asking the right question:

16 contestants
500,000 votes cast
Out of the 8 semi-finalists, two contestants received the exact same number of votes (so they went with 9 instead of the usual 8).

What is the probability that there would be a tie?
posted by caroljean63 to Grab Bag (22 answers total) 4 users marked this as a favorite
Do you mean the probability that any of the two contestants would receive the same number of votes? Or do you mean the probability that specifically the 8th and 9th placed contestants would get the same number of votes?
posted by Proginoskes at 8:28 PM on June 25, 2011

Response by poster: The probability that any two would receive the same number of votes.
posted by caroljean63 at 8:30 PM on June 25, 2011

Just another clarification question: what if there was a three way tie? Would that be something you'd want to include? What if there was more than 1 tie - in other words, if the 2nd and 3rd place had the same number of votes and also the 9th and 10th place had the same number of votes. Would that be something you'd want to include as well? Or, do you want to look at the probability of one and only one tie occurring between only two people.
posted by Proginoskes at 8:33 PM on June 25, 2011

Response by poster: This was an actual situation that happened at a real competition. After the online & phone voting, they were supposed to go from 16 to 8 contestants. They said that after half a million votes (actually, 548,000, I think) two of the contestants were tied, so they went with a final "9" instead of final "8". It sounded rather statistically "out there" to me, so that's the scenario I'm curious about.
posted by caroljean63 at 8:38 PM on June 25, 2011

Ok. The probability would be about 0.047% of getting a tie somewhere.

The total number of ways that people can be calculated using free distributions as 500015 choose 500000 which is 2.3342 x 10^73. The total number of ways where each contestant gets a different vote is about 23331691326245631764051552652693680057990122747590407133412607290262162499 (according to this site:

So, the probability that everyone gets a different vote is the second number divided by the first, or about 0.999520115

1 minutes that gives us the probability of at least one tie as 0.000479884819. (So this would also include cases where had three way ties or more than one tie.)

Disclaimer: It's late and I might have made an error somewhere.
posted by Proginoskes at 8:54 PM on June 25, 2011

Response by poster: Awsome! Thank you!
posted by caroljean63 at 9:12 PM on June 25, 2011

I think that the random vote probability is much lower than what's above, but that the true probability is higher since votes aren't random. Proginoskes did answer your question, but out of the the number of ties calculated only a small fraction would have produced the result that happened on the show - 9 semifinalists. This is because only ties that included vote getters #8 and #9 would count. If #8 and #9 aren't tied, you can choose the top 8 even if there are ties elsewhere. As a further restriction, #10 can't be tied with them, since then you'd have 10 semifinalists. So out of all the possible ties, the only ones that produce this are between #8, #9, and optionally vote getters higher than #8.

However, I can see two possible dynamics for making a tie more likely than a random vote distribution. First, there could be clear favorites and underdogs. If the bulk of the votes go to the top 2-3 people, the bottom 13-14 are going to be fighting for a much smaller pool of votes, raising the chance of a tie.

It's also possible that there is some factor that makes people likely to get a minimum amount of votes (for example they represent some region or demographic). In that case you'd increase the chance of ties because you're adding a constraint to the number that Proginoskes listed above - specifically that every contestant must get at least X votes.

For an extreme example of this see the US Presidential election in 2004. There were over 122,000,000 votes cast, and the difference between 8th place and 9th place was 37 votes. Both of the factors above play in here - there are two clear favorites who take the bulk of the votes, and all the underdog candidates have at least some level of popular support that's going to give them a minimum number of votes. If you did straight probability on the chance of choosing 10 numbers out of 122 Million and having two of them match, it would be vanishingly small, yet I doubt anyone would have treated it as more than a curiosity if there had actually been a tie.
posted by true at 9:55 PM on June 25, 2011 [8 favorites]

More generally: Proginoskes is assuming that people vote unbiasedly, that is, that all the candidates are equally popular and votes are cast basically at random. I think less-uniform voting patterns are more likely to give some ties.
posted by hattifattener at 10:22 PM on June 25, 2011 [1 favorite]

(From a data modeling perspective) the problem is, you're not sampling a known random distribution, you're sampling peoples' preferences to estimate the distribution of the actual population. So as stated, you'll get a totally meaningless number. By putting priors on the popularity of the respective candidates you could get a more meaningful estimate. You could also pose the problem as, given (already knowing) the true proportion of the voting population who are in favor of contestants 1, 2, 3, etc., what is the probability that a random sampling of the population would result in a tie. That one's sightly easier, but highly unrealistic. If I was asked this question for a practical purpose, I'd go the Bayesian route (setting reasonable priors on the true population parameters) and throw the whole model into a markov chain monte carlo process to get an estimate (not that there's not an analytic solution, but 90% of the time you want to get an answer and get on with your life, and the analytic solutions can be hard for this sort of stuff).
posted by devilsbrigade at 10:26 PM on June 25, 2011

Yeah, there's no real way to say "This is how unlikely that event that happened was." The best we can say is "This is how unlikely that even that happened was, if the following assumptions were true."

Proginoskes' number assumes that each outcome is equally probable. I *think*, but just informally, that this means he or she is effectively assuming that votes are drawn randomly from a uniform distribution -- I am as likely to vote for A as for B as for C, etc.

But we could make other assumptions if we wanted to, and these would give us very different estimates of the probability of the event. We might assume that votes are distributed along some other distribution. Less uniform distributions might be likely to lead to *some* tie, but it could still be the case that the tie that occurred was singularly unlikely under those assumptions.

In any event, the probability of getting an exact tie from numbers of votes that high is very small. But exactly how very small isn't really knowable.
posted by ROU_Xenophobe at 10:26 PM on June 25, 2011

Yeah, sorry, I should have mentioned I was treating every way of distributing the votes as equally possible. And that is probably not a practical assumption for the scenario. I come from a pure math background - I'm not used to dealing with real life!
posted by Proginoskes at 8:18 AM on June 26, 2011

To make the calculation a little more intuitive, think of it in terms of the pigeonhole principle.

In this case, 500K votes distributed among 16 individuals is equivalent to putting 16 objects into 500K boxes.

There are 500K ways to place the first object, 500K ways the second, and etc., therefore 500K^16 total ways.

If you want to avoid ties, you can't put any object into a box which is already occupied.

So, 500,000 ways to place the first object, 499,999 ways the second, and etc. to the 16th object, therefore 500,000!/499,984! total ways.

To get the probability of an event, divide the total number of ways it can happen by the total number of equally probable (by assumption) ways anything can happen: 500K^16/(500,000!/499,984!).

That's the probability of no ties. Because something or other has to happen, the probability of at least one tie is 1 minus this number, or [1-(500K^16/(500,000!/499,984!))].

I didn't do the calculation, but Proginoskes' answer sounds about right to me.

I'm sure other answers are right that voting biases invalidate the assumption that all outcomes are equally likely, but without making the argument, I'd be more suspicious of a tie the higher the number of votes that the tied candidates got.
posted by jamjam at 11:17 AM on June 26, 2011

Response by poster: Thanks for all the input. Personally, I think there was some funny business going on. There was one competitor who was a clear crowd favorite. At the final competition, only audience members could vote and this person brought a busload of fans. They didn't announce the final 16 until the audience arrived, so it would have been a huge disappointment for the crowd if he didn't make it.

The online voting made it super-easy for someone with a lot of time on their hands to get their favorite candidate in. Just hit submit over and over again with no verification or refresh required. There were several competitors who clearly shouldn't have made it into the finals.

In the end, the person with the busload of fans (and 9th person announced, who may or may not have been the person who tied) won.
posted by caroljean63 at 12:07 PM on June 26, 2011

I knew I'd do something like that.

It should read:

....ways anything can happen: (500,000!/499,984!)/500K^16.


...the probability of at least one tie is 1 minus this number, or [1-((500,000!/499,984!)/500k^16)].

posted by jamjam at 12:18 PM on June 26, 2011

In this case, 500K votes distributed among 16 individuals is equivalent to putting 16 objects into 500K boxes.

Huh? I can't see what putting 16 objects into 500K boxes would represent. Total votes for each object? Well, no because there is no way to ensure that the total votes equals 500K. What, then, is this representing?

I think a correct application of the Pigeonhole Principle here would be placing 500K objects into 16 boxes (each vote goes to a candidate).
posted by milestogo at 1:15 PM on June 26, 2011

Scroll down in the article I linked above, milestogo:

A probabilistic generalization of the pigeonhole principle states that if n pigeons are randomly put into m pigeonholes with uniform probability 1/m, then at least one pigeonhole will hold more than one pigeon with probability

1 - (m!/(m-n)!)/m^n

... For n = 0 and for n = 1 (and m > 0), that probability is zero; in other words, if there is just one pigeon, there cannot be a conflict. For n > m (more pigeons than pigeonholes) it is one, in which case it coincides with the ordinary pigeonhole principle. But even if the number of pigeons does not exceed the number of pigeonholes (n ≤ m), due to the random nature of the assignment of pigeons to pigeonholes there is often a substantial chance that clashes will occur.

The 500K boxes represent the entire range of possible votes each candidate could have gotten. When a candidate gets a certain number of votes, they are put into the box labeled with that number. When two or more candidates are tied, then, they would be put into the same box.
posted by jamjam at 2:02 PM on June 26, 2011

Like I said above, that model doesn't really work in this problem. To see why, imagine that the first candidate is put in the "400,000" box. Now the second candidate must be in a box that is less than 100,000, since the total number of votes is set.

I'm actually not sure the correct answer has been identified yet, though I could be wrong. I'm working on it myself.
posted by milestogo at 2:06 PM on June 26, 2011

Although I suppose it should be 500K+1 boxes to account for the possibility of zero votes.

On preview, goodness, you're right! But with such a large number of votes it might be a fair approximation in most circumstances, but I'm not sure. I'll have to think about it more, myself.
posted by jamjam at 2:15 PM on June 26, 2011

Proginoskes, I get a different number than you do. I don't remember much about free distributions, but can you explain why are you taking 500,015 choose 500,000? It would seem to me that you should just count the partitions of 500,000 that have 16 or fewer terms in them, and that is the total number of possible election results, as long as we don't care who gets which vote-total.

Here is what I did. If there is a misstep below, please let me know:

Assumption 1: each final distribution of votes is equiprobable, i.e., each vote is assigned to a candidate randomly with uniform probability.

Using the calculation tool that you provided, I took the number of partitions of 500,000 with 15 or 16 distinct terms. These are the election results where there are no ties (in the partitions that have 15 distinct terms, we consider one candidate to have received 0 votes). This total number is 1.113661747943288 x 10^60.

Then I took the number of partitions of 500,000 with 16 or fewer terms, distinct or not. These are all the possible election results. The number is 1.117678091101816 x 10^60.

Dividing the first number by the second, I get 0.996406529580831, which puts the likelihood that we end up with an election result that is NOT in the "no ties" set of results at 0.003593470419169. Note that this is a probability of 0.36%.

This actually seems high to me, but (a) I'm not sure what is wrong with this methodology and (b) I don't really have any trust of intuition here.
posted by milestogo at 2:24 PM on June 26, 2011 [1 favorite]

Here's how I got the numbers. Firstly, each of the votes is identical. It doesn't matter whether I vote for the first person or you vote for the first person, all that matters in the end is the total votes they recieve. So, we can think of lining up 500000 identical objects in a row and inserting 15 dividers in between them. This will count the number of ways of distributing the votes and can be calculated as 500015 choose 15 (or alternatively 500015 choose 500000 - they're the same number)

As for the second number, I used "compositions with distinct terms of: 500000" and "exact number of terms: 16". We need to use compositions not partitions because "2 votes for the first contestant and 1 for the second contestant" should be considered different than "1 vote for the first contestant and 2 for the second contestant"
posted by Proginoskes at 3:11 PM on June 26, 2011

Hmm. I thought you might be doing that with the 500,015. I love that technique.

However, this method allows a candidate to receive 0 votes. The "composition" technique doesn't appear to allow this. I think you may want to add "compositions with distinct terms of: 500000" and "exact number of terms: 16" to "compositions with distinct terms of: 500000" and "exact number of terms: 15", along the same lines as in my calculation. Although, because you are using compositions, you are going to need a way to allow any of the candidates to be the 0-vote getter (right?), and I have no idea how this might be done. Ideas?

Also: I agree that if you use the "divider" method to generate all the different possible vote outcomes, one should use compositions. However, can we not use partitions for both numbers and ignore the particulars of which candidate received which vote total?
posted by milestogo at 4:33 PM on June 26, 2011

You're right, milestogo. I forgot all about that! I think it would work if we did:

(Number of cases where "exact number of terms" is 16) + 16 * (Number of cases where "exact number of terms" is 15")

Multiplying by 16 because there are 16 different contestants that could be the one to receive zero votes.

I have a vague feeling that it wouldn't work so smoothly if we used partitions and ignored about which candidates received which vote total, but I'm not 100% sure why.
posted by Proginoskes at 4:59 PM on June 26, 2011

« Older Comfest hula hoops   |   Can you be too busy to date? Newer »
This thread is closed to new comments.