Puzzles Weren't My Major: How Do I Re-Order An Mixed Up List?
January 22, 2009 12:54 PM Subscribe
Help me sort out this problem of ranking bracket winners... Thx in advance, hivemind.
I have a bunch of data that needs to be ranked in the form of "A won over B", "C won over F", "A won over F", "B won over D"... etc, etc.
How do I get all of this scattered info into a ranked list of "A>B>C>D>E>F, etc"..?
I also have a sneaky suspicion that some of these tidbits of info might be contradictory (or circular)... eg. "A>C", "C>B", "B>A"...
Any help? Thanks!
I have a bunch of data that needs to be ranked in the form of "A won over B", "C won over F", "A won over F", "B won over D"... etc, etc.
How do I get all of this scattered info into a ranked list of "A>B>C>D>E>F, etc"..?
I also have a sneaky suspicion that some of these tidbits of info might be contradictory (or circular)... eg. "A>C", "C>B", "B>A"...
Any help? Thanks!
Best answer: I should also mention that this problem is called "topological sorting" in the absence of cycles and the "linear ordering problem" or "LOP" in the presence of cycles.
And some references to the current state of the art for solving the LOP:
How to rank with few errors, Claire Kenyon-Mathieu and Warren Schudy, Proceedings of the thirty-ninth annual ACM symposium on Theory of computing (STOC), 2007, Pages: 95 - 103. (presents a way to quickly get a pretty good answer if you have a tournament where everybody plays everybody exactly once)
A branch-and-bound algorithm to solve the linear ordering problem for weighted tournaments, I Charon, O Hudry - Discrete Applied Mathematics, 2006. (The state of the art for finding exact solutions to the linear ordering problem in the general case. Somewhere, they have posted code for their algorithm, as well.)
Variable-neighborhood search for the linear ordering problem, Garcia et al., Computers & operations research, 2006 (The best published heuristic for this problem)
posted by goingonit at 1:22 PM on January 22, 2009 [3 favorites]
And some references to the current state of the art for solving the LOP:
How to rank with few errors, Claire Kenyon-Mathieu and Warren Schudy, Proceedings of the thirty-ninth annual ACM symposium on Theory of computing (STOC), 2007, Pages: 95 - 103. (presents a way to quickly get a pretty good answer if you have a tournament where everybody plays everybody exactly once)
A branch-and-bound algorithm to solve the linear ordering problem for weighted tournaments, I Charon, O Hudry - Discrete Applied Mathematics, 2006. (The state of the art for finding exact solutions to the linear ordering problem in the general case. Somewhere, they have posted code for their algorithm, as well.)
Variable-neighborhood search for the linear ordering problem, Garcia et al., Computers & operations research, 2006 (The best published heuristic for this problem)
posted by goingonit at 1:22 PM on January 22, 2009 [3 favorites]
I think each team's record is all that really matters. Divide the number of wins by the number of games to get a percentage. Sort them in descending order. Are ties okay? If not, break them by averaging the score of all the teams each team beat and the higher average wins the tie.
posted by soelo at 1:40 PM on January 22, 2009
posted by soelo at 1:40 PM on January 22, 2009
Best answer: I didn't see the Excel tag until just now.
I assume you have a single column full of A>B type data. Turn this into two columns, a winner and a loser, then use =COUNTIF to get the number of wins and losses for each team. Then you can move forward with the percentages and sorting.
posted by soelo at 1:43 PM on January 22, 2009
I assume you have a single column full of A>B type data. Turn this into two columns, a winner and a loser, then use =COUNTIF to get the number of wins and losses for each team. Then you can move forward with the percentages and sorting.
posted by soelo at 1:43 PM on January 22, 2009
Best answer: I think you just described Condorcet Voting, believe it or not. Two sentence summary: Every vote (ranking) goes into a grid and then all vote grids are summed into one big answer grid. There are special rules for handling cycles.
posted by chairface at 3:05 PM on January 22, 2009
posted by chairface at 3:05 PM on January 22, 2009
Best answer: soelo: your method is still susceptible to ties, even with the tie-break rule. But for round-robin tournaments it will work most of the time anyway.
chairface: This is indeed equivalent to Condorcet voting, except that Condorcet only cares about the winner. Having to sort everyone adds some complexity; in particular, it leads to a much higher probability of cycles. And Condorcet didn't specify a single method for handling cycles; the Kemeny-Young method both resolves cycles and ranks everyone, but it's computationally hard. This is the method I was assuming: it produces the ranking with the "fewest upsets": that is, the best Kemeny-Young ranking has a minimal number of games where a lower-ranked team beat a higher-ranked team.
posted by goingonit at 4:34 PM on January 22, 2009 [2 favorites]
chairface: This is indeed equivalent to Condorcet voting, except that Condorcet only cares about the winner. Having to sort everyone adds some complexity; in particular, it leads to a much higher probability of cycles. And Condorcet didn't specify a single method for handling cycles; the Kemeny-Young method both resolves cycles and ranks everyone, but it's computationally hard. This is the method I was assuming: it produces the ranking with the "fewest upsets": that is, the best Kemeny-Young ranking has a minimal number of games where a lower-ranked team beat a higher-ranked team.
posted by goingonit at 4:34 PM on January 22, 2009 [2 favorites]
Response by poster: awesome answers! I'll probably have to do this more than once, so it's best for me to do my homework and figure out the optimal way I can do this for myself... But THANKS A LOT for the pointers! Hopefully, my datasets aren't too complicated... but I might still email you, goinonit.. :) I had a sneaky feeling that this problem could get really hairy (and it could be a hiring question for Google engineers or something).
posted by mhh5 at 4:39 PM on January 22, 2009
posted by mhh5 at 4:39 PM on January 22, 2009
This thread is closed to new comments.
If you do have "cycles" in your data (A>C, C>B, B>A, etc), the problem is a very hard one that computer scientists and operations researchers struggle with to this day. On the other hand, we've got it solved as long as your tournament has fewer than, say, 100 participants, and mostly solved as long as your tournament has fewer than 300-400.
There are some other practical problems to contend with. If you're missing enough data, the ranking could be "unstable": there could be two equally good rankings that give people drastically different positions. But if you have a round-robin tournament (everybody has played everybody), this should not be an issue. Also, this problem is much more tractable for round-robin tournaments if I recall.
If you only have to solve this problem once, email me (my email is in my profile) and I'd be glad to help out.
posted by goingonit at 1:04 PM on January 22, 2009