# Determining the ranking of a series of items?

August 13, 2017 4:35 PM Subscribe

I have a set of items that people have voted on and given a score out of 10. For the sake of discussion, let's say they're movies. What's the best way to use those votes to determine a list of the top 10 movies?

The problem I'm running into is that Movie A may have only one vote of 9.5 and Movie B may have several votes with an average of 9.0. Since Movie A has fewer votes, I'd be less confident that it truly is better than Movie B. Is there some way to factor in the number of votes when determining a movie's ranking? I suppose that I could only consider movies with more than a certain threshold of votes, but I'm wondering if there are other options out there.

The problem I'm running into is that Movie A may have only one vote of 9.5 and Movie B may have several votes with an average of 9.0. Since Movie A has fewer votes, I'd be less confident that it truly is better than Movie B. Is there some way to factor in the number of votes when determining a movie's ranking? I suppose that I could only consider movies with more than a certain threshold of votes, but I'm wondering if there are other options out there.

One standard approach is to take a Bayesian approach, where you start with some sort of a prior belief for a rating of a movie (before any evidence, a movie is ranked as "average", say 7). Then, each vote is taken as evidence that the true rating of the movie differs from your prior; if there are only a few votes then the evidence is weak and the prior still has a lot of impact and if there are a lot of votes, the evidence is strong and the prior is effectively forgotten.

Mathematically, this corresponds to adding some number (N, doesn't have to be an integer) of pseudo-votes of X (some average rating) to every movie. Using an N of 1 and X of 7, in your case above, the movie A will have a posterior rating of (9.5+7)/2 = 8.25 and B will be (3*9 + 7)/4 = 8.5.

This method, combined with a slightly more principle way of choosing N and X is what IMDB uses for its rankings (http://www.imdb.com/help/show_leaf?votestopfaq)

Edit: on preview, looks like dfan had a similar answer.

posted by bsdfish at 4:54 PM on August 13, 2017 [5 favorites]

Mathematically, this corresponds to adding some number (N, doesn't have to be an integer) of pseudo-votes of X (some average rating) to every movie. Using an N of 1 and X of 7, in your case above, the movie A will have a posterior rating of (9.5+7)/2 = 8.25 and B will be (3*9 + 7)/4 = 8.5.

This method, combined with a slightly more principle way of choosing N and X is what IMDB uses for its rankings (http://www.imdb.com/help/show_leaf?votestopfaq)

Edit: on preview, looks like dfan had a similar answer.

posted by bsdfish at 4:54 PM on August 13, 2017 [5 favorites]

The approach outlined above is called adding a pseudocount.

posted by grouse at 5:15 PM on August 13, 2017

posted by grouse at 5:15 PM on August 13, 2017

For the approach of adding a set of dummy votes, is there a way to pick the values of N and X besides just playing around with the values and finding what seems to work best? Or is that really the best way to do it?

posted by Proginoskes at 6:01 PM on August 13, 2017

posted by Proginoskes at 6:01 PM on August 13, 2017

One approach for choosing X, the prior average rating of a movie would be to just use the average of all the ratings for all movies you've seen -- that's the approach taken by IMDB (they call that variable C). That's a violation of proper Bayesian framework but people often do it, calling it "Empirical Bayes" (and it can be theoretically justified as a crude estimation of a hierarchical bayes model).

Besides playing to get things right, one approach you can take is to define an objective function and then use a method to optimize that objective function. For example, your objective function could be to predict the value of a held-out rating which you an evaluate with leave-out-out cross validation and you could choose an N which leads to the best result.

One thing to note is that ranking isn't the same as predicting ratings -- for one, people tend to be very bad at assigning point values in a meaningful way and for another, a small change in point values can lead to a large change in rankings and vice versa. There's a large literature, learning to rank of methodologies which can be used.

posted by bsdfish at 6:42 PM on August 13, 2017 [2 favorites]

Besides playing to get things right, one approach you can take is to define an objective function and then use a method to optimize that objective function. For example, your objective function could be to predict the value of a held-out rating which you an evaluate with leave-out-out cross validation and you could choose an N which leads to the best result.

One thing to note is that ranking isn't the same as predicting ratings -- for one, people tend to be very bad at assigning point values in a meaningful way and for another, a small change in point values can lead to a large change in rankings and vice versa. There's a large literature, learning to rank of methodologies which can be used.

posted by bsdfish at 6:42 PM on August 13, 2017 [2 favorites]

There is no "best" way. You are trying to add up people's orderings and without imposing some idea about intensity of preference ("My 8 is twice as good as my 7", "A 6 from you and from me means the same thing") there is no method to assemble a true aggregate ranking. There might be ones that you find more pleasing and reflective enough of people's orderings though.

posted by hawthorne at 6:49 PM on August 13, 2017

posted by hawthorne at 6:49 PM on August 13, 2017

Board Game Geek has this issue with its 100,000-odd titles and it does exactly what dfan suggests.

posted by obiwanwasabi at 7:17 PM on August 13, 2017 [3 favorites]

posted by obiwanwasabi at 7:17 PM on August 13, 2017 [3 favorites]

Given that I need to sort things about once every 5 years, I've kept this article around since apparently 2009. The author's preferred method is "Lower bound of Wilson score confidence interval for a Bernoulli parameter" (yes, math is included), but you might be better served by previous answers that are used in production settings. I can't speak to the differences of each approach.

posted by Nonsteroidal Anti-Inflammatory Drug at 5:54 AM on August 14, 2017 [1 favorite]

posted by Nonsteroidal Anti-Inflammatory Drug at 5:54 AM on August 14, 2017 [1 favorite]

This thread is closed to new comments.

posted by dfan at 4:53 PM on August 13, 2017 [4 favorites]