Determining the ranking of a series of items?
August 13, 2017 4:35 PM   Subscribe

I have a set of items that people have voted on and given a score out of 10. For the sake of discussion, let's say they're movies. What's the best way to use those votes to determine a list of the top 10 movies?

The problem I'm running into is that Movie A may have only one vote of 9.5 and Movie B may have several votes with an average of 9.0. Since Movie A has fewer votes, I'd be less confident that it truly is better than Movie B. Is there some way to factor in the number of votes when determining a movie's ranking? I suppose that I could only consider movies with more than a certain threshold of votes, but I'm wondering if there are other options out there.
posted by Proginoskes to Grab Bag (8 answers total) 4 users marked this as a favorite
 
The easiest way to do this, which also actually has statistical justification from a Bayesian perspective, is to add some number of fairly neutral votes to every movie, and then do your averages as usual. For example, you could pretend that every single movie has received five votes of 7.0 in addition to all the real votes. Play with the number of fake votes and their value until you get results that feel good to you.
posted by dfan at 4:53 PM on August 13, 2017 [4 favorites]


One standard approach is to take a Bayesian approach, where you start with some sort of a prior belief for a rating of a movie (before any evidence, a movie is ranked as "average", say 7). Then, each vote is taken as evidence that the true rating of the movie differs from your prior; if there are only a few votes then the evidence is weak and the prior still has a lot of impact and if there are a lot of votes, the evidence is strong and the prior is effectively forgotten.

Mathematically, this corresponds to adding some number (N, doesn't have to be an integer) of pseudo-votes of X (some average rating) to every movie. Using an N of 1 and X of 7, in your case above, the movie A will have a posterior rating of (9.5+7)/2 = 8.25 and B will be (3*9 + 7)/4 = 8.5.

This method, combined with a slightly more principle way of choosing N and X is what IMDB uses for its rankings (http://www.imdb.com/help/show_leaf?votestopfaq)

Edit: on preview, looks like dfan had a similar answer.
posted by bsdfish at 4:54 PM on August 13, 2017 [5 favorites]


The approach outlined above is called adding a pseudocount.
posted by grouse at 5:15 PM on August 13, 2017


Response by poster: For the approach of adding a set of dummy votes, is there a way to pick the values of N and X besides just playing around with the values and finding what seems to work best? Or is that really the best way to do it?
posted by Proginoskes at 6:01 PM on August 13, 2017


Best answer: One approach for choosing X, the prior average rating of a movie would be to just use the average of all the ratings for all movies you've seen -- that's the approach taken by IMDB (they call that variable C). That's a violation of proper Bayesian framework but people often do it, calling it "Empirical Bayes" (and it can be theoretically justified as a crude estimation of a hierarchical bayes model).

Besides playing to get things right, one approach you can take is to define an objective function and then use a method to optimize that objective function. For example, your objective function could be to predict the value of a held-out rating which you an evaluate with leave-out-out cross validation and you could choose an N which leads to the best result.

One thing to note is that ranking isn't the same as predicting ratings -- for one, people tend to be very bad at assigning point values in a meaningful way and for another, a small change in point values can lead to a large change in rankings and vice versa. There's a large literature, learning to rank of methodologies which can be used.
posted by bsdfish at 6:42 PM on August 13, 2017 [2 favorites]


There is no "best" way. You are trying to add up people's orderings and without imposing some idea about intensity of preference ("My 8 is twice as good as my 7", "A 6 from you and from me means the same thing") there is no method to assemble a true aggregate ranking. There might be ones that you find more pleasing and reflective enough of people's orderings though.
posted by hawthorne at 6:49 PM on August 13, 2017


Board Game Geek has this issue with its 100,000-odd titles and it does exactly what dfan suggests.
posted by obiwanwasabi at 7:17 PM on August 13, 2017 [3 favorites]


Given that I need to sort things about once every 5 years, I've kept this article around since apparently 2009. The author's preferred method is "Lower bound of Wilson score confidence interval for a Bernoulli parameter" (yes, math is included), but you might be better served by previous answers that are used in production settings. I can't speak to the differences of each approach.
posted by Nonsteroidal Anti-Inflammatory Drug at 5:54 AM on August 14, 2017 [1 favorite]


« Older A question for the Liberian Cabal   |   Please help me pamper my face! Newer »
This thread is closed to new comments.