How can I make rankings more closely reflect votes?
November 18, 2006 12:59 PM

In a ranking system, when you have multiple items with similar rankings, how can you give more weight to those with the most votes?

For instance, one item has a 90% rating with only 2 votes, and another has an 89% with 100 votes. Is there a formula that would give more weight, or precedence, to the one with more votes?

FWIW, I'm using the formula in a PHP script.
posted by bjork24 to Computers & Internet (14 answers total)
 
Sure, there's such a formula if you make one up. I'm assuming that you want each vote an item receives to be worth, say, 0.0001 or maybe a little more, in addition to whatever the actual percentage is. So go ahead and do that if that's how you want the ratings to behave.
posted by kindall at 1:05 PM on November 18, 2006


Showing standard deviation oughta do it.
posted by aberrant at 1:05 PM on November 18, 2006


sdev of 2 votes at .9 and 100 at .89 is the same right?
posted by Heywood Mogroot at 1:10 PM on November 18, 2006


Multiply the rating by n/(n+1), where n is the number of votes. The higher the number of votes, the closer this is to 1.0.
posted by smackfu at 1:20 PM on November 18, 2006


(Or, if you prefer, you can add that factor. It depends on whether you want to really devalue things with low votes, or just punish them a little.)
posted by smackfu at 1:22 PM on November 18, 2006
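
For anyone who wants to try smackfu's suggestion, here is a minimal sketch in Python (the asker is using PHP, but the arithmetic carries over directly). The function name damped_rating and the mode parameter are made up for illustration, and ratings are assumed to be on a 0.0 to 1.0 scale.

```python
def damped_rating(rating, n, mode="multiply"):
    """Adjust a rating (0.0-1.0) by the vote-count factor n / (n + 1).

    mode="multiply" scales the rating by the factor (harsh on low vote counts);
    mode="add" adds the factor to the rating (a milder penalty).
    """
    factor = n / (n + 1)
    if mode == "multiply":
        return rating * factor
    return rating + factor


# The asker's two items: 90% from 2 votes, 89% from 100 votes.
few = damped_rating(0.90, 2)     # 0.90 * 2/3
many = damped_rating(0.89, 100)  # 0.89 * 100/101
print(few, many)
```

Either mode puts the 100-vote item ahead of the 2-vote item in this example; the multiplied version simply opens a much wider gap.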


I don't know if it is appropriate in this case, but I think the best way to deal with this situation is to have a minimum number of votes cast, say 10, before the rating is calculated. The information gathered by just 2 votes really has no meaning.
posted by brockerst at 1:39 PM on November 18, 2006
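
A rough sketch of the minimum-votes idea, in Python for illustration. The threshold of 10 comes from the post above; the function name, the item tuples, and the choice to return None for unranked items are assumptions, not anything the thread specifies.

```python
MIN_VOTES = 10  # threshold suggested in the post; tune it for your data


def displayed_rating(rating, votes, min_votes=MIN_VOTES):
    """Return the rating, or None while there are too few votes to rank."""
    return rating if votes >= min_votes else None


# (name, rating, votes) -- the asker's two example items.
items = [("A", 0.90, 2), ("B", 0.89, 100)]

# Only items that have reached the threshold take part in the ranking.
ranked = sorted(
    (item for item in items if displayed_rating(item[1], item[2]) is not None),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```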


I played around with this in Excel and the formula I liked best was rank = ((rating out of 100)^2) * ((number of votes)^(1/3)). Excel formula: [PRODUCT(POWER(A1,2),(POWER(B1,1/3)))]

That is, the square of the rating (on a scale of 1 to 100) times the cube root of the number of votes.

Here is how this formula ordered the following items with various ratings and numbers of votes:

1. 95 rating with 75 votes
2. 89 rating with 100 votes
3. 85 rating with 115 votes
4. 85 rating with 50 votes
5. 78 rating with 65 votes
6. 95 rating with 15 votes
7. 45 rating with 500 votes
8. 90 rating with 2 votes
posted by thirteenkiller at 1:53 PM on November 18, 2006
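
The formula above is easy to check outside Excel. This is a small Python sketch that scores the same eight (rating, votes) pairs from the list; the function name score is made up for illustration.

```python
def score(rating, votes):
    """Square of the rating (out of 100) times the cube root of the vote count."""
    return rating ** 2 * votes ** (1 / 3)


# (rating, votes) pairs from the post, deliberately shuffled.
items = [(90, 2), (45, 500), (95, 15), (78, 65),
         (85, 50), (85, 115), (89, 100), (95, 75)]

ranked = sorted(items, key=lambda rv: score(*rv), reverse=True)
for position, (rating, votes) in enumerate(ranked, start=1):
    print(f"{position}. {rating} rating with {votes} votes")
```

Sorting by this score gives the same order as the list in the post. Changing the exponents shifts the balance: a larger power on the rating favors quality, a larger root on the votes favors popularity.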


Would it be any good for your purpose to round the ratings up to the next 5% (i.e. 81-85% all show as 85%, 86-90% all show as 90%) and then do an additional secondary sort by the actual number of votes?
posted by selton at 1:55 PM on November 18, 2006
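
One possible way to do the bucket-then-sort approach, sketched in Python. It follows the example ranges in the post (81-85% shown as 85%), which means rounding up to a multiple of 5, and it assumes ratings are whole-number percentages. The names bucket and items are illustrative only.

```python
def bucket(pct, step=5):
    """Round an integer percentage up to the next multiple of `step`."""
    return ((pct + step - 1) // step) * step


# (name, rating in percent, votes)
items = [("A", 90, 2), ("B", 89, 100), ("C", 84, 300)]

# Primary key: the rounded rating. Secondary key: the raw vote count.
ranked = sorted(items, key=lambda item: (bucket(item[1]), item[2]), reverse=True)
print([name for name, _, _ in ranked])
```

Here A and B both land in the 90% bucket, so B's 100 votes put it first; C stays behind in the 85% bucket however many votes it has.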


previously
posted by Lanark at 2:04 PM on November 18, 2006


How about just adding in a dozen votes for zero for all items, and then use the mean?
posted by aubilenon at 3:32 PM on November 18, 2006


Something like this would work (bayesian ranking formula)
posted by muddylemon at 7:22 PM on November 18, 2006


For comparison and study, IMDB's Top 250 formula is:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
where:
R = average for the movie (mean) = (Rating)
v = number of votes for the movie = (votes)
m = minimum votes required to be listed in the Top 250 (now 1300)
C = the mean vote across the whole report (now 6.7)
posted by rokusan at 7:43 PM on November 18, 2006
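
The weighted rating formula quoted above translates almost directly into code. This is a Python sketch, not IMDB's actual implementation. The values m=10 and C=0.70 used for the asker's example are made-up assumptions; in practice C would be the real mean rating across all items.

```python
def weighted_rating(R, v, m, C):
    """WR = (v / (v + m)) * R + (m / (v + m)) * C

    R: the item's mean rating      v: the item's vote count
    m: prior weight, in votes      C: mean rating across all items
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


# The asker's items on a 0-1 scale, with assumed m=10 and C=0.70.
few = weighted_rating(R=0.90, v=2, m=10, C=0.70)
many = weighted_rating(R=0.89, v=100, m=10, C=0.70)
print(few, many)
```

With few votes the result stays near C; as v grows past m it moves toward the item's own average R.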


Alternatively, you could simply give people both figures; this is what metacritic.com does:

6.3 out of 10 based on 9 votes
7.8 out of 10 based on 73 votes
3.3 out of 10 based on 50 votes

It's a lot more open and understandable than inventing some random unexplained algorithm.
posted by Lanark at 7:55 AM on November 19, 2006


The IMDB ranking rokusan cites is an example of the Bayesian ranking described in muddylemon's link. To explain what IMDB is doing in words: they are essentially adding in 1300 votes of 6.7 to every movie in addition to the votes actually cast. The "6.7" is simply the average vote over all movies. The 1300 is somewhat arbitrary: pick a number that seems to work well depending on how many votes you have overall.

How about just adding in a dozen votes for zero for all items, and then use the mean?

This potentially creates the opposite problem: items with few votes being ranked too low. If you also have a "bottom 100" list, should an item with 3 zero votes be ranked lower than one with an average rating of 1.4 based on hundreds of votes? This is why, in the Bayesian rating, the "extra votes" added in have the value of the average vote over all items, not zero.

BoardGameGeek uses both - a true average and a Bayesian average are available for individual games, but the games sorted by rank list uses the Bayesian average.
posted by DevilsAdvocate at 9:41 AM on November 20, 2006
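
To make the "extra votes" description concrete, here is a Python sketch comparing extra votes of zero with extra votes at the overall mean. The 6.7 mean is the figure quoted in the thread and the dozen extra votes is aubilenon's suggestion; the function name and the sample vote lists are made up for illustration.

```python
def mean_with_pseudo_votes(votes, m, prior):
    """Mean of the real votes plus m extra votes that each equal `prior`."""
    return (sum(votes) + m * prior) / (len(votes) + m)


C = 6.7  # overall mean vote (figure quoted in the thread)
m = 12   # "a dozen" extra votes

three_zeros = [0, 0, 0]           # 3 votes, all zero
many_low = [1, 2, 1, 2, 1] * 100  # 500 votes averaging 1.4

# Extra votes of zero: the 3-vote item sits below the well-established 1.4 item.
zero_prior = (mean_with_pseudo_votes(three_zeros, m, 0),
              mean_with_pseudo_votes(many_low, m, 0))

# Extra votes at the overall mean: the 3-vote item is pulled toward 6.7 instead.
mean_prior = (mean_with_pseudo_votes(three_zeros, m, C),
              mean_with_pseudo_votes(many_low, m, C))

print(zero_prior, mean_prior)
```

With prior set to the overall mean, this computes the same value as the weighted rating formula, just written as one average over real and extra votes.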

