science fair judging
April 14, 2010 4:38 PM

I'm looking for a mathematical solution to combining many judges' scores across a science fair.

I'm trying to come up with a fair way to combine scores from different judges across a science fair. Here's the problem:
(1) There are 40 people presenting posters.
(2) There are 20 judges.
(3) Each judge only has time to talk to a small number of presenters, let's say 4-5, and can assign a numerical score of 1-10.

Of course some judges will consistently give high scores and some will be more spread out. What's the most fair way to compare scores from different judges and determine a winner? The only way I can come up with is to normalize a judge's scores against his/her average score. I think this would work fine if judges were covering more like 10 posters each, but for small numbers it doesn't seem right. This problem feels like it should have some sort of algorithm, maybe involving linear algebra, that would work better.

Bonus: What happens when judges assign scores of 1-10 in five categories instead of one?
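To make the normalization idea concrete, here's a rough sketch of what I mean (the judge names and scores are made up):

```python
from collections import defaultdict

# made-up raw scores: (judge, poster) -> 1-10 score
raw = {
    ("judge_a", "poster_1"): 9, ("judge_a", "poster_2"): 8,
    ("judge_b", "poster_1"): 5, ("judge_b", "poster_3"): 7,
}

# each judge's average over the handful of posters they actually saw
per_judge = defaultdict(list)
for (judge, _), score in raw.items():
    per_judge[judge].append(score)
judge_mean = {j: sum(s) / len(s) for j, s in per_judge.items()}

# centered score: how far above or below their own average a judge rated a poster
per_poster = defaultdict(list)
for (judge, poster), score in raw.items():
    per_poster[poster].append(score - judge_mean[judge])

# a poster's combined score is the average of its centered scores
combined = {p: sum(s) / len(s) for p, s in per_poster.items()}
print(sorted(combined.items(), key=lambda kv: -kv[1]))
```

My worry is that with only 4-5 posters per judge, those per-judge averages are themselves noisy.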
posted by Durin's Bane to Education (12 answers total) 3 users marked this as a favorite
 
Best answer: Get used to disappointment: Arrow's Impossibility Theorem

You could probably do worse than to use the approach that the AP sports writers use to pick the "national champion" in college football each year.
posted by Chocolate Pickle at 4:50 PM on April 14, 2010 [1 favorite]


If judges are randomly assigned to posters, then even if the individual judges' scoring methods vary, everyone has the same chance of getting a nice judge or a mean judge or a high-variance judge, so it doesn't affect their expected value (EV)... I don't know if you consider EV a standard for fairness, though.

Anyhow, I suggest instead of trying to make it "fair" to let this be a potentially cruel but useful lesson to budding future scientists about how science works in the real world: if luck of the draw gives you some cranky dickhead who hates your subfield and only gives you a Good/Fair rating, it doesn't matter if the other four reviewers loved your proposal and rated it Excellent/Very Good -- you're not getting funded. [/bitterness]
posted by Jacqueline at 4:57 PM on April 14, 2010 [2 favorites]


(By the way, experience has shown that using a geometric average in these kinds of situations is better than using an arithmetic mean when trying to combine the scores of multiple judges into a single number. But that doesn't solve the underlying challenge of Arrow's theorem, or the Gibbard–Satterthwaite theorem, or the Duggan–Schwartz theorem.)
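(Roughly, the geometric average of one presenter's scores is the n-th root of their product; a quick sketch with made-up numbers:)

```python
import math

scores = [7, 9, 6, 8]   # made-up scores for one presenter (1-10 scale)

# the geometric mean is pulled down harder by a single low score than the arithmetic mean is
geometric = math.prod(scores) ** (1 / len(scores))
arithmetic = sum(scores) / len(scores)
print(f"geometric {geometric:.2f} vs arithmetic {arithmetic:.2f}")
```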
posted by Chocolate Pickle at 4:58 PM on April 14, 2010


I think the closest thing to fair in this situation would be to come up with some sort of system of rating that is as close to objective as possible, and try to make sure all judges are on the same page as to how the rating system works.

For example, have a set of criteria (perhaps categories like "Originality" or "Research required", etc). Then have different rankings like "Outstanding", "Excellent"... "Poor" for each criterion. Have the judges all agree on what it really means to be "Outstanding", etc, for each criterion.

Better yet if the judges can look at some previous years' projects and agree on ratings in different categories so as to "calibrate" the rating system.

This is all tricky to do, but I think it's closer to fair than trying to normalize each judge's picks if each judge is seeing that few projects.
posted by Diplodocus at 5:01 PM on April 14, 2010 [1 favorite]


You're right that a small number of ratings means that standardizing judge ratings will be problematic.

A better way of homogenizing the scores is to give judges a rubric--i.e., you put point values on certain elements of the projects (perhaps instead of five 1-10 ratings, have 5 dimensions of performance with specific point values) that judges sum up to give a total score. Using rubrics, you can increase the chances that an 8 means the same thing across judges (though it's never perfect). Without a rubric, each judge decides which aspects are important and how stingy to be with high and low scores.
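To make that concrete, here's a minimal sketch of a point-value rubric (these dimensions and point caps are invented, not taken from any official rubric):

```python
# hypothetical rubric: dimension -> maximum points, chosen so the total is 100
RUBRIC = {
    "scientific question": 15,
    "experimental design": 25,
    "data and analysis": 25,
    "conclusions": 20,
    "presentation": 15,
}

def total_score(awarded: dict) -> int:
    """Sum a judge's per-dimension points, refusing values over each cap."""
    for dim, pts in awarded.items():
        if pts > RUBRIC[dim]:
            raise ValueError(f"{dim}: {pts} exceeds the {RUBRIC[dim]}-point cap")
    return sum(awarded.values())

print(total_score({"scientific question": 12, "experimental design": 20,
                   "data and analysis": 22, "conclusions": 15, "presentation": 13}))
```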
posted by parkerjackson at 5:03 PM on April 14, 2010


Should've previewed-- ditto with Diplodocus (though maybe my spin on it is not as labor-intensive).
posted by parkerjackson at 5:04 PM on April 14, 2010


One more post: Here's a good example of a science fair rubric.
posted by parkerjackson at 5:06 PM on April 14, 2010


Yeah, I think the Rubric parkerjackson mentioned is pretty much what I was trying to get at, although I didn't remember that was what it was called.
posted by Diplodocus at 5:08 PM on April 14, 2010


I like what Diplodocus said.

Moreover, you have to think about this like Olympic judging -- where there are always too many athletes for all the judges to see in one sitting -- and set up firm rules for how judges are assigned to presentations.

* Give specific assignments, randomly chosen, but ensure that judges are not judging presenters with whom they have a pre-existing affiliation (i.e. can't judge your own student); a rough sketch of this step follows the list.
* Each judge is assigned the same number of presentations to score.
* Assign each judge a specific time to spend evaluating each presentation, with "travel time" in between (e.g. from 1:00 to 1:15, you meet with student X. From 1:20 to 1:35, you meet with student Y).
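A rough sketch of that assignment step (the conflict list and per-judge count are made up, and a real fair would also want to balance how many judges each poster ends up with):

```python
import random

presenters = [f"poster_{i}" for i in range(1, 41)]   # 40 posters, as in the question
judges = [f"judge_{i}" for i in range(1, 21)]        # 20 judges
conflicts = {"judge_3": {"poster_7"}}                # e.g. judge 3 mentors poster 7's student
PER_JUDGE = 4                                        # every judge sees the same number

def assign(seed=0):
    rng = random.Random(seed)
    assignments = {}
    for judge in judges:
        eligible = [p for p in presenters if p not in conflicts.get(judge, set())]
        assignments[judge] = rng.sample(eligible, PER_JUDGE)
    return assignments

print(assign()["judge_3"])
```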

Also, try tossing out the high and low scores from each student's final calculations -- another trick cribbed from Olympic judging in gymnastics, diving, etc.
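The trimming itself is easy to compute; a sketch, assuming each poster ends up with at least three scores (numbers made up):

```python
def trimmed_mean(scores):
    """Drop one highest and one lowest score, then average the rest."""
    if len(scores) < 3:
        return sum(scores) / len(scores)   # too few scores to trim sensibly
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)

print(trimmed_mean([9, 7, 8, 3, 8]))   # the 9 and the 3 are dropped -> 7.67
```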
posted by Cool Papa Bell at 5:14 PM on April 14, 2010


Instead of scoring each project and computing an average, why not let each judge create an ordered list of their top five projects? This makes each judge pick a best project and forces them to order 5 different projects.

To combine scores, give each judge's #1 pick 5 points, their #2 pick 3 points, ... and their #5 pick 1 point, then sum across all judges. The best projects will have the highest point totals.
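A quick tally sketch with made-up ballots (the 3rd- and 4th-place point values below are invented, since only #1, #2 and #5 are spelled out above):

```python
from collections import Counter

# point values: ranks 1, 2 and 5 are from the comment above; 3 and 4 are made up
POINTS = {1: 5, 2: 3, 3: 2, 4: 1.5, 5: 1}

# made-up ballots: each judge's ordered top-5 list
ballots = [
    ["poster_12", "poster_3", "poster_7", "poster_21", "poster_5"],
    ["poster_3", "poster_12", "poster_9", "poster_7", "poster_30"],
]

tally = Counter()
for ballot in ballots:
    for rank, poster in enumerate(ballot, start=1):
        tally[poster] += POINTS[rank]

print(tally.most_common(3))   # unranked projects simply score zero
```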

For a typical science fair, I would expect this to create a nice distribution of point values for each poster. This assumes some really good projects, some really poor projects and a vast majority of average projects.

It also gets you away from the judges which say everything is good (or bad).

One downside is that the scores cannot easily be communicated without causing some hurt egos. It's possible some projects may not appear in the top 5 for any judge. So if you need to report individual scores, this may not be the best approach. But it will tend to partition the best...

I've seen this work well in similar situations, but someone more statistically minded should keep me honest.
posted by NoDef at 5:38 PM on April 14, 2010 [1 favorite]


I have been involved in many science fairs and the judging is very rarely consistent. No matter what rubric you use, no matter what mathematics you use, and no matter how much education you give the judges, you will always get a few clueless judges who mess everything up. You should consider the following:

1) Grade all of the projects yourself and factor that into all of the scores. I know you can't see all the presentations, but have the students turn in a lab report ahead of time. That usually compensates for any rogue judges.

2) When you see the results at the end and they are not reflective of the quality of the projects, change the judges' scores.

I know those 2 things are not ideal, but I always hated it when the kid who had a crappy project sweet-talked his/her way into a blue ribbon.
posted by Wayman Tisdale at 5:49 PM on April 14, 2010


If you score from one to ten, can't you correct for standard deviation? It won't help balance out the subjective judging error, where bad projects are marked as good due to inconsistent criteria, but it will correct for "grade inflation" from judges who consistently go higher or lower than the norm.
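A sketch of that correction with made-up scores (the small per-judge samples the OP mentions will still make these estimates noisy):

```python
import statistics

# made-up 1-10 scores, grouped by judge: judge -> {poster: score}
by_judge = {
    "judge_a": {"poster_1": 9, "poster_2": 8, "poster_3": 9, "poster_4": 7},
    "judge_b": {"poster_1": 5, "poster_5": 7, "poster_6": 4, "poster_7": 6},
}

adjusted = {}
for judge, scores in by_judge.items():
    mean = statistics.mean(scores.values())
    sd = statistics.pstdev(scores.values()) or 1.0   # guard against a judge who gave identical scores
    for poster, score in scores.items():
        adjusted.setdefault(poster, []).append((score - mean) / sd)

# a poster's final score is its average z-score across the judges who saw it
final = {p: statistics.mean(zs) for p, zs in adjusted.items()}
print(sorted(final.items(), key=lambda kv: -kv[1]))
```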
posted by klangklangston at 11:02 PM on April 14, 2010


This thread is closed to new comments.