Wikipedia Reliability Meter
April 11, 2009 8:59 AM

I'd like to put together a Wikipedia Accuracy Meter - perhaps a browser plug-in or a web page that attempts to estimate the reliability of any given article at any given moment. Please come in for discussion of the algorithm and technical hoo-ha to help me assess whether this is a weekend project or a master's thesis...

The kinds of things to be taken into account would be age of entry, number of changes, number of reversions, and locked status. Most of these would have some sort of bell curve for their influence on the rank. An article near the 0% reliability mark would be a new entry with few authors. A higher-ranked article would be older, with many revisions and a decreasing frequency of revision. I am not sure how to treat a locked entry - same rank as before it was locked, minus something that takes into account the fact that someone out there strongly feels the existing status is incorrect? Contributions from "trusted authors" might help as well, though that could catapult this into a more substantial AI problem. I am kind of hoping to get a moderately useful number with little work by taking the Wikipedia methodology on its own terms, though I can see the possibility of using what I perceive as deeper weaknesses (and strengths) in the method to inform the algorithm.
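For concreteness, here's a toy sketch of the sort of combination I'm imagining - every weight and curve in it is a placeholder I'd expect to tune heavily, not a claim about the right formula:

```python
# Toy sketch of combining a few history-derived signals into a rough 0-100
# score. All weights, curves, and the lock penalty are placeholders to tune.
import math

def reliability_score(age_days, num_edits, num_reverts, num_authors, is_locked):
    # Older articles with long edit histories get the benefit of the doubt,
    # saturating rather than growing forever (hence the tanh/log shapes).
    age_factor = math.tanh(age_days / 365.0)             # ~0 for new, approaches 1 after a year+
    edit_factor = math.tanh(math.log1p(num_edits) / 5)   # many revisions, diminishing returns
    author_factor = math.tanh(num_authors / 20.0)        # very few authors -> low confidence

    # A high proportion of reversions suggests an ongoing fight over content.
    revert_penalty = num_reverts / max(num_edits, 1)

    score = 100 * age_factor * edit_factor * author_factor * (1 - revert_penalty)

    # Guess: a lock means someone strongly disputes the content, so dock it a bit.
    if is_locked:
        score *= 0.8
    return max(0.0, min(100.0, score))
```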

Technically, I think the actual programming could be easy, though the tuning will probably be pretty hard. I know I could stumble through a web page in PHP that takes a URL and figures out how to access the history of the page - is there any back-end API for that kind of stuff on the Wikipedia? My poking around made me think it's the kind of stuff they are happy to expose on the web page, though maybe not to developers.
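If it turns out there is one (my understanding is that MediaWiki exposes a machine-readable api.php endpoint with revision data), a rough sketch of pulling a page's history might look like this - treat the parameter names as my best reading of the docs, not tested code:

```python
# Sketch of fetching revision history from MediaWiki's api.php endpoint.
# Parameter names reflect my reading of the API docs; verify before relying on them.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def fetch_revisions(title, limit=500):
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": limit,
        "rvprop": "timestamp|user|comment|size",
        "format": "json",
    })
    with urllib.request.urlopen(f"{API}?{params}") as resp:
        data = json.load(resp)
    # The result is keyed by an internal page id we don't know in advance.
    page = next(iter(data["query"]["pages"].values()))
    return page.get("revisions", [])

revs = fetch_revisions("Gravity")
print(len(revs), "revisions;", len({r["user"] for r in revs}), "distinct editors")
```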

Elsewhere on the net someone pointed me to the WikiTrust project, but I am not sure whether I can use it effectively without mirroring the entire Wikipedia. I do like the idea of being able to do contextual highlighting to note the more or less reliable parts of an article.

Thoughts, advice, recriminations encouraged!
posted by mzurer to Computers & Internet (17 answers total) 3 users marked this as a favorite
 
Response by poster: Oh! If someone has already done this, please point me in that direction! I'm more interested in the results than the process.
posted by mzurer at 9:01 AM on April 11, 2009


I actually did a little research in this area for a paper last year. Several people have done studies comparing machine-rated articles to articles rated by human experts, and the machine ratings were not encouraging. I am in the middle of unpacking from a move, but if I can dredge up my bibliography later I'll post it here. If you have access to an academic journal database, you could try searching it yourself for "wikipedia".
posted by Rock Steady at 9:18 AM on April 11, 2009


You're going to need a pretty detailed description of what constitutes "reliability" before this is going to be of any use whatsoever.

This is actually a massive project, not something you can knock up in a weekend. You've got an incredible amount of data, but much of that data is itself highly questionable. For example, the Conflict of Interest Noticeboard was in fact maintained by someone who himself had a significant conflict of interest, and who edited out any attempts to expose this on Wikipedia. Though that editor has apparently retired, the site continues to be plagued--in completely non-transparent ways--by issues of this sort.

The main problem is that, even on its own terms, Wikipedia is a failure. It's a highly interesting and useful failure, to be sure, but even the most "reliable" articles have problems. Take a look at their featured article criteria. Not only are many of these completely non-quantitative matters of opinion--which will make them impossible to use for your purposes--but I'd be willing to bet that you could make a compelling argument that every single featured article violates at least one of those criteria.

In short, even if a project like this were possible, and there are good reasons to believe that it isn't, it would take far longer than anything you could knock up in a weekend or six. The sheer length of time required to adequately deal with the seething mass of changing content that is Wikipedia might make it impossible independent of epistemic concerns.

I say drop it.
posted by valkyryn at 9:40 AM on April 11, 2009 [1 favorite]


i think you'd need to add number of distinct contributors, otherwise it'd be too easy to game your algorithm. post something, modify it and wait - over time your meter will rate it as accurate.

that makes me wonder, do things become more accurate over time? the algorithm you propose implies that they do.

gravity was accurate from the minute it was conceptualized - i'm not sure it's become more accurate (maybe the same for the wikipedia entry about gravity). i think you should study the characteristics of articles you feel are 100% accurate and make sure that your algorithm recognizes them as such.

i can't think of non-sciency things that you could test, but off the top i'm thinking things like gravity, fibonacci, pythagorean theorem
posted by askmehow at 9:43 AM on April 11, 2009


I like what you're getting at, and it would be uber helpful. However, there's no conceivable way to do it. Everything on Wikipedia is subjective. Trying to get a bearing on how close to the Truth an article is depends on actually knowing the Truth. After all, not even the Encyclopedia Britannica is 100% reliable. Furthermore, many Wikipedia articles (and the ones most likely to be challenged) are based on current, ongoing events, where the reliability hinges on ongoing information. At best, a reliability indicator would give readers a false sense of security.

The metrics you've proposed don't really conclude anything. If a page has been edited dozens of times and frequently, does that make it more reliable? Less reliable? It would depend on the type of article. Does a "trusted author" get free rein to write whatever they want on any subject? Just because they are an authority on ornithology doesn't mean they're 100% accurate on Star Wars characters. Is an older article, say "Jesus", more reliable? What about wiki-vandalism?
posted by JuiceBoxHero at 10:00 AM on April 11, 2009


Response by poster: In case I didn't make this absolutely explicit - I don't think the Wikipedia model gets us anywhere close to Truth; I am just curious, if I accept what I understand to be the working model of the Wikipedia, how reliable any given article would be. Even if I subscribe to a subjective notion of truth, I should be able to discern between information that is somewhat suspect and information that is extremely suspect.
posted by mzurer at 10:14 AM on April 11, 2009


mzurer, I understand where you're coming from, but the real problem with Wikipedia is not that it's subjective or that it doesn't get us close to "Truth." The problem is that it is almost impossible to come up with any metrics with a sufficient degree of confidence to enable anything even as seemingly modest as your project. Transparency, one of the touted advantages of wiki-type projects over traditional editorial works, is actually a huge liability here. We don't really know anything about who Wikipedia contributors are beyond their IP addresses and the information they choose to disclose to us, which we have no way of independently verifying.

In short, where Wikipedia is concerned, the confidence of a given metric is inversely related to the interest of that metric. Coming up with a total number of articles on Wikipedia is trivial to do, but it's also pretty trivial in terms of what you can do with that number. Coming up with a list of contributors sorted by number of contributions is interesting, but much harder to get, as again, the only things we know about them are their IP addresses and self-disclosed user info. Coming up with demographics on Wikipedia users and contributors would be awesome but is essentially impossible to do with any satisfactory degree of confidence.

Besides, your concept of "reliable" seems squishy enough that even if we had interesting and reliable metrics of any sort, I doubt you could get anything useful out of them.
posted by valkyryn at 10:38 AM on April 11, 2009


I think this is a very interesting question, and if you're willing to accept this more as a "let's see what I learn about climbing when I climb this unclimbable mountain" project rather than "I'm doing something that will be 100% accurate," then I say go to it. You'll undoubtedly learn a lot of useful information about Wikipedia, social information, and machine smarts.

Google doesn't give exactly the right answer either. In fact it's pretty awful quite often. But it's still useful.

I think making a single "reliability" percentage might be a bit broad, which is part of what makes it so hard. However, perhaps we can either devise more meters or modify the definition.

One option would be a "confidence light" that would be red/yellow/green, which seems much more doable than "82% reliable vs. 85% reliable." Another possibility would be to have a couple of meters: one for "stability" and... um... let's call it "edited-ness." Stability would be a metric of how much the page has changed recently (a locked page would have very low stability; that's usually why they get locked), and "edited-ness" would indicate how many contributors added and changed how much of an article. An article with only a few contributors frequently editing a large percentage of the text would rate low. Lots of contributors making small changes would rate high.

Here's all the stuff I can think of, as someone who knows nothing of the content of an article, that might be useful in evaluating the goodness of a Wikipedia article (a rough feature-extraction sketch follows the list):
- Age from first publication.
- Number of contributors.
- Size and number of recent edits. (though something that is recently newsworthy could throw this out of whack.)
- Amount of discussion on this page. (Lots of discussion would mean a controversial or hotly debated topic. Though I'm not sure if this should ding the reliability or help it.)
- Number of citations per unit of text.
- When and how often a page got quality tags ("needs cleanup", "needs citations", etc.), and how long ago they were removed.
- How many languages this article has been translated into. (Not sure if this is useful, since it might be more of a popularity metric than anything. I'd weight it low.)
- Number of images, etc. (I'd rank a page with images and illustrations slightly higher since it's harder to contradict an image and someone made the effort to add media. But only slightly.)
- How much traffic the page gets. The more traffic, the more people see it and the more likely they are to find and correct mistakes. If you don't have access to the raw numbers you can cobble something together, perhaps using Wikipedia's count of pages that link to that page and Google's count of the same links.
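For what it's worth, here's a rough sketch of turning a few of these (plus the stability/edited-ness idea above) into a feature vector - the 30-day window and the markup patterns it looks for are just guesses:

```python
# Rough sketch: turn a fetched revision list and the article wikitext into a
# feature dict. The 30-day "recent" window and the markup patterns are guesses.
from datetime import datetime, timedelta, timezone

def extract_features(revisions, wikitext):
    """revisions: list of dicts with 'timestamp' and 'user' keys."""
    times = [datetime.strptime(r["timestamp"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
             for r in revisions]
    now = datetime.now(timezone.utc)
    recent = [t for t in times if now - t < timedelta(days=30)]
    words = max(len(wikitext.split()), 1)
    return {
        "age_days": (now - min(times)).days if times else 0,
        "num_contributors": len({r["user"] for r in revisions}),
        "recent_edit_count": len(recent),            # rough "stability": lower is calmer
        "citations_per_100_words": 100 * wikitext.count("<ref") / words,
        "image_count": wikitext.count("[[Image:") + wikitext.count("[[File:"),
        "cleanup_tagged": "{{cleanup" in wikitext.lower()
                          or "{{citation needed" in wikitext.lower(),
    }
```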

One way to see how effective your algorithm is would be to run it over different revisions of an article and generate a sparkline showing how its rating changes over time.

Very cool project! Let us know what you come up with!
posted by Ookseer at 11:51 AM on April 11, 2009


What if you picked some pages that you think are accurate, fact-checked them to make sure, and then looked at what they all have in common? You could do the same for highly inaccurate pages. Instead of working on assumptions (like whether few people editing an article makes it accurate or inaccurate), you'd have some data showing what accurate pages look like. Then you could construct your algorithm using that data, pick, say, 100 random wiki pages, rank-order them with your algorithm, then fact-check them to see how well your algorithm works. Like, your algorithm correctly classified 80% of the pages.
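A back-of-the-envelope sketch of that last evaluation step, assuming you've already got an algorithm score and a fact-checked label for each page:

```python
# Sketch: score the meter against hand-checked labels.
# `pages` pairs the meter's 0-100 score with a fact-checked True/False label.
def classification_accuracy(pages, threshold=50):
    correct = sum((score >= threshold) == accurate for score, accurate in pages)
    return correct / len(pages)

sample = [(82, True), (75, True), (30, False), (64, False), (12, False)]
print(f"{classification_accuracy(sample):.0%} of pages classified correctly")
```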
posted by lockestockbarrel at 12:12 PM on April 11, 2009 [1 favorite]


Hmm. I recall seeing a similar project for news sites. I've not seen anything for wikipedia though.
posted by I_pity_the_fool at 12:29 PM on April 11, 2009


As long as you're going to go to all this trouble, you should also check the accuracy of some comparable Encyclopedia Britannica entries and see if you can confirm the disputed results of Nature magazine's 2005 study.
posted by Jaltcoh at 2:06 PM on April 11, 2009


Oh, and look through that link for several examples of previous attempts to gauge the accuracy of Wikipedia.

Of course, it's a bit of a catch-22 -- can you trust the accuracy of Wikipedia's reporting on its own accuracy?

Then again, I don't remember seeing Encyclopedia Britannica opening itself up to such criticism.
posted by Jaltcoh at 2:17 PM on April 11, 2009


I think accuracy is going to be a hard thing to quantify without actually looking at the text. I think the best you can do would be to quantify metrics which could lead to inaccuracy, controversy, etc. Call it the Mzurer Index, or something, where a high level on the index indicates a risk of inaccuracy.

Especially considering that if you go with your original idea and mark a page as inaccurate, more people are likely to try to correct errors on the page, creating negative feedback loops.

For example, multiple reverts in a short time might be one component (a quick sketch of detecting that follows below).
Being about scientific content without a political angle might be another.
Being closely tied to other articles with a high Mzurer Index might be a third.
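Here's the rough sketch I mentioned for the first component - it assumes reverts can be spotted by the usual "revert"/"rv"/"undid" text in edit comments, which will certainly miss some:

```python
# Sketch: count reverts that land within a short window of the previous revert,
# assuming reverts are identifiable from conventional edit-comment wording.
from datetime import datetime, timedelta

def revert_bursts(revisions, window_hours=24):
    """revisions: dicts with 'timestamp' and 'comment', oldest first."""
    revert_times = [
        datetime.strptime(r["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        for r in revisions
        if any(k in r.get("comment", "").lower() for k in ("revert", "rv ", "undid"))
    ]
    window = timedelta(hours=window_hours)
    return sum(1 for a, b in zip(revert_times, revert_times[1:]) if b - a <= window)
```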

Anyway, what you really need to do is some exploratory analysis along the lines of what lockestockbarrel says, though I think you'll need to oversample inaccurate pages, since I suspect inaccurate articles are a small fraction of overall articles.

I'd suggest using something like the Mechanical Turk to get a large number of currently inaccurate pages at, say, 3 cents a page, and having the turkers record the time when they looked at each article, so that if it gets edited later you can still pull up the inaccurate revision. Pull down 500 or so... that's only about $15.
posted by gryftir at 4:48 PM on April 11, 2009


Another metric could be the frequency of weasel words?
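Something crude like this, maybe - the phrase list is just a guess at what counts as weaselly:

```python
# Crude sketch: weasel phrases per 1000 words. The phrase list is a guess.
WEASELS = ("some people say", "it is believed", "many believe", "arguably",
           "it has been suggested", "critics say", "some argue")

def weasel_rate(text):
    lowered = text.lower()
    words = max(len(text.split()), 1)
    hits = sum(lowered.count(phrase) for phrase in WEASELS)
    return 1000 * hits / words
```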
posted by yoHighness at 7:10 PM on April 11, 2009


As someone who's released the first edition of several pages, I can say firsthand that all changes since then have consisted of corrections of grammar, application of Wikipedia standards for hierarchical information, addition of more information, or requests for citations. I understand that I am just one data point, but I fail to see how any of my articles were anywhere near 0% reliability when I first posted them. From my own perspective, I would never post an article unless I thought myself sufficiently knowledgeable about the subject matter and had the necessary details and citations at hand.

I don't wish to sound defensive, but I have doubts that your premise is a valid one.
posted by furtive at 10:48 PM on April 11, 2009


Tangentially: instead of judging a page by edit records, what about programming a little widget that judges a page by citations? Like Google PageRank, you could give books the highest scores, academic papers high scores (possibly those from Nature and Science higher), news articles lower scores, and web sites the lowest of all. The more a paper is cited within Wikipedia, the higher it ranks as well.
My rationale: when I'm judging the accuracy of a Wikipedia page, I judge it by the number of academic paper citations displayed. Of course you have to evaluate the sources as well, and this I do by going onto Google Scholar and checking the number of times a paper was cited. Is Wikipedia large enough that you would have a large enough sample size from which to take citations?
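A rough sketch of the wikitext side of this - the citation template names and weights below are guesses, and a real version would need to actually parse the cite templates and look up the sources:

```python
# Sketch: pull <ref>...</ref> citations out of wikitext and apply a crude
# source-type weighting. Template names and weights are illustrative guesses.
import re

SOURCE_WEIGHTS = {"cite book": 3.0, "cite journal": 2.5, "cite news": 1.5, "cite web": 1.0}

def citation_score(wikitext):
    refs = re.findall(r"<ref[^>/]*>(.*?)</ref>", wikitext, re.DOTALL | re.IGNORECASE)
    score = 0.0
    for ref in refs:
        weight = 1.0
        for template, w in SOURCE_WEIGHTS.items():
            if template in ref.lower():
                weight = w
                break
        score += weight
    return len(refs), score
```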
posted by wayofthedodo at 11:36 PM on April 11, 2009


Probably the best way to implement it would be to do a regression. Create a dataset of around 1000 randomly selected articles (with all the metrics possible) and rate them (subjectively) on "reliability". Then you would run through the reliable and unreliable articles and try to find which metrics correlate with the ratings.
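Something like this, roughly - the feature columns and numbers are made up purely to show the shape of the fit, and it uses NumPy's least squares to keep it dependency-light:

```python
# Sketch: fit article metrics against human reliability ratings with ordinary
# least squares. The feature values and ratings below are made-up examples.
import numpy as np

# Each row: [age_days, num_contributors, reverts_per_edit, cites_per_100_words]
X = np.array([[900, 120, 0.02, 1.5],
              [ 30,   3, 0.00, 0.1],
              [400,  45, 0.20, 0.8],
              [ 10,   1, 0.00, 0.0]], dtype=float)
y = np.array([0.9, 0.3, 0.5, 0.2])   # hand-assigned reliability ratings

X_aug = np.hstack([X, np.ones((len(X), 1))])        # add an intercept column
coeffs, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

def predict(metrics):
    return float(np.dot(np.append(metrics, 1.0), coeffs))

print(predict([365, 40, 0.05, 1.0]))
```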

To gather data, you could have trusted individuals go through Wikipedia and record a judgment whenever they have one on an article. Over time you could build up a database of reliable articles. I'm guessing you could set up a JavaScript bookmarklet that could do this for you, or maybe a Greasemonkey script.

If someone out there knows how to write genetic algorithms, that could be an option: it would try to evolve a scoring function whose output matches the human scores.

My guess is that one excellent metric would be "tennis match" editing, with a few select editors editing the same article back and forth. You could look for keywords in the update messages - words like "cite" or "proof", for example. A really effective meter would require gathering some information on the contributors; there are definitely some editors out there with an axe to grind.

I think it'd be better to assume 50% reliability for new articles, and then increase or decrease the score based on later activity. One positive indicator could be the editor: how reliable are the other articles that the editor has created? I think something like PageRank would work, but centered on users (who, after all, determine pretty much all of the content on Wikipedia). If an editor has deleted articles, their EditorRank would be lower. If one of their articles was merged, on the other hand, that's generally an indicator of good content and should increase their rank.
posted by Deathalicious at 1:07 PM on April 17, 2009

