Cohen's kappa in R?
April 12, 2011 12:04 PM

Statistics filter: Cohen's kappa and inter-rater reliability in R?

I have a set of data in which people gave a binary (yes or no) response to a series of prompts. I want to compare their consistency with one another, and the overall consistency of the group with my predicted responses to the prompts. I have a few questions about how to structure the statistical tests (usage in R specifically):

1. Is Cohen's kappa the right test to use for this sort of thing? (And is there a way to make all of the multiple comparisons simultaneously using Cohen's kappa or some other test, or do I have to compare each individual pair?)
2. What do I use as the input to cohen.kappa()? Is it just the matrix with the individual responses?
3. When I compare the group responses to my predictions, is there any way to take into account the variability in the group? The alternative of just choosing a cutoff point (e.g. more than 50% yes votes gives a yes) and comparing that with my predictions would give me a basic idea but doesn't seem like a particularly honest way of dealing with the data!

Thanks for your help!
posted by SymphonyNumberNine to Grab Bag (2 answers total)
The documentation for cohen.kappa is pretty explicit:
cohen.kappa will accept either an object by category matrix of counts in which the numbers represent how many methods have placed the object in each category, or an object by method matrix of categories in which the numbers represent each method's categorization of that object.
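Also consider a tetrachoric correlation (the two-category case of the polychoric correlation), which you can get from polychor in the polycor package. A biserial correlation would not fit here, since it pairs a binary variable with a continuous one and your responses are all binary.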
posted by Nomyte at 3:15 PM on April 12, 2011
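
To make the two input layouts in the quoted documentation concrete, here is a minimal base-R sketch. The toy data and the helper name kappa_pair are made up for illustration, and it computes Cohen's kappa from first principles rather than calling any package's cohen.kappa, so treat it as a cross-check and not as what that function does internally.

```r
# Hypothetical toy data: 6 prompts (rows) by 3 raters (columns); 1 = yes, 0 = no.
# This is the "object by method" layout: each cell is one rater's category.
ratings <- cbind(
  r1 = c(1, 1, 0, 0, 1, 0),
  r2 = c(1, 0, 0, 0, 1, 1),
  r3 = c(1, 1, 0, 1, 1, 0)
)

# Cohen's kappa for two raters (kappa_pair is an illustrative helper,
# not part of any package).
kappa_pair <- function(a, b, levels = c(0, 1)) {
  # Fixing the levels keeps the table square even if a rater never says "yes".
  tab <- table(factor(a, levels), factor(b, levels))
  p   <- tab / sum(tab)
  po  <- sum(diag(p))                  # observed agreement
  pe  <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
  (po - pe) / (1 - pe)                 # NaN if pe == 1 (no variation at all)
}

# Kappa for every pair of raters.
rater_pairs <- combn(colnames(ratings), 2)
pairwise <- apply(rater_pairs, 2, function(p) {
  kappa_pair(ratings[, p[1]], ratings[, p[2]])
})
names(pairwise) <- apply(rater_pairs, 2, paste, collapse = "-")
print(round(pairwise, 3))

# The "object by category" layout: per prompt, how many raters chose each category.
counts <- cbind(no = rowSums(ratings == 0), yes = rowSums(ratings == 1))
print(counts)
```

On this toy data the pairwise kappas come out to 1/3 (r1-r2), 2/3 (r1-r3) and 0 (r2-r3). Appending your own predictions as one more column would give each rater's kappa against the predictions through the same pairwise loop, without first collapsing the group to a majority vote.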


What's most appropriate depends on what (in detail) you're trying to find out, how the experiments were designed, etc. In general, the best idea is to look in your field's journals and see what people do with similar problems.

I tend to downplay kappa as a method. There has been a lot of work on reliability in the past few years. A few examples:
Laenen, A. et al. (2006). Generalized reliability estimation using repeated measurements. British Journal of Mathematical and Statistical Psychology, 59, 113-131.
Molenberghs, G. et al. (2007). Estimating Reliability and Generalizability from Hierarchical Biomedical Data. Journal of Biopharmaceutical Statistics, 17, 595-627.

What you're interested in may not be "reliability" per se. The GLMM approach of the second paper is conceptually quite attractive, but I'm told it does not always work out well in practice for binary outcomes. If nothing else, trying to set up the problem in that framework will help clarify what you'd like to estimate or test, and what sources of error and uncontrolled variation you need to think about.
posted by a robot made out of meat at 12:23 PM on April 13, 2011
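
As a rough illustration of what setting the problem up in a GLMM framework could look like for yes/no data, here is one possible formulation. It is an assumption for this thread, not the model from either cited paper. The symbols are all hypothetical: y_ij is respondent i's answer to prompt j, x_j is the predicted answer for prompt j, and u_i and v_j are random effects for respondents and prompts.

```latex
\operatorname{logit} \Pr(y_{ij} = 1) = \beta_0 + \beta_1 x_j + u_i + v_j,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad v_j \sim \mathcal{N}(0, \sigma_v^2)
```

Under this sketch, \beta_1 measures how strongly the group's responses track the predictions, \sigma_u^2 captures between-respondent variation in the tendency to say yes, and \sigma_v^2 captures prompt-to-prompt variation that the predictions do not explain.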

