Cohen's kappa in R?
April 12, 2011 12:04 PM
Statistics filter: Cohen's kappa and inter-rater reliability in R?
I have a set of data in which people gave a binary (yes or no) response to a series of prompts. I want to compare their consistency with one another, and the overall consistency of the group with my predicted responses to the prompts. I have a few questions about how to structure the statistical tests (usage in R specifically):
1. Is Cohen's kappa the right test to use for this sort of thing? (And is there a way to make all of the multiple comparisons simultaneously using Cohen's kappa or some other test, or do I have to compare each individual pair?)
2. What do I use as the input to cohen.kappa()? Is it just the matrix with the individual responses?
3. When I compare the group responses to my predictions, is there any way to take into account the variability in the group? The alternative of just choosing a cutoff point (e.g. more than 50% yes votes gives a yes) and comparing that with my predictions would give me a basic idea but doesn't seem like a particularly honest way of dealing with the data!
Thanks for your help!
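For reference, here is a minimal base-R sketch of what pairwise Cohen's kappa computes on data like this. Everything in it is an assumption made for illustration: the `ratings` matrix is made up (rows are prompts, columns are raters, 1 = yes, 0 = no), and `kappa2` is a hypothetical helper, not a function from any package.

```r
# Hypothetical data: rows are prompts, columns are raters, 1 = yes, 0 = no.
ratings <- matrix(
  c(1, 1, 0,
    1, 1, 1,
    0, 0, 0,
    1, 0, 1,
    0, 0, 1,
    1, 1, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("rater_a", "rater_b", "rater_c"))
)

# Unweighted Cohen's kappa for two vectors of binary ratings.
# (Hypothetical helper; returns NaN if chance agreement is exactly 1.)
kappa2 <- function(x, y) {
  lv    <- c(0, 1)
  tab   <- table(factor(x, levels = lv), factor(y, levels = lv))
  p     <- tab / sum(tab)                  # joint proportions
  p_obs <- sum(diag(p))                    # observed agreement
  p_exp <- sum(rowSums(p) * colSums(p))    # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

# Kappa for every pair of raters, stored as a symmetric matrix.
n_raters <- ncol(ratings)
pairwise <- matrix(1, n_raters, n_raters,
                   dimnames = list(colnames(ratings), colnames(ratings)))
for (i in seq_len(n_raters - 1)) {
  for (j in (i + 1):n_raters) {
    pairwise[i, j] <- pairwise[j, i] <- kappa2(ratings[, i], ratings[, j])
  }
}
print(round(pairwise, 3))
```

Cohen's kappa is defined for two raters at a time, which is why the sketch loops over pairs. As far as I know, cohen.kappa() in the psych package takes either two columns of ratings or a square agreement table, but its help page is the place to confirm the expected input.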
What's most appropriate depends on what (in detail) you're trying to find out, how the experiments were designed, etc. In general, the best idea is to look in your field's journals and see what people do with similar problems.
I tend to downplay kappa as a method. There has been a lot of work on reliability in the past few years. A few examples:
Laenen, A. et al. (2006). Generalized reliability estimation using repeated measurements. British Journal of Mathematical and Statistical Psychology, 59, 113-131.
Molenberghs, G. et al. (2007). Estimating reliability and generalizability from hierarchical biomedical data. Journal of Biopharmaceutical Statistics, 17, 595-627.
What you're interested in may not be "reliability" per se. The GLMM approach of the second paper is conceptually attractive, but I'm told it does not always work out well in practice for binary outcomes. If nothing else, trying to set up the problem in that framework will help clarify what you'd like to estimate or test, and which sources of error and uncontrolled variation you need to think about.
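One way to start setting the problem up in a mixed-model framework is to reshape the responses into long format, one row per rater and prompt, and fit a logistic model with random intercepts for raters and prompts. The sketch below is only an assumption about how that might look, not the formulation used in either paper. The data are simulated, the column names (`prompt`, `rater`, `response`) are made up, and the model is fitted only if the lme4 package happens to be installed.

```r
set.seed(1)

# Simulated data (hypothetical): 20 prompts, each rated yes/no by 8 raters.
n_prompts <- 20
n_raters  <- 8
long <- expand.grid(prompt = factor(seq_len(n_prompts)),
                    rater  = factor(seq_len(n_raters)))

# Each prompt and each rater shifts the log-odds of a "yes" response.
prompt_effect <- rnorm(n_prompts, sd = 1.5)
rater_effect  <- rnorm(n_raters,  sd = 0.5)
eta <- prompt_effect[as.integer(long$prompt)] +
       rater_effect[as.integer(long$rater)]
long$response <- rbinom(nrow(long), size = 1, prob = plogis(eta))

# Fit a logistic GLMM with crossed random intercepts, if lme4 is available.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(response ~ 1 + (1 | prompt) + (1 | rater),
                     data = long, family = binomial)
  # Variance components: between-prompt versus between-rater variation.
  print(lme4::VarCorr(fit))
}
```

Writing the model down this way makes the sources of variation explicit: prompts that differ in how often they draw a yes, and raters who differ in how readily they say yes. With few raters or prompts the fit may be unstable, which is in line with the caveat above about binary outcomes.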
posted by a robot made out of meat at 12:23 PM on April 13, 2011
This thread is closed to new comments.
posted by Nomyte at 3:15 PM on April 12, 2011