Does size matter (on exams)?
December 2, 2008 10:13 AM
Is there a formula or process to determine how many questions should be on an exam? Given a test bank of "x" number of questions, what "y" number of randomly selected questions constitutes a valid sample?
I'm a trainer in industry and we assess people who take our courses using tests generated from a question bank. Can I just use a standard sample size calculator? If so, how does one choose a confidence interval? I'm happy with a confidence level of 95% or even 90%, but am not sure how I should pick a confidence interval.
Or is a sample size calculator not appropriate for this application?
Thanks for your help, Educational/Statistical MeFites!
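For concreteness: a standard sample size calculator boils down to something like the sketch below, using the usual normal-approximation formula with a finite population correction. The "margin of error" is the confidence interval half-width the poster would have to choose; the bank size and margin in the example are made-up placeholders.

    import math

    def sample_size(bank_size, z=1.96, margin=0.05, p=0.5):
        # z: z-score for the confidence level (1.96 ~ 95%, 1.645 ~ 90%)
        # margin: margin of error, i.e. the confidence-interval half-width
        # p: assumed proportion correct; 0.5 is the most conservative choice
        n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population size
        # finite population correction for a bank of bank_size questions
        return math.ceil(n0 / (1 + (n0 - 1) / bank_size))

    # e.g. a 200-question bank, 95% confidence, +/-10% margin of error:
    print(sample_size(200, margin=0.10))  # -> 66

The catch, as the answers below point out, is that this treats each exam as a simple random sample of the bank and says nothing about which margin of error is defensible, which is exactly the judgment call being asked about.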
Confidence interval on what? Probability of getting a question right? Score-if-they-had-taken-all-x-questions?
This requires actual thinking; I'd probably simulate it for an answer.
posted by a robot made out of meat at 10:44 AM on December 2, 2008
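A minimal sketch of such a simulation, assuming made-up numbers for the bank size, exam length, and the trainee's true score across the whole bank:

    import random

    def simulate(bank_size=200, exam_len=30, true_score=0.75, trials=10000):
        # A trainee who would answer 75% of the whole bank correctly
        # takes many random exam_len-question exams; how widely do the
        # exam scores scatter around the true score?
        known = set(random.sample(range(bank_size), int(true_score * bank_size)))
        scores = sorted(
            sum(q in known for q in random.sample(range(bank_size), exam_len)) / exam_len
            for _ in range(trials)
        )
        # central 95% of the simulated exam scores
        return scores[int(0.025 * trials)], scores[int(0.975 * trials)]

    print(simulate())  # roughly (0.60, 0.90) at 30 questions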
When I made exams for junior high/high school level, I structured tests like a tractor pull. I put in a fair amount of regurgitation and selection, some application, and then some analysis or extrapolation. If you got the regurgitation and application, you'd score a high C/low B. To get an A, you needed the analysis and extrapolation.
What I was doing was trying to follow Bloom's Taxonomy of Educational Objectives.
Your question bank is only going to get you through levels 1-3 in the taxonomy, and by most views is not a very good assessment tool.
posted by plinth at 10:56 AM on December 2, 2008
I don't think that thinking about this in those terms is useful. As the fleshy automaton notes, about the only thing you could get easily is extrapolating from a score on 20 questions to their unknown score if you gave them a test with all the questions.
Which would be goofy, because that would require having each exam be a simple random sample of the questions, which (unless the test bank is designed especially for simple random sampling) might result in asking essentially the same not-very-important question three times while there's no question at all on some important topic.
posted by ROU_Xenophobe at 10:59 AM on December 2, 2008
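To put a rough number on that topic-coverage risk, here is a quick hypergeometric check with hypothetical sizes:

    from math import comb

    # chance that a random 20-question exam from a 200-question bank
    # includes none of the 5 questions covering some topic (made-up sizes)
    bank, exam, topic = 200, 20, 5
    print(comb(bank - topic, exam) / comb(bank, exam))  # ~0.59

So under those assumed numbers, a simple random sample skips that topic entirely more often than not.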
Response by poster: To clarify:
My industry is governed by an international standards association which requires we use a random sample of multiple choice questions from a recognized test bank. (Un)fortunately, it does not prescribe how many questions to ask, but does require that we report how we choose the sample and its size.
I'm not sure either what criteria to use to determine a confidence interval, which is why I'd hoped to get some guidance.
I know and use Bloom's taxonomy to design instruction and assessment that are more authentic than the minimums required by the standards association, but that association still wants to see worksites create and members write a multiple choice exam.
Thanks for your comments so far!
posted by angiep at 12:38 PM on December 2, 2008
Wouldn't it make more sense to ask the standards association for clarification about what properties they want the sample to have?
posted by ROU_Xenophobe at 12:58 PM on December 2, 2008
Response by poster: @ROU_Xenophobe: You're right, that would be the way to go. They don't publish anything, so I think I'll give this my best shot and wait for their response. If they're bent out of shape about my method, they'll have to suggest one of their own.
However, before I went that route I thought I'd try this forum since my other research turned up nothing.
@ all responders: Thanks for taking the time to help!
posted by angiep at 1:11 PM on December 2, 2008
You say a confidence interval, but I assume there must be some passing bar, and you want confidence that a person is better than that? Do "hard" questions count for more? Do basic questions count for more (if assuring basic competence is the objective of the test)?
To address ROU's proposition that you'd have to do SRS (and that that would be dumb): I'd actually suggest a stratified sample. E.g., take 10 easy questions, 10 medium, 10 hard. Require crosstabs such that some questions get at all the objectives in Bloom's. That is still a random sample, but it would be more meaningful.
Allocation and sample size for a stratified sample are a little more difficult; the go-to book in a class I took on this was "Sampling Techniques" by William G. Cochran. Then you would estimate the Score-if-they-had-taken-all-x-questions (or a weighted version of the same thing) and check that it is significantly higher than some arbitrary bar.
posted by a robot made out of meat at 4:54 PM on December 2, 2008
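A short sketch of what stratified exam assembly could look like, assuming hypothetical difficulty tags on the bank; Bloom's-level crosstabs would just add a second tag and a second constraint:

    import random

    # hypothetical bank: each question tagged with a difficulty stratum
    bank = [{"id": i, "difficulty": d}
            for i, d in enumerate(["easy"] * 80 + ["medium"] * 80 + ["hard"] * 40)]

    def stratified_exam(bank, allocation):
        # allocation maps stratum -> question count, e.g. {"easy": 10, ...}
        exam = []
        for stratum, n in allocation.items():
            pool = [q for q in bank if q["difficulty"] == stratum]
            exam.extend(random.sample(pool, n))  # random within each stratum
        random.shuffle(exam)
        return exam

    exam = stratified_exam(bank, {"easy": 10, "medium": 10, "hard": 10})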
The idea that you can get statistical rigor out of educational testing is dubious. That hasn't stopped the educational establishment from doing it anyways--after all, if education isn't a science, what is it?--but common practice doesn't make bullshit smell any sweeter.
posted by valkyryn at 10:34 AM on December 2, 2008
This thread is closed to new comments.