Does size matter (on exams)?
December 2, 2008 10:13 AM   Subscribe

Is there a formula or process to determine how many questions should be on an exam? Given a test bank of "x" number of questions, what "y" number of randomly selected questions constitutes a valid sample?

I'm a trainer in industry and we assess people who take our courses using tests generated from a question bank. Can I just use a standard sample size calculator? If so, how does one choose a confidence interval? I'm happy with a confidence level of 95% or even 90%, but am not sure how I should pick a confidence interval.

Or is a sample size calculator not appropriate for this application?
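(For reference, a standard sample-size calculator for estimating a proportion implements roughly the formula below, with a finite-population correction for the bank size. This is only a sketch; the 200-question bank and 10% margin of error are made-up numbers for illustration.)

```python
import math

def sample_size(bank_size, margin=0.05, confidence=0.95, p=0.5):
    """Classic sample-size formula for estimating a proportion,
    with a finite-population correction for a bank of bank_size questions.
    p=0.5 is the most conservative (largest-sample) assumption."""
    # z-scores for common confidence levels (hard-coded to avoid scipy)
    z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[confidence]
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population size
    n = n0 / (1 + (n0 - 1) / bank_size)        # finite-population correction
    return math.ceil(n)

# e.g. a 200-question bank, 95% confidence, +/-10% margin of error
print(sample_size(200, margin=0.10, confidence=0.95))  # → 66
```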

Thanks for your help, Educational/Statistical MeFites!
posted by angiep to Education (8 answers total) 1 user marked this as a favorite
 
It would really help to know what subject you're testing for and what format you're using. If an exam is dozens of multiple-choice math or logic questions, then yes, doing statistical analysis could produce useful data. But if it's a law exam with two essays, statistical analysis is going to be entirely useless.

The idea that you can get statistical rigor out of educational testing is dubious. That hasn't stopped the educational establishment from doing it anyway--after all, if education isn't a science, what is it?--but common practice doesn't make bullshit smell any sweeter.
posted by valkyryn at 10:34 AM on December 2, 2008


Confidence interval on what? Probability of getting a question right? Score-if-they-had-taken-all-x-questions?

This requires actual thinking; I'd probably simulate it for an answer.
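For instance, a minimal simulation sketch: how far a sampled-exam score tends to land from the full-bank score. Everything here is a made-up assumption for illustration (a 200-question bank, a 30-question exam, an examinee who knows exactly 75% of the bank):

```python
import random

random.seed(1)

BANK_SIZE = 200   # hypothetical bank size
SAMPLE = 30       # questions drawn per exam
TRIALS = 10000

# hypothetical examinee: knows a fixed 75% of the bank
known = set(random.sample(range(BANK_SIZE), int(0.75 * BANK_SIZE)))
true_score = len(known) / BANK_SIZE

# draw many random exams and record the estimation error each time
errors = []
for _ in range(TRIALS):
    exam = random.sample(range(BANK_SIZE), SAMPLE)
    est = sum(q in known for q in exam) / SAMPLE
    errors.append(abs(est - true_score))

errors.sort()
print(f"true full-bank score: {true_score:.2f}")
print(f"95% of 30-question exams land within "
      f"+/-{errors[int(0.95 * TRIALS)]:.3f} of it")
```

Re-running with different values of SAMPLE shows how the spread shrinks as the exam gets longer, which is really the question being asked.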
posted by a robot made out of meat at 10:44 AM on December 2, 2008


When I made exams for junior high/high school level, I structured tests like a tractor pull. I put in a fair amount of regurgitation and selection, some application, and then some analysis or extrapolation. If you got the regurgitation and application, you'd score a high C/low B. To get an A, you needed the analysis and extrapolation.

What I was doing was trying to follow Bloom's Taxonomy of Educational Objectives.

Your question bank is only going to get you through levels 1-3 in the taxonomy, and by most views is not a very good assessment tool.
posted by plinth at 10:56 AM on December 2, 2008


I don't think that thinking about this in those terms is useful. As the fleshy automaton notes, about the only thing you could get easily is extrapolating from a score on 20 questions to their unknown score if you gave them a test with all the questions.

Which would be goofy, because that would require having each exam be a simple random sample of the questions, which (unless the test bank is designed especially for simple random sampling) might result in asking essentially the same not-very-important question three times while there's no question at all on some important topic.
posted by ROU_Xenophobe at 10:59 AM on December 2, 2008


Response by poster: To clarify:

My industry is governed by an international standards association which requires we use a random sample of multiple choice questions from a recognized test bank. (Un)fortunately, it does not prescribe how many questions to ask, but does require that we report how we choose the sample and its size.

I'm not sure either what criteria to use to determine a confidence interval, which is why I'd hoped to get some guidance.

I know and use Bloom's taxonomy to design instruction and assessment that are more authentic than the minimums required by the standards association, but that association still wants to see worksites create and members write a multiple choice exam.

Thanks for your comments so far!
posted by angiep at 12:38 PM on December 2, 2008


Wouldn't it make more sense to ask the standards association for clarification about what properties they want the sample to have?
posted by ROU_Xenophobe at 12:58 PM on December 2, 2008


Response by poster: @ROU Xenophobe: You're right, that would be the way to go. They don't publish anything, so I think I'll give this my best shot and wait for their response. If they're bent out of shape about my method, they'll have to suggest one of their own.

However, before I went that route I thought I'd try this forum since my other research turned up nothing.

@ all responders: Thanks for taking the time to help!
posted by angiep at 1:11 PM on December 2, 2008


You say a confidence interval, but I assume that there must be some passing bar, and what you want is confidence that a person is better than that? Do "hard" questions count for more? Do basic questions count for more (if assuring basic competence is the objective of the test)?

To address ROU's proposition that you'd have to do SRS (and that that would be dumb): I'd actually suggest a stratified sample. E.g., take 10 easy questions, 10 medium, 10 hard. Require crosstabs such that the questions cover all the objectives in Bloom's. That is still a random sample but would be more meaningful.

Allocation and sample size for a stratified sample are a little more difficult; the go-to book in a class I took on this was "Sampling Techniques" by William G. Cochran. Then you would estimate the score-if-they-had-taken-all-x-questions (or a weighted version of the same thing) and check that it is significantly higher than some arbitrary bar.
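A rough sketch of drawing such a stratified sample; the difficulty labels, bank sizes, and the 10/10/10 allocation are all made up for illustration:

```python
import random

random.seed(0)

# hypothetical bank: question IDs grouped by difficulty stratum
bank = {
    "easy":   [f"E{i}" for i in range(40)],
    "medium": [f"M{i}" for i in range(35)],
    "hard":   [f"H{i}" for i in range(25)],
}

# fixed allocation per stratum (10 easy, 10 medium, 10 hard, as above)
allocation = {"easy": 10, "medium": 10, "hard": 10}

exam = []
for stratum, n in allocation.items():
    # simple random sample without replacement within each stratum
    exam.extend(random.sample(bank[stratum], n))

random.shuffle(exam)  # so difficulties aren't blocked together on the page
print(len(exam))  # → 30
```

Each exam is still random, but every exam is guaranteed the same difficulty mix, which is what makes scores comparable across test-takers.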
posted by a robot made out of meat at 4:54 PM on December 2, 2008

