Is it random?
December 4, 2007 6:05 PM
Do standardized tests (SAT, GRE, LSAT, etc.) use randomly sorted answer choices?
I don't know how to go about getting a straight answer from the makers of these tests.
When writing a question, the writers must produce four incorrect answers and one correct one. How is the order in which they are placed determined?
I had kind of assumed that on harder tests (GRE, GMAT, MCAT, LSAT) the answers were at least sometimes arranged in a non-random (test-writer-chosen) way. But in a discussion I've been having, everyone else thinks that's crazy.
If the answer order is chosen by people, in order to sometimes make the answers more difficult to choose (for example, putting a superficially attractive incorrect answer before the correct choice), that makes it non-random, even if each answer letter accounts for 20% of the answer choices. Right?
However, I don't really know if they do that, or if I just notice when they do, and discount it when they don't.
Anyplace I can find hard evidence?
On SAT math questions with answers that are numbers, the answers are often ordered from smallest to largest number which is obviously non-random. I can't speak to the order of other answer types.
posted by DanielDManiel at 6:26 PM on December 4, 2007
I worked at Kaplan as an MCAT tutor for a little while, and I seem to remember that they mentioned in their training that there were never more than 5 answers in a row of the same letter. Don't know if that's a proven fact or just a 99.9%-of-the-time thing.
I don't think they can be totally random, given use of the "E) None of the above" answer types on all these tests. That's one answer that's ALWAYS given as the last choice.
posted by i less than three nsima at 6:28 PM on December 4, 2007
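A fixed "None of the above" slot is easy to reconcile with otherwise-random ordering: shuffle the choices, then pin the fixed option to the last position. A minimal Python sketch of that idea (purely hypothetical; this is not a claim about how any actual test maker assembles questions):

```python
import random

def arrange_choices(correct, distractors, fixed_last=None):
    """Shuffle one correct answer and its distractors uniformly,
    optionally pinning one option (e.g. "None of the above") to
    the final slot."""
    choices = [correct] + list(distractors)
    random.shuffle(choices)
    if fixed_last is not None:
        # pull the pinned option out of the shuffled list, then append it
        choices = [c for c in choices if c != fixed_last]
        choices.append(fixed_last)
    return choices

ordered = arrange_choices("42", ["41", "43", "44", "None of the above"],
                          fixed_last="None of the above")
print(ordered[-1])  # always "None of the above"
```

Because the shuffle is uniform, the correct answer still lands in a uniformly random slot among the remaining positions, so pinning the last option doesn't by itself leak positional information about the answer.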
(I mean yeah choosing one answer is always going to give you better results than random guessing each time, but I mean more than that).
No, it's not. If the answers are evenly distributed between A-E, you have a 1/5 chance of guessing correctly regardless of whether you consistently choose one answer or pick one at random.
posted by spaceman_spiff at 6:32 PM on December 4, 2007
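spaceman_spiff's point is easy to check by simulation. A quick sketch assuming a uniformly random answer key (the trial count is arbitrary):

```python
import random

random.seed(42)
letters = "ABCDE"
trials = 100_000

# an answer key where each letter is equally likely
key = [random.choice(letters) for _ in range(trials)]

# strategy 1: always guess C; strategy 2: guess at random each time
always_c = sum(ans == "C" for ans in key) / trials
fresh_guess = sum(ans == random.choice(letters) for ans in key) / trials

print(f"always C: {always_c:.3f}, random each time: {fresh_guess:.3f}")
```

Both values land near 0.20. The "always pick C" advice only pays off if the key is not uniform, which is exactly the empirical question this thread is circling.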
Well, one of the things these tests strive for is that high scores should reflect knowledge of the content area rather than knowledge of how the tests are constructed. Such obvious games with answer order would tend to undermine the construct validity of that particular test question.
posted by KirkJobSluder at 6:35 PM on December 4, 2007
Unless one of the content areas is critical reading and/or attention to detail. A very slight variation from a straight 20% for A-E would give the test makers opportunity to introduce trickier answer structures.
posted by bluejayk at 6:45 PM on December 4, 2007
Except it isn't one of the content areas. Reading comprehension is as close as you get to that, plus the essay questions, which are not what you were asking about. There is really nothing tricky about ETS. I get the point you are making, but ETS is a huge, lumbering, monolithic institution with a good-for-life government contract (thanks, No Child Left Behind!) that they want to keep. They don't do anything even vaguely tricky.
posted by jessamyn at 6:56 PM on December 4, 2007
http://www.powerscore.com/lsat/help/guessing.htm
This site suggests that on the LSAT, the distribution is slightly skewed towards D & E (22.2% and 21.4%, respectively) for the last 5 answers on the test. The sample size is 1,140 questions. It's been about 8 years since I took statistics, so I can't figure out what deviation would be expected, but 2.2% seems high to my non-statistician brain.
posted by bluejayk at 7:00 PM on December 4, 2007
I can't believe I'm actually doing this.
I don't recall how to handle both D & E at the same time, but my tired brain is telling me it shouldn't matter much, if at all, at the moment, so, just doing it for E:
Let theta be 1/5, or 0.2, the chance of the answer being E, assuming a random distribution. The sample theta is .222. Given a sample size of 1140, and assuming theta = 0.2, the mean of the sample mean distribution (I don't recall what n = 5 does to that, but I don't think it would matter) = 0.2 and sd = sqrt(.2 * (1 - .2) / 1140). The z-score of .222 would be 1.856, giving a two-tailed p-value of 0.0634. That'd suggest that it's unlikely, but perhaps not sufficiently so to believe it was biased.
Looking back over that, something about that process looks a bit off. It'd be terrific if someone who actually understood stats would check me on that, though.
posted by devilsbrigade at 1:34 AM on December 5, 2007
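devilsbrigade's arithmetic can be checked mechanically. A sketch of the same one-proportion z-test using only Python's standard library (the 22.2% and 1,140 figures are the Powerscore numbers quoted above):

```python
import math

n = 1140       # questions in the Powerscore sample
p0 = 0.20      # expected share of E answers under a uniform A-E key
p_hat = 0.222  # observed share of E answers

se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null
z = (p_hat - p0) / se
# two-tailed p-value via the normal CDF, Phi(z) = (1 + erf(z/sqrt(2))) / 2
p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

print(f"z = {z:.3f}, two-tailed p = {p_value:.4f}")
```

This reproduces the z of about 1.86 and p of about 0.063 worked out above: the skew toward E is suggestive but falls short of the conventional 0.05 threshold, which matches devilsbrigade's reading.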
Also, according to Powerscore:
Unlike the SAT, the LSAT often has three identical answer choices consecutively (such as three "D’s"), and on several occasions, four identical answer choices in a row have appeared. On the June 1996 LSAT, it even occurred that six of seven answer choices in one section were "C." The use of multiple answer choices in a row is one of the psychological weapons employed by the LSAT to unnerve test takers. Any test taker seeing four "D’s" in a row on their answer sheet understandably thinks they have made some type of error, primarily because most tests avoid repetition in their answer choices.
posted by ewiar at 7:46 AM on December 5, 2007
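Runs of identical letters are less surprising than they feel. Under a uniformly random five-letter key, some run of three or more identical answers is close to inevitable over a long test. A quick simulation sketch (the 100-question length and trial count are hypothetical):

```python
import random

random.seed(1)
LETTERS = "ABCDE"
N_QUESTIONS = 100   # hypothetical test length
TRIALS = 10_000

def longest_run(seq):
    """Length of the longest run of consecutive identical items."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

keys_with_triple = sum(
    longest_run([random.choice(LETTERS) for _ in range(N_QUESTIONS)]) >= 3
    for _ in range(TRIALS)
)
print(f"share of random keys with a run of 3+: {keys_with_triple / TRIALS:.2f}")
```

The share comes out well above 90%, so Powerscore's observation that the LSAT "often" has three in a row is entirely consistent with a random key; it's the avoidance of repetition on other tests that would be the deliberate intervention.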
bluejayk: Unless one of the content areas is critical reading and/or attention to detail. A very slight variation from a straight 20% for A-E would give the test makers opportunity to introduce tricker answer structures.
In which case, you introduce a separate section of the test to probe for critical reading and attention to detail.
Ok, basic test construction 101. Validity is how closely the test questions hit some theoretical target. So for SAT Math, the target is nationally defined High School math standards objectives (such as "identify the slope of a linear equation.") So you want questions that address items on the Math standards and not questions that address items on the Social Studies standards. Reliability is how closely the test items relate to each other. Similar questions about slope should have similar answer rates.
Think about it, let's say that there is a test item that can be correctly answered either by identifying the slope of a linear equation or by critical reading. You've just reduced the validity of the question because you can't say which competency you tested. The person could be competent at math and poor at critical reading. The person could be competent at critical reading and poor at math.
There are a lot of things to criticize about ETS, but they come from a very formal and old-school tradition of standards testing which takes validity and reliability seriously.
posted by KirkJobSluder at 7:52 AM on December 5, 2007
And another thing: I know the GRE is all computer-adaptive testing, in which questions are selected from a pool based on correct/incorrect responses to earlier questions. So there isn't a fixed order.
posted by KirkJobSluder at 8:07 AM on December 5, 2007
My mother was once given a test (not a major standardized test, but a test in school) where the answers were a, b, c, d, a, b, c, d..... There was a lot of erasing, and a large number of poor scores.
Aside from that, I remember my SAT scan paper looking very uniformly distributed, and it wasn't because I missed a lot of answers.
posted by anaelith at 9:12 AM on December 5, 2007
I sometimes give multiple choice tests where all the answers are the same. It really freaks out a number of the students.
posted by Wet Spot at 2:01 PM on December 6, 2007
When I worked for the Princeton Review a zillion years ago, there was an office anecdote about how they used to tell people, when they didn't know an answer, to fill in C because it was statistically slightly more likely to be right than something random (I mean yeah, choosing one answer is always going to give you better results than random guessing each time, but I mean more than that). Then that trick no longer worked.
So, ETS won't tell you. I worked for them for a while as a test scorer and they always said it's random. To figure out for sure you need to talk to someone who has been following them around forever. The guy I like to talk to about this stuff is Jay Rosner, who is (was?) counsel for TPR and knows a ton of stuff about ETS, the way they operate, and standardized testing generally. You can find his name and email on this page. He is a friend of mine; if you email him, tell him I said hi.
posted by jessamyn at 6:17 PM on December 4, 2007
This thread is closed to new comments.