How is margin of error applied to an opinion poll?
March 2, 2008 6:06 AM

When a preference poll has a certain margin of error, what are the minimum/maximum results?

Take the following poll as an example:

Clinton 47%
Obama 43%
M.O.E: 4%

How is the margin of error applied to the prior percentages? Is that 4% either side (e.g. Clinton between 43% and 51%) or 4% in total (between 45% and 49%)? Or is it the "gap" between the two: is a complete reversal (Clinton down 4, Obama up 4) within the M.O.E., or is it at most a movement of +/-2% each, anywhere from a 45%-45% tie to a maximum gap of 49% vs 41%?

More confusingly, in a poll like this:

Obama 46, Clinton 45, Undecided 8 (M.O.E: 4%)

How does it all work? (And why is a 21-yr-old Brit addicted to your convoluted primary process?!)
posted by so_necessary to Law & Government (7 answers total)
 
Wikipedia explains that the margin of error is the radius of a confidence interval, so it's really (47±4)%, which is 43% to 51%.
posted by grouse at 6:26 AM on March 2, 2008


i always understood the margin of error to be applied to either side of the stat in question. thus: clinton at 47% with a moe of 4% means she could be polling at 43% or at 51%.
posted by thinkingwoman at 6:28 AM on March 2, 2008


There is no maximum gap. There are only probabilities about how likely the result is to differ from the true percentages. It's theoretically possible that obama is ahead 80-20, but the probability of that is very (very very very) low.
posted by mpls2 at 6:46 AM on March 2, 2008


(well, i suppose the maximum gap is 100 points, obviously)
posted by mpls2 at 6:49 AM on March 2, 2008


Basic answer:

Clinton 47%
Obama 43%
M.O.E: 4%

Means

Clinton 43-51
Obama 39-47
with 95% confidence that if you actually asked everyone in the relevant population the exact same question in the exact same survey, it would fall in that range.

Minimum/maximum:
The only true minimum/maximum you can get is that if 430 people said they liked Obama, at least 430 people like Obama. The rest is probabilities. The standard is 95% confidence. You can also do 90% confidence, which gets you smaller margins of error, or 99% which gets you bigger margins of error. If the true value is 99% Obama, you could still get a sample that said 99% Clinton... it would just be very, very, very unlikely.
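
To make the 90/95/99 trade-off concrete, here is a rough sketch using the usual normal-approximation formula, margin = z * sqrt(p(1-p)/n), with the worst case p = 0.5. The sample size of 600 is an assumption (the example poll doesn't report one), and `margin_of_error` is just an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, confidence=0.95, p=0.5):
    """Half-width of the normal-approximation interval for a sample proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)

n = 600  # assumed sample size; not given in the example poll
margins = {level: margin_of_error(n, level) for level in (0.90, 0.95, 0.99)}
for level, moe in margins.items():
    print(f"{level:.0%} confidence: +/- {moe:.1%}")
```

Same sample, same answers: the only thing that changes is how sure you insist on being, and the margin widens as the confidence level goes up.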

Complication:
Opinions about Clinton and Obama aren't two totally independent things -- people who don't say "Clinton" are very likely to say "Obama." For your purposes, you don't need to worry about it, but be aware that "Clinton 43-51, Obama 39-47" is going to be a little bit off.
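
If you want to see how much the non-independence matters for the gap itself, the sketch below uses the standard variance formula for the difference of two shares from the same sample, Var(p1 - p2) = (p1 + p2 - (p1 - p2)^2) / n. The sample size of 600 is again an assumption, and `lead_margin` is a made-up helper name.

```python
from math import sqrt

def lead_margin(p1, p2, n, z=1.96):
    """95% margin of error on the lead p1 - p2 when both shares come from one sample."""
    # Multinomial variance of the difference between two shares of the same sample
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * sqrt(variance)

n = 600  # assumed sample size; not given in the example poll
lead = 0.47 - 0.43
moe = lead_margin(0.47, 0.43, n)
print(f"lead: {lead:+.1%} +/- {moe:.1%}")
```

Under these assumptions the margin on the 4-point lead comes out at roughly 7.6 points, nearly double the per-candidate margin, so a reversal is well within what this poll can't rule out.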

Background:
Imagine the relevant population -- say, Democrats in Texas, or Democrats nationwide, or likely Democratic voters, or whatever. There's the true answer -- what would we get if we could ask everyone the question? But obviously asking everyone is impractical, so we have to ask samples, and each sample isn't going to be perfectly 100% representative of the population.

So we could imagine, what if the true answer were 50% Clinton? What would we expect if we asked 1000 people? It turns out that 95% of our samples of 1000 will have Clinton percentages between 47 and 53%, just because of how the math about sampling works.

So, what if we don't know the population percentage? What if we're trying to find this out? Well, we know before we draw the sample of 1000 that there's a 95% chance that it will be within 3 percentage points of the true value... so we can be 95% confident after we draw the sample and look at the results that the true value is our sample value +/- 3%.
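
If you'd rather check the "95% of samples of 1000" claim than take it on faith, it can be computed exactly from the binomial distribution. This is only a sketch, assuming a perfectly random sample and a true value of exactly 50%.

```python
from math import comb

n = 1000
low, high = 470, 530  # 47% to 53% of 1000 respondents

# Exact binomial probability for a 50/50 population:
# count the outcomes with between 470 and 530 Clinton answers
favorable = sum(comb(n, k) for k in range(low, high + 1))
coverage = favorable / 2 ** n
print(f"P(47% <= sample <= 53%) = {coverage:.3f}")
```

The result lands very close to 0.95, which is where the "+/- 3% for a sample of 1000" rule of thumb comes from.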
posted by ROU_Xenophobe at 8:48 AM on March 2, 2008


So, the Ohio primary is coming up soon. There are about 10 million people in Ohio, and the best answer we can get to who is leading in Ohio right now would be obtained by asking each and every one of the 10 million people who they favored in the primary. Or, a less accurate answer could be obtained by asking some small, randomly chosen subset of voters, and assuming that the result extends to the rest of the 10 million people. The key questions then, are how many people do we ask, and how do we express how accurate our estimate is?

Now, even if you sample people perfectly at random, you can see how purely by chance things can go astray. If you called 6 people in an evenly split race, the most likely result would be that 3 favor Clinton and 3 favor Obama. But it would not be entirely unbelievable if the first 6 people you called all favored Obama. (Like flipping a coin, there would be a 1/32 chance that the first 6 voters you call all favor Clinton, or all favor Obama: (1/2)^6 * 2 outcomes.) But if you asked 300 people, you would almost never (2 * (1/2)^300) find that they all favor the same candidate, if the race is close.
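
The coin-flip arithmetic above is easy to check exactly. A small sketch, assuming a dead-even race and independent, random calls (`all_same` is just an illustrative name):

```python
from fractions import Fraction

def all_same(n_people):
    """Chance that n_people random voters in a 50/50 race all name the same candidate."""
    # Two ways to be unanimous: all Clinton or all Obama
    return 2 * Fraction(1, 2) ** n_people

p6 = all_same(6)
p300 = float(all_same(300))
print(f"6 people:   {p6}")
print(f"300 people: {p300:.1e}")
```

For 6 people this gives exactly 1/32; for 300 people it is a number with about 90 zeros after the decimal point.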

If you asked 300 people who they favored, you might find:

BO: 142 HC: 158
BO: 139 HC: 161
BO: 155 HC: 145
BO: 150 HC: 150
BO: 147 HC: 153

and so on. As it turns out, the estimates will form a bell-shaped curve around the true mean. In other words, the "true" percentage obtained from all voters would be the most likely estimate, and estimates close to the true percentage will be more likely than estimates far from the true percentage.
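
You can watch the bell shape appear by simulating a lot of polls. This is a toy sketch, not how any real pollster works: it assumes the true split is exactly 50/50, every respondent is drawn independently at random, and the seed, constants and `run_poll` name are arbitrary choices.

```python
import random

random.seed(2008)  # fixed seed so the run is repeatable

TRUE_SHARE = 0.5   # assumed true Clinton share
SAMPLE_SIZE = 300
POLLS = 2000

def run_poll():
    """Number of respondents favoring Clinton in one simulated random sample."""
    return sum(random.random() < TRUE_SHARE for _ in range(SAMPLE_SIZE))

counts = [run_poll() for _ in range(POLLS)]
estimates = [c / SAMPLE_SIZE for c in counts]

# Crude text histogram of the Clinton percentage, in 2-point bins
bins = {}
for c in counts:
    b = (c * 100 // SAMPLE_SIZE) // 2 * 2
    bins[b] = bins.get(b, 0) + 1
for b in sorted(bins):
    print(f"{b:2d}-{b + 2}%: {'#' * (bins[b] // 20)}")

average = sum(estimates) / POLLS
# Share of simulated polls landing within 5.7 points of the truth
within = sum(abs(e - TRUE_SHARE) <= 0.057 for e in estimates) / POLLS
print(f"average estimate: {average:.3f}, within 5.7 points: {within:.1%}")
```

The histogram piles up around 50% and thins out on both sides, and roughly 95% of the simulated polls fall within about 5.7 points of the true value.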

The most intuitive description of how accurate the estimates are would be to plot a little bell-shaped curve on a bar graph of percentages. But it is simpler to merely specify the margin of error, which is defined as the range around the estimate in which we may be 95% certain that the true answer falls.

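For a 4% margin of error, typically around 600 people will be polled (a sample of 300 gives roughly 5.7%). To increase the accuracy beyond that, the number of people that need to be polled rises dramatically: the margin shrinks only with the square root of the sample size, so halving it takes about four times as many respondents.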

From this page, we see that:
From February 21 - 25, Quinnipiac University surveyed 1,872 Pennsylvania voters with a margin of error of +/- 2.3 percentage points. The survey includes 506 likely Democratic primary voters with a margin of error of +/- 4.4 percentage points.
which shows how about 3.7x as many respondents were needed to roughly halve the margin of error.
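
The relationship between sample size and margin can be sketched with the textbook formula, margin = 1.96 * sqrt(p(1-p)/n), taking the worst case p = 0.5. This assumes simple random sampling; real pollsters may adjust for weighting, and the function names here are only illustrative.

```python
from math import ceil, sqrt

Z_95 = 1.96  # critical value for 95% confidence

def margin_of_error(n, p=0.5):
    """95% margin of error for a sample of n under simple random sampling."""
    return Z_95 * sqrt(p * (1 - p) / n)

def sample_size_for(moe, p=0.5):
    """Smallest sample size whose 95% margin of error is at most moe."""
    return ceil(Z_95 ** 2 * p * (1 - p) / moe ** 2)

# Sample sizes from the Quinnipiac quote above
for n in (1872, 506):
    print(f"n = {n}: +/- {margin_of_error(n):.1%}")

# Respondents needed for a few target margins
for target in (0.04, 0.03, 0.02):
    print(f"+/- {target:.0%} needs about {sample_size_for(target)} people")
```

Plugging in 1,872 and 506 gives about 2.3 and 4.4 points, matching the quoted figures, and going from a 4-point margin to a 2-point margin takes roughly four times as many people.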

All of this so far describes things that are found in any 1st-year undergraduate statistics text. The more complicated issue with polling that we have taken for granted is: how do you perfectly randomly sample people? If you call people on the phone, you are biasing against people who don't have phones, and toward people who have more than one phone number. You are also biased against people in large families, because what you are doing is randomly selecting a phone number first, and THEN selecting a person in the family. (Statisticians call this selection bias.)

Also, in the US, not that many people vote in the primaries. Some people don't vote. Others can't vote (children, and ex-cons in some states). In states with closed primaries, independents can't vote in Democratic or Republican primaries. So pollsters will often try to modify their procedures to make estimates for "likely" voters. This will make polling significantly less accurate.
posted by Maxwell_Smart at 9:00 AM on March 2, 2008


Short answer: the margin of error (half the width of the confidence interval) is applied to each answer. So that means that if you did this survey 1000 times, about 95% of those times (or 950) the true figure would fall within the margin of error of the survey's result. For this poll, the ranges are:

Clinton 43-51
Obama 39-47
M.O.E: 4%

So this is what is meant when they say a statistical dead heat. If we redid the survey with the same methodology but a different sample, we could conceivably get a tie, or even Obama ahead. However, given that the reported difference is exactly the same as the margin of error, we can be a bit more confident that there are actually more people who support Clinton than support Obama.

Then, you get into the whole sampling issue, which has been a huge problem this cycle, but we'll leave that to Pollster.com
posted by lunasol at 10:52 AM on March 2, 2008

