Question about polling accuracy
February 18, 2010 8:27 AM

Statistics/survey question. I was always under the impression that a survey sent to a bunch of people, answered on a voluntary basis, and yielding a relatively small number of responses will produce garbage results. True?

When I received such a survey by email and asked the surveyor what the validity of the survey was, I got back this response:

"Depending on the size of the population being involved in a study and based on the number of responses from that population, we can calculate a margin of error and say that within a specific statistical confidence interval that our results reflect the population. The more responses we get from our research population, the more accurate our results. Generally, 100 responses from a large population will result in a margin of error of +/- 10% to decrease the margin of error in half, the number has to be quadrupled – 400 will be +/- 5%, etc. "

Is this generally true? If I send out some large number of email surveys and get 100 answers back, is the claimed accuracy in the above response correct?
posted by Mr. Justice to Science & Nature (15 answers total)
 
On a very basic level, the margin of error does scale with the inverse square root of the number of participants in the survey. So, (100)^(-1/2) = 10% and (400)^(-1/2) = 5%.
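In code, that's the standard worst-case formula z * sqrt(p * (1 - p) / n) with p = 0.5 and z = 1.96 for 95% confidence (a rough sketch of my own, not anything specific to this survey):

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Worst-case margin of error for a proportion estimated from
    n random responses; z = 1.96 gives ~95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400):
    print(f"n = {n}: +/- {margin_of_error(n):.1%}")
# n = 100: +/- 9.8%
# n = 400: +/- 4.9%
```

(The 1/sqrt(n) rule of thumb is just this with z * sqrt(p * (1 - p)) = 0.98 rounded up to 1.)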

But this completely ignores all the other biases that may occur. For instance: who are the types of people who respond to this request? How were the participants chosen? Are they reflective of the population being studied? Adding in all those other uncertainties is where you get garbage in, garbage out.
posted by Tooty McTootsalot at 8:40 AM on February 18, 2010


This used to be seen as garbage but, at least in the political polling world, is getting more and more cred. I remember seeing a study which showed that political polls taken in this manner were as predictive as over the phone polls. The thing is, polling is always voluntary, and it's always done among a relatively small group. Internet polling isn't perfect, but neither are phone polls, especially given that the latter can't be done to cell phones.

As far as that specific claim, that's just the basic guidelines for calculating margin of error, and it's not specific to online polling.
posted by lunasol at 8:44 AM on February 18, 2010


It's not as simple as that, really, but it's the right idea. The margin of error is a simple computation based on the sample size and the desired confidence level. However, and this is a big however, surveys are only as good as their sampling techniques and the thought put into their wording and possible framing effects; response rate is also an important factor. Sample size is only one part of this: a giant sample that is non-representative is worse than a smaller, more representative one. Obviously, different surveys are conducted for different purposes, and these things matter more or less depending on one's end goal.

Is the goal of the survey to really get at the heart of "public opinion" on a topic? If so, sophisticated sampling techniques are required to ensure that the sample is truly representative of the population at large and that responses are not based on individuals with a motivation to voice an opinion on a particular topic that is near and dear to them. It is necessary to ensure that various groups are represented, that certain groups are not omitted or underrepresented due to the structure of the sample or due to polling procedures (see recent discussions as to whether or not phone polls undersample cell phone users). Is the goal of the survey to score political points? If so then sampling may be less important and the push may be on developing questions like "Some people say the Mayor eats babies. Isn't this true?"

Surveys that are conducted off-the-cuff through email and rely on responses from those motivated to reply are to be taken with a grain of salt. Surveys that are representative and attempt to offer generalizable results are expensive and time-consuming to conduct. Often, the sampling procedure will dictate which individuals are to be surveyed, and interviewers will be required to try to reach them three or four times before substituting another respondent, since any substitute is less random than the originally sampled person. Add that up over a thousand or more respondents, and systematic bias is possible.
posted by proj at 8:47 AM on February 18, 2010


The thing is, polling is always voluntary, and it's always done among a relatively small group. Internet polling isn't perfect, but neither are phone polls, especially given that the latter can't be done to cell phones.

Also, this is not true. I regularly work with datasets that involve thousands and thousands of users. "Relatively small group" is a canard, especially if the data are representative. Further, cell phones can be included in phone polls, if the polls are conducted using random-digit-dialing rather than relying on published phone numbers.
posted by proj at 8:48 AM on February 18, 2010


All polls are sent to a relatively large number of people and gather relatively few respondents. About the only exception, if you consider it a poll, is the decennial Census.

The problem you're talking about is (among other names) response bias. Some people are more likely to agree to be surveyed than others.

On the one hand, it's not a problem. The survey still produces entirely valid results for its population.

On the other hand, its raw population is "adults who were willing to be surveyed," which is probably not the population of interest.

To deal with this -- to map from the population that was actually surveyed to the population of interest -- surveyors take their raw results and reweight them. Essentially, the more likely you were to answer, the less your responses count, and people who were unlikely to respond (but did respond) have their responses weighted "extra." Different firms or academic outfits have different ways of doing this: some will do the weighting themselves, and others will just give you estimated response probabilities for each respondent. IIRC, there are also ways to weight as you go by using screening questions at the beginning. But the overall method of dealing with it is pretty uncontroversial.
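Here's a toy version of the reweighting mechanics (my own sketch with invented data and invented response probabilities, not any firm's actual procedure):

```python
# Inverse-probability reweighting sketch: respondents who were
# unlikely to respond (but did) count extra. Data are invented.
respondents = [
    # (answered_yes, estimated_probability_of_responding)
    (1, 0.8),
    (0, 0.8),
    (1, 0.2),  # a reluctant responder: weighted 4x the 0.8 folks
    (0, 0.5),
]

weights = [1 / p for _, p in respondents]
weighted_yes = sum(w * y for (y, _), w in zip(respondents, weights))

raw_share = sum(y for y, _ in respondents) / len(respondents)
weighted_share = weighted_yes / sum(weights)
print(f"raw yes-share: {raw_share:.2f}, reweighted: {weighted_share:.2f}")
# raw yes-share: 0.50, reweighted: 0.66
```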

(Doing it with email poses more problems depending on what the population of interest is.)
posted by ROU_Xenophobe at 8:48 AM on February 18, 2010


Also, this is not true. I regularly work with datasets that involve thousands and thousands of users.

I could be misremembering -- I don't really do much with surveys anymore -- but ISTR that response rates for mass-public surveys are routinely down around or below 25%. If that's right, then whatever the survey, the number of people asked to participate is much larger than the number who actually agree to participate, and the pool of actual respondents is self-selected.
posted by ROU_Xenophobe at 8:52 AM on February 18, 2010


Yes, the point of contention there is not that the pool of actual respondents is self-selected, but that surveys always involve relatively small groups of people.
posted by proj at 8:53 AM on February 18, 2010


It's not always true. For some really large surveys, they choose subsets of the nonresponsive sample to re-question. For example, in a mailed survey, they might send a reminder to one group, a phone call to some of the nonresponsive part of THAT group, and a home visit to some of the nonresponsive part of THAT group. If there's little significant difference between the subgroups, then the small initial response rate is a lot less meaningful.
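As a toy illustration of that check (invented numbers, not from any real survey), you can just tabulate answer rates by contact wave and eyeball whether the harder-to-reach waves look like the first one:

```python
# If respondents reached only after reminders answer like the first
# wave, a low initial response rate is less alarming. Invented data.
waves = {
    "initial mailing": {"n": 500, "yes": 210},
    "reminder":        {"n": 120, "yes": 48},
    "phone call":      {"n": 40,  "yes": 18},
}
for name, w in waves.items():
    print(f"{name:15}: yes-rate {w['yes'] / w['n']:.0%}")
# initial mailing: yes-rate 42%
# reminder       : yes-rate 40%
# phone call     : yes-rate 45%
```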

Remember, a small response rate could mean several things -- selection bias (like a survey asking college students if they've had sex -- virgins are less likely to respond) or, just as likely, an annoyingly long or complicated survey.
posted by acidic at 8:59 AM on February 18, 2010


Keeping in mind that I am not a statistics expert, it seems to me that a lot would depend on how the population was selected, the nature of the survey, and the kinds of people who choose to answer. If people can "self-select" whether to complete the survey, you're much more likely to get results that are skewed, because only those who feel strongly about the issue (if it's an "issue" kind of survey) are likely to answer. I would want to know how they selected their population, first of all. And just HOW large their "large population" is.
posted by rhartong at 9:01 AM on February 18, 2010


They may be adjusting poll results to fit known demographics. If women tend to pick option A, and men tend to pick option B, and there are more male respondents than female, you can weight the female respondents more heavily to compensate. If you do this with sex, race, income, location, political party, etc., the theory goes, you might get good data even from a not-very-random sample.
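Here's a minimal sketch of that kind of weighting with made-up numbers -- the sample is 70% male but the target population is 50/50, so the female yes-rate gets equal say:

```python
# Post-stratification by sex with invented numbers: average the
# per-group yes-rates using population shares, not sample shares.
sample = {"male": {"n": 70, "yes": 21}, "female": {"n": 30, "yes": 18}}
population_share = {"male": 0.5, "female": 0.5}

weighted = sum(
    population_share[g] * grp["yes"] / grp["n"] for g, grp in sample.items()
)
raw = sum(grp["yes"] for grp in sample.values()) / 100
print(f"raw yes-share: {raw:.0%}, weighted: {weighted:.0%}")
# raw yes-share: 39%, weighted: 45%
```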
posted by miyabo at 9:10 AM on February 18, 2010


The researcher should take into account how people might self-select. Here's an extreme example: you're sending out surveys in English to a neighborhood that is mostly made up of immigrants. Your population will only consist of people who speak English and thus won't be representative of the neighborhood. Another example: you want to know how women in your state feel about issue X, but you pick county Y as representative of the state. However, county Y contains the state prison. In an academic paper, these potential problems should be discussed.

But if there's no reason to suspect that the sample is not representative, then as long as the sample size is sufficient, the results aren't "garbage."
posted by desjardins at 9:11 AM on February 18, 2010


It may turn out that likely voters are more likely to respond to a survey while unlikely voters won't bother.
posted by Obscure Reference at 9:47 AM on February 18, 2010


the point of contention there is not that the pool of actual respondents is self-selected, but that surveys always involve relatively small groups of people

Oh, I see.

In that case (to the OP), the size of the population is pretty much irrelevant. A random sample of 100 has almost exactly the same inferential power drawn from a population of 100,000 as from a population of 100,000,000,000. In fact, for mathematical convenience, most margin of error calculations assume the population is literally infinite.
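Here's a quick sketch of why (my own illustration, same worst-case assumptions as the formula upthread; the finite population correction sqrt((N - n) / (N - 1)) is the only piece that depends on the population size N):

```python
import math

def moe_with_fpc(n, N, z=1.96, p=0.5):
    """Worst-case margin of error with the finite population
    correction for a sample of n from a population of N."""
    fpc = math.sqrt((N - n) / (N - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

for N in (100_000, 100_000_000_000):
    print(f"N = {N:>15,}: +/- {moe_with_fpc(100, N):.3%}")
# N =         100,000: +/- 9.795%
# N = 100,000,000,000: +/- 9.800%
```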
posted by ROU_Xenophobe at 9:50 AM on February 18, 2010


the point of contention there is not that the pool of actual respondents is self-selected, but that surveys always involve relatively small groups of people.

I should have been more specific. I meant respondents, not the dataset.

But there's nothing in the question or the pollster's response to indicate that the poll was conducted in an "off-the-cuff" manner.
posted by lunasol at 10:34 AM on February 18, 2010


I'd really underline this bit from proj: However, and this is a big however, surveys are only as good as their sampling techniques...

So, what was the methodology of the survey in question? A survey of your employees is liable to be fairly good. A survey spammed to a million haphazardly collected e-mail addresses, where 100 reply, is likely to be garbage.
posted by zompist at 7:28 PM on February 18, 2010

