Finding margin of sampling error in a survey report
March 1, 2021 6:54 AM Subscribe
I am trying to find the margin of sampling error in a report about a survey. I can't find the relevant verbal phrases anywhere in the report. But is that what is represented by the percentage figures on Page 48 of the report?
If not, can you help me find margin somehow? I just need it for the national nonvoter sample.
Obligatory note: Depending on what you're doing and what the stakes are, you might need/want to do something more complicated than "They reported 75%, so it's 75% +/- 1.5%."
posted by GCU Sweet and Full of Grace at 7:29 AM on March 1, 2021
Best answer: This might be too much:
. variability (noise) in a yes/no binary sample is a function of how likely the responses are. Technically, the variance of a single one of these binary responses is (the probability you get one thing) * (the probability you get the other thing).
. . for example, the worst case is a coin toss: 50% this and 50% that gives you a variance of 50% * 50% = 0.25.
. . if, by contrast, you knew that something was 99.44% (and hence 0.56% that), the variance would be markedly lower. (Basically .0056, if you're keeping score.)
. . to be most conservative, you'd stick with that 0.25 number unless you had a good reason to think otherwise.
So that's the variance of a single response.
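If it helps to see that as code, here is a minimal Python sketch of the single-response variance. The function name `bernoulli_variance` is just something made up for illustration; it isn't from the report.

```python
def bernoulli_variance(p):
    """Variance of one yes/no response where 'yes' has probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p * (1.0 - p)

# Worst case: a coin toss.
coin_toss = bernoulli_variance(0.5)

# A very lopsided question: 99.44% one way, 0.56% the other.
lopsided = bernoulli_variance(0.9944)

print(coin_toss)           # 0.25
print(round(lopsided, 4))  # roughly 0.0056
```

No value of p gives a variance bigger than 0.25, which is why 0.25 is the conservative default.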
. The point of a survey is to get the average of a relevant sample of responses, so you need the formula for the variance of an average.
. That's just the variance of a single response divided by the number of responses.
. So even in the worst case scenario of variance = 0.25, the variance of the average response of (say) 4,002 respondents is just 0.25/4,002 = 0.0000625 or so, if you're following along on a pocket calculator.
A confidence interval for the average of a relevant sample of responses is going to be that average +/- the product of:
. some z score that you pull out of a table in the back of your textbook, or out of some software package, times
. the square root of that variance. So sqrt(0.0000625) = 0.0079 or so, and the z value technically depends on other assumptions, but everybody always just uses 2. (Technically 1.96, for reasons.)
SO:
if we have no reason to believe the response we're measuring will be markedly different from 0.5, and
if we take a random draw of 4,002 national non-voters (who actually respond to the survey with something useful), and
if we want the margin of error for a (bog-standard) 95% confidence interval around the response,
then margin-of-error = z*sqrt(.25/4,002) = 1.96 * 0.0079 = 0.01549, which is the 1.55% they report in the document.
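The whole calculation fits in a few lines of Python, if you'd rather check it that way than on a pocket calculator. This is only a sketch under the same assumptions as above (simple random sample, worst-case p = 0.5, z = 1.96 for 95%); `margin_of_error` is a made-up name, not anything from the report.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample of size n.

    p=0.5 is the conservative worst case; z=1.96 corresponds to 95% confidence.
    """
    variance_of_average = p * (1.0 - p) / n
    return z * math.sqrt(variance_of_average)

moe = margin_of_error(4002)
print(f"{moe:.2%}")  # about 1.55%
```

So a reported 75% for the national nonvoter sample would read as roughly 75% +/- 1.55% under these assumptions.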
If you wind up looking further down the page, there's some weirdness about the swing states where they took samples of 800 in 10 different states, but they report the MOE associated with a single sample of 800, rather than all 10*800 = 8,000 people in one big bucket. But all of the numbers look like what you'd get if you stuck their parameters into a pretty typical, pretty robust confidence interval/margin-of-error calculator in a general-purpose undergrad statistics textbook.
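To put numbers on the swing-state point, here is the same textbook formula run for one sample of 800 versus all 8,000 in one bucket. Treating the pooled 8,000 as a single simple random sample is my assumption for illustration (ten equal-sized state samples aren't really that), and `margin_of_error` is again a hypothetical helper.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case 95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1.0 - p) / n)

single_state = margin_of_error(800)       # one state's sample
pooled = margin_of_error(10 * 800)        # all ten states lumped together

print(f"single state of 800: {single_state:.2%}")
print(f"pooled 8,000:        {pooled:.2%}")
```

Ten times the sample only shrinks the margin by a factor of sqrt(10), a bit more than 3.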
so, what they said^^^^^^^^^^^^^^^^
posted by adekllny at 4:21 PM on March 1, 2021
This thread is closed to new comments.
posted by NotMyselfRightNow at 6:59 AM on March 1, 2021