Survey Design - should I stop a census with survey response-level
August 3, 2022 1:01 PM

I'm a little farther away from survey design than I used to be, but I'm seeing a massive swing on a survey delivered on a regular cadence (not just at delivery) and I vaguely remember a reason why and it had to do with the way the initial population is identified.

So, the survey goes out to a massive population of folks - the entire 'contact me' population - and then gets a 2%-5% response back every month. So the selection is done as a census of the eligible population, but the response rate is only 2%-5%.

From what I remember, the initial population should be shrunk to a smaller % of total customers that is representative and then we should still see a 2%-5% response back, but it will improve the actual fidelity of the data. The next month, a new sample population would be determined and we would expect a similar 2%-5% response, but because these were new people - it would be more accurate and stable than the total pop approach...

I'm trying to remember why. Can someone either point me to an article, or explain it to me like I should remember this?
posted by Nanukthedog to Education (2 answers total)
 
Best answer: Two things spring to mind. One is that *you*, as the surveyor, want to be the one selecting the respondents, to the degree possible. That is what keeps the sample actually random and representative of the population as a whole. The closer you can get the responses to be from a random sample that *you picked* rather than a self-selected subsample of some kind, the better the sample will be - and the better it will represent the population as a whole.

Looking at it from the other end, if you send the survey to the entire population every time, then you will get "happy responders" - that is, people who will respond to the survey every time or most times, just because they like responding to things like this, and also "blockers" - people who have seen the survey every month for so long that they block you, send it to spam, whatever.

Both of these skew the population of responders - "happy responders" are over-represented and might even respond to every single survey (even though representing probably less than 1% of your overall population, and clearly a somewhat atypical subgroup) and "blockers" will never see your survey again, so they will be underrepresented.

By presenting the survey to a different randomly selected subgroup each time, you avoid both of these problems. The "happy responder" answers once, but then isn't even asked to respond again for a long time. And the "blockers" only get the very occasional request to participate, so the message is far less likely to be automatically blocked by spam filters, and also less likely to be discarded instantly as "I've seen this a million times before - delete!". End result is, they have at least some chance of seeing the message and responding instead of 0% chance.
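
For what it's worth, here is a minimal sketch of that kind of rotation, assuming you can export the 'contact me' list as a list of customer IDs. The function name `monthly_samples`, the 12-month cycle, and the fixed seed are all illustrative choices, not anything from a particular survey tool.

```python
import random

def monthly_samples(customer_ids, n_months=12, seed=0):
    """Split the contact list into n_months disjoint random groups.

    Each customer lands in exactly one group, so nobody is asked
    more than once per cycle of n_months.
    """
    ids = list(customer_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    # Deal the shuffled IDs out like cards: group i gets every n_months-th ID.
    return [ids[i::n_months] for i in range(n_months)]

# Hypothetical population of 100,000 customer IDs.
groups = monthly_samples(range(100_000), n_months=12)
print([len(g) for g in groups])
```

Month 1 gets `groups[0]`, month 2 gets `groups[1]`, and so on; after the cycle you would reshuffle (with a new seed) so the groupings don't repeat.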

Here is an article that talks about some of these principles. In the language of that article, you are making your responding group more towards a "probability based sample" and less towards a self-selected sample.
posted by flug at 6:11 PM on August 3, 2022 [4 favorites]


Note that if you're surveying a subset of your population each time, that shrinks down your sample size, and reduces your statistical strength. If the idea is to track (as a simple example) positive vs. negative opinion (let's assume it's around a 50/50 split), if you have 100,000 people and 3% respond, that's 3,000 people. If you further break this population into 12 similar-sized groups for monthly analysis, that's 250 responses per group. This has a 95% margin of error of around ±6 percentage points, so you need moderately big swings to assert that something has changed. And that's with 100K people in your population; if you have 20K instead (which is still a lot!), then you are down to 50 people per month, which has a 95% margin of error of around ±14 points; your surveys could report 60% positive last month and 40% positive this month, and the two intervals would still overlap, so you can't say with a high degree of confidence that things got worse.
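
A quick way to check figures like these is the usual normal-approximation margin of error for a proportion, z * sqrt(p(1-p)/n). This is a minimal sketch, assuming a 3% response rate, an even 12-way split, and a true split near 50/50; the function name is just for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

for population in (100_000, 20_000):
    responses = population * 3 // 100  # assumed 3% response rate
    per_month = responses // 12        # assumed even split into 12 groups
    moe = margin_of_error(per_month)
    print(f"{population}: {per_month} responses/month, +/-{moe:.1%}")
# 100000: 250 responses/month, +/-6.2%
# 20000: 50 responses/month, +/-13.9%
```

Because the margin shrinks with the square root of n, you need four times the responses to halve it.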

The other way -- diametrically opposed to what you're talking about -- is a panel survey, where you survey the same people every time; you could survey everybody initially and then just resurvey the people who replied the first time. This wouldn't have the shrinkage of subsamples and would have a large size, but the panel would/could be biased (and this could plausibly increase as people drop out). The good thing about a panel is that its composition doesn't change; if you're trying to track something over time, then changes are actually people changing their minds rather than different people answering. The bad thing about a panel is that if you thought the people who respond to a single survey are biased, then people willing to form a panel are even more extreme.

And to use flug's term from their helpful response above, note that 'happy responders' may not actually be merely people who like surveys but are otherwise 'normal'. I'm a transportation engineer, and the cliche is that cyclists always answer surveys, at a much higher rate than others. Similarly, I've looked at a bunch of surveys of the general public about shared e-scooter (Bird, etc.) pilot programs and the people who answer the surveys are (or claim to be) disproportionately heavy users of the scooters; in one survey something like 40% of the respondents said they always wore a helmet (and a good percent of the remaining ones said they often or sometimes wore a helmet), while in-the-field counts put helmet usage at something like 5%. But there are other groups; if your survey area is political, you might get the extremists. If it's corporate, you're likely to get the people who are most pissed off (or who are most in love with your company).

Fundamentally, statistics assumes that you are sampling people at random, and the larger the sample, the more confident you can be in the result. With low response rates, you probably aren't sampling at random. Getting a subsample may get you better responses, closer to the ideal of random sampling, but it gets you further away from the confidence of a large sample. No free lunch, I'm afraid.
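
To put a rough number on the "not at random" part, here is a small sketch of what differential response rates do to the headline figure. The rates below are made up purely for illustration (unhappy customers answering at 5%, happy ones at 2%), and the function name is hypothetical.

```python
def observed_positive_share(true_share, rate_pos, rate_neg):
    """Expected share of positive answers among respondents when
    positive and negative customers respond at different rates."""
    pos = true_share * rate_pos        # positive customers who respond
    neg = (1 - true_share) * rate_neg  # negative customers who respond
    return pos / (pos + neg)

# Hypothetical: true opinion is 50/50, but the unhappy respond more often.
share = observed_positive_share(0.5, rate_pos=0.02, rate_neg=0.05)
print(f"{share:.1%}")
# 28.6%
```

Population size never appears in that formula, so this kind of skew stays the same however many people you send the survey to; only the random-noise part shrinks with a bigger sample.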
posted by Superilla at 3:44 PM on August 4, 2022 [1 favorite]

