Bad study design? or no?
November 22, 2007 11:26 AM
Is this study an example of bad statistical study design?
In the report (page 21, page 37 of the PDF), the method is stated as:
The survey was conducted using face to face interviews in respondents’ homes. A random location sampling approach was used: 379 areas were randomly selected, each area containing around 300 addresses and interviewers were asked to obtain nine or ten interviews from each area. Quotas were set based on age, gender and working status to ensure a representative sample was achieved. Individuals who took part in the survey were given a £5 high street voucher as a thank you for completing the interview.
Will these quotas lead to bias in the resulting statistics?
posted by beerbajay at 11:37 AM on November 22, 2007
I'm definitely no stats expert, but I've taken a few social-science-oriented stats classes. Here is a page from the introductory psychology textbook I used back in my freshman year, where the author explains that a random sample and a representative sample should amount to the same thing. The whole reason you take a random sample is so that you have the greatest probability of being able to generalize to the population as a whole. If a random sample turns out to be all female or all male, or something similar, in all likelihood you will spend time gathering a new sample rather than use one that is not representative of, or generalizable to, the population as a whole.
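To see that point concretely, here is a toy simulation (all numbers made up, nothing from the study itself) showing how far a single small simple random sample can drift from the population's actual make-up:

```python
import random

random.seed(0)

# Hypothetical population: 51% female, 49% male.
population = ["F"] * 5100 + ["M"] * 4900

# Draw many small simple random samples and track how far the
# sample's female share can drift from the true 51%.
worst = 0.0
for _ in range(1000):
    sample = random.sample(population, 30)
    share = sample.count("F") / 30
    worst = max(worst, abs(share - 0.51))

print(f"largest deviation seen in 1000 samples of 30: {worst:.2f}")
```

With samples of only 30 people, deviations of 20 percentage points or more show up routinely, which is exactly why a researcher might throw out a wildly skewed sample and draw again.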
So, to come back to your original question, I'm pretty sure quotas are a standard procedure in most studies. They ensure that the results of the study are somewhat valid for the rest of the population.
More from here
The goal of randomization is to produce comparable groups in terms of general participant characteristics, such as age or gender, and other key factors that affect the probable course the disease would take. In this way, the two groups are as similar as possible at the start of the study. At the end of the study, if one group has a better outcome than the other, the investigators will be able to conclude with some confidence that one intervention is better than the other.
posted by jourman2 at 11:43 AM on November 22, 2007
It looks like they are using the quotas to make sure their sample reflects the demographics of the population, rather than the demographics of people who like to do surveys.
Seems fine to me.
posted by thrako at 12:08 PM on November 22, 2007
So you have some question you want to ask of the whole population. You think responses might be correlated with various demographic data. What do you do?
The statisticians would say: just take a random sample. If you get enough people, everything will "average out". But there's non-response bias: Not everyone you ask will agree to respond to your survey, not even for a 5-pound high street voucher. What if people who don't respond are more likely to have different views?
When you're doing face-to-face questioning rather than telephone questioning, a truly random sample (i.e. picking some number of people from around the country) would be prohibitively expensive, purely because of travel costs. But face-to-face questioning yields much higher response rates (phone is IIRC only ~25%), so there's less non-response bias. So what you do is called "cluster sampling", which is what they're doing here.
This means that you get a large sample size and a high response rate, but since you're only sampling 380 areas, it's less likely that your however-many-thousand people will all come from different backgrounds. How do you make sure your sample has the requisite "breadth"?
There are two options, which both amount to the same thing: the first is to weight the responses of people from underrepresented groups so that it's as if you had more of them; the second is to actually make sure you have a good demographic cross-section, which is what they're doing. It's not perfect, but it's a remedy for one of the deficits of cluster sampling (as well as surveying in general) and I believe that it has been shown to produce generally more accurate results.
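The first option — weighting — can be sketched with made-up numbers. Here post-stratification weights pull an unbalanced sample back toward known population shares:

```python
# Suppose our achieved sample under-represents men. Post-stratification
# reweights each respondent so the weighted sample matches known
# population shares (all categories and numbers here are made up).
population_share = {"M": 0.49, "F": 0.51}

# (gender, answered_yes) pairs: 40 men, 60 women in a sample of 100.
sample = [("M", 1)] * 30 + [("M", 0)] * 10 + [("F", 1)] * 30 + [("F", 0)] * 30

sample_share = {g: sum(1 for s, _ in sample if s == g) / len(sample)
                for g in population_share}
weight = {g: population_share[g] / sample_share[g] for g in population_share}

# Unweighted vs weighted estimate of the "yes" proportion.
unweighted = sum(y for _, y in sample) / len(sample)
weighted = sum(weight[g] * y for g, y in sample) / len(sample)
print(unweighted, weighted)
```

Men here say "yes" at 75% and women at 50%, so the raw estimate of 0.6 understates the population figure; the weighted estimate lands at about 0.6225, the same answer a perfectly proportioned sample would give. The second option — quotas — just builds that proportioning into the fieldwork instead.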
posted by goingonit at 12:09 PM on November 22, 2007
No, the quotas are there to counteract selection bias. The Wikipedia page on sampling describes quota sampling and notes that pure random sampling is often impractical, so a common method is for interviewers to draw candidates from a large set, constrained by quotas on relevant characteristics. We could debate the merits of different sampling techniques, but I don't think there's anything out of the ordinary about the DEFRA survey.
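The quota mechanic itself is simple — interviewers keep approaching candidates but only interview those whose demographic cell still has room. A minimal sketch (cells and quota sizes invented for illustration):

```python
import random

random.seed(3)

# Quota sampling sketch: accept candidates until each demographic
# cell's quota is filled (categories and quotas are made up).
quotas = {("F", "young"): 3, ("F", "old"): 2,
          ("M", "young"): 3, ("M", "old"): 2}
filled = {cell: 0 for cell in quotas}
sample = []

while any(filled[c] < quotas[c] for c in quotas):
    cell = (random.choice("FM"), random.choice(["young", "old"]))
    if filled[cell] < quotas[cell]:   # cell still open: interview
        filled[cell] += 1
        sample.append(cell)
    # otherwise the interviewer moves on to the next candidate

print(len(sample))  # total interviews = sum of the quotas = 10
```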
posted by matthewr at 12:11 PM on November 22, 2007
I suspect the quotas are an attempt to remove some bias from the sampling method. An in-home interview, even if you select your target homes randomly, is biased towards 'people who are home' -- which will tend more to women (because of stay at home moms), old people (because they're retired) and the unemployed. The quotas directly counteract those inherent biases.
posted by jacquilynne at 12:12 PM on November 22, 2007
What they're doing is called "stratified sampling".
It's not inherently bad, but it can be subject to conscious or unconscious manipulation to sway the result.
posted by Steven C. Den Beste at 12:55 PM on November 22, 2007
My friend asserts that as interviewers attempt to fill their quota, they'll over-represent people who are interested in the subject of the study, because they have to keep searching until they find X respondents. Whereas if they just made X random survey attempts, got fewer than X responses, and weighted those responses to match the general trends in the population, interested groups would not be overrepresented in the same way.
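The mechanism behind the friend's worry can be simulated (response rates invented for illustration): if interested people agree to interviews more readily, then any procedure that keeps knocking on doors until the target is reached ends up with interested people overrepresented among the completed interviews.

```python
import random

random.seed(2)

# Hypothetical: 30% of people are "interested" in the topic; interested
# people agree to be interviewed 80% of the time, others only 30%.
def draw_person():
    interested = random.random() < 0.30
    responds = random.random() < (0.80 if interested else 0.30)
    return interested, responds

# Keep approaching people until 1000 interviews are completed.
completed = []
while len(completed) < 1000:
    interested, responds = draw_person()
    if responds:
        completed.append(interested)

print(f"share interested among respondents: {sum(completed)/1000:.2f}")
```

The interested share among respondents comes out near 0.24/(0.24+0.21) ≈ 0.53 rather than the true 0.30 — though note this is ordinary non-response bias, which hits a fixed-attempts design conditioning on who responds just as hard; quotas on age/gender/working status neither cause nor cure it.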
posted by beerbajay at 12:59 PM on November 22, 2007
Steven C. Den Beste and goingonit:
After reading up on cluster sampling vs. stratified sampling, it appears that this study is not actually using cluster sampling, since it is not the clusters themselves that are being studied; rather, the individual responses are considered. So best answers to the two of you.
posted by beerbajay at 12:21 AM on November 24, 2007