Normalize a survey on multiple variables
January 16, 2008 11:46 AM

Help me, hivemind: What methodology should be used to weight survey sample data on two weighting variables (for example, age and gender) so that it reflects the actual population?

I know the actual demographics of the population where I took my survey, and my survey responses don't reflect them. I want to weight the responses so that when I do the analysis, I am not exaggerating certain segments.
posted by mtstover to Grab Bag (4 answers total)
 
Best answer: I'm not a stats expert, but I work in market research. I'm currently a front-end (survey) programmer, but I used to work in DP/tabulation. If I'm not mistaken, you can't really work with multiple weights to produce one set of results.

For instance, you wouldn't want one set of weights for gender and another set for age. You would want one weight for each combined category: males 18-25, females 50-65, etc.

Of course I could be wrong, but what I describe above is probably the easiest way to get it done. Hopefully you have the universe breaks at that level of detail (e.g., you know what percentage of the population is males 18-25). This does get more complicated the further down you want to drill (African American females aged 25-35 who make more than $40k a year and live in the Northeast...), but I've never worked with a client that wanted things weighted to such a degree, either.
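A minimal Python sketch of the "one weight per combined category" idea. The respondent list, the age bands, and the population shares are hypothetical placeholders; in practice the shares would come from your universe breaks. The key step is that gender and age are combined into a single cell before any weight is computed, so each respondent ends up with exactly one weight.

```python
from collections import Counter

# Hypothetical respondents, each described by a (gender, age band) cell.
respondents = [
    ("M", "18-25"), ("M", "18-25"), ("M", "18-25"),
    ("F", "18-25"),
    ("M", "50-65"),
    ("F", "50-65"), ("F", "50-65"), ("F", "50-65"),
]

# Hypothetical universe breaks: population share of each joint cell (sums to 1).
population_share = {
    ("M", "18-25"): 0.25,
    ("F", "18-25"): 0.25,
    ("M", "50-65"): 0.25,
    ("F", "50-65"): 0.25,
}


def cell_weights(respondents, population_share):
    """One weight per joint cell: population share / sample share."""
    n = len(respondents)
    counts = Counter(respondents)
    return {cell: population_share[cell] / (counts[cell] / n) for cell in counts}


weights = cell_weights(respondents, population_share)

# Each respondent inherits the single weight of their combined cell.
respondent_weights = [weights[r] for r in respondents]
```

With these made-up numbers the two over-represented cells get a weight of about 0.67, the two under-represented cells get 2.0, and the weights sum back to the 8 respondents. A cell that exists in the population but has no respondents can't be weighted up at all; one common workaround is to merge it with an adjacent cell.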
posted by MarkLark at 12:22 PM on January 16, 2008


Best answer: I agree with MarkLark that you need to have one weight, not two.

Say you have 100 people in your sample. The frequency of each group is shown below, with the population percentage of each category in parentheses (the percentages add up to 100% across both genders):

Males under 18: 10 (15%)
Males 18-64: 40 (30%)
Males 65+: 10 (15%)

Females under 18: 5 (10%)
Females 18-64: 25 (20%)
Females 65+: 10 (10%)

If your sample reflected the population, you'd expect to have 10 females under 18, not 5. 10/5 = 2 is the weight.
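The same arithmetic for every cell in the table, as a small Python sketch. Only the counts and percentages come from the example; the dictionary layout is just one convenient choice. Each weight is the count you'd expect if the sample matched the population, divided by the count you actually got.

```python
# Sample counts and population percentages from the example above.
sample_counts = {
    ("M", "under 18"): 10, ("M", "18-64"): 40, ("M", "65+"): 10,
    ("F", "under 18"): 5, ("F", "18-64"): 25, ("F", "65+"): 10,
}
population_pct = {
    ("M", "under 18"): 15, ("M", "18-64"): 30, ("M", "65+"): 15,
    ("F", "under 18"): 10, ("F", "18-64"): 20, ("F", "65+"): 10,
}

n = sum(sample_counts.values())  # 100 respondents

# Weight = expected count (pct% of n) / observed count.
weights = {
    cell: population_pct[cell] * n / (100 * count)
    for cell, count in sample_counts.items()
}
```

This gives 1.5 for males under 18 and 65+, 0.75 for males 18-64, 2.0 for females under 18, 0.8 for females 18-64, and 1.0 for females 65+. The weighted counts add back up to 100, so weighted totals stay on the same scale as the raw sample.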

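Let's say you're asking people what kind of ice cream they like best: chocolate, strawberry, or vanilla. 4 out of your 5 females under 18 like strawberry. Applying the weight, those strawberry fans count as 8 respondents (4 * 2 = 8), or 8% of the weighted sample. Within females under 18 the share is still 80% (8 out of a weighted 10); the weight changes how much the group counts toward totals, not its internal split.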

Males 18-64 are overrepresented in the sample. 20, or half, of them like chocolate ice cream. But after you apply the weight, the number is 15, because 30/40 = 0.75, and 20 * 0.75 = 15.
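Applying those weights to the two ice-cream counts mentioned above, sketched in Python. Only the two counts given in the example are used; a real tabulation would loop over every cell and flavor. The loop at the end checks that weighting rescales a group's contribution to totals but leaves the split within that group unchanged.

```python
# Respondents in each cell who picked the flavor (from the example).
raw = {
    ("F", "under 18", "strawberry"): 4,
    ("M", "18-64", "chocolate"): 20,
}

# Cell sizes and weights from the example (expected count / observed count).
cell_size = {("F", "under 18"): 5, ("M", "18-64"): 40}
cell_weight = {("F", "under 18"): 10 / 5, ("M", "18-64"): 30 / 40}

# Weighted count = raw count * the weight of the respondents' cell.
weighted = {key: count * cell_weight[key[:2]] for key, count in raw.items()}

# Share of the cell choosing the flavor, before and after weighting.
for key, count in raw.items():
    cell = key[:2]
    raw_share = count / cell_size[cell]
    weighted_share = weighted[key] / (cell_size[cell] * cell_weight[cell])
    print(key, weighted[key], raw_share, weighted_share)
```

The weighted counts come out as 8.0 and 15.0, matching the hand calculation, while the within-cell shares stay at 0.8 and 0.5 before and after weighting.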
posted by desjardins at 12:53 PM on January 16, 2008


Response by poster: Super guys -- thanks!!!
posted by mtstover at 1:10 PM on January 16, 2008


I don't do this stuff normally, but here are two wild-assed guesses:

(1) Randomly resample whatever the "correct" number is out of the over-represented groups. Then do it again ten thousand or a hundred thousand times, so that you have a density of results that you can look at.
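A minimal Python sketch of guess (1), assuming the population shares are available as whole-number percentages (all greater than zero) per cell and that the responses for each cell are kept in a list. The function names are made up for illustration. Cells are only ever subsampled without replacement, never padded, so the size of each replicate is limited by the most under-represented cell; with the example table from the earlier answer that works out to 49 respondents per replicate.

```python
import random


def target_counts(counts, pct):
    """Largest subsample whose cell sizes follow pct (integer percentages).

    Integer arithmetic guarantees no target exceeds the cell's actual count.
    """
    total = min(counts[c] * 100 // pct[c] for c in counts)
    return {c: total * pct[c] // 100 for c in counts}


def resample_once(responses, pct, rng):
    """Draw one proportionate subsample, without replacement within each cell."""
    counts = {c: len(v) for c, v in responses.items()}
    sample = []
    for c, k in target_counts(counts, pct).items():
        sample.extend(rng.sample(responses[c], k))
    return sample


def resampled_estimates(responses, pct, stat, reps=10_000, seed=0):
    """Repeat the resampling and collect stat(sample) from each replicate."""
    rng = random.Random(seed)
    return [stat(resample_once(responses, pct, rng)) for _ in range(reps)]
```

To get the spread of an estimate, pass a `stat` such as `lambda s: sum(s) / len(s)` when each response is coded 0/1, and look at the distribution of the returned values. Note that every replicate throws away data from the over-represented cells, whereas weighting keeps all of it.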

(2) Weight by the inverse of the probability of being sampled, which you should be able to figure out by comparing sample and population proportions.
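Guess (2) as a short Python sketch, assuming you know (or can estimate) the population count in each cell. The per-cell sampling probability is estimated as sample count divided by population count, and the weight is its inverse. The population of 10,000 below is a hypothetical figure paired with the percentages from the earlier example.

```python
def inverse_probability_weights(sample_counts, population_counts):
    """Weight each cell by 1 / P(sampled), with P(sampled) estimated as n_c / N_c."""
    return {c: population_counts[c] / sample_counts[c] for c in sample_counts}


sample_counts = {
    "M under 18": 10, "M 18-64": 40, "M 65+": 10,
    "F under 18": 5, "F 18-64": 25, "F 65+": 10,
}

# Hypothetical population of 10,000 split by the example's percentages.
population_counts = {
    "M under 18": 1500, "M 18-64": 3000, "M 65+": 1500,
    "F under 18": 1000, "F 18-64": 2000, "F 65+": 1000,
}

weights = inverse_probability_weights(sample_counts, population_counts)

# Rescale so the weights sum to the sample size instead of the population size.
n = sum(sample_counts.values())
N = sum(population_counts.values())
relative = {c: w * n / N for c, w in weights.items()}
```

The raw weights (150 for a male under 18, 200 for a female under 18, and so on) say how many people in the population each respondent stands for. After rescaling they come out as 1.5, 0.75, 1.5, 2.0, 0.8 and 1.0, the same cell weights as in the worked example, so here the two approaches agree up to a constant factor.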
posted by ROU_Xenophobe at 1:14 PM on January 16, 2008

