Survey Methods Filter: Weights when the sampling design is unknown
October 12, 2017 4:07 PM

I'm trying to construct weights to use for the analysis of survey data. I know the population demographics and have already calculated post-stratification weights to account for unit non-response. Unfortunately, I don't have information about the sampling design or sampling frame. Is it OK to use only the post-stratification weights?

In an ideal world, the survey weights would account for: 1) the probability of selection, and 2) unit non-response. I've already created weights to adjust for unit non-response. As a result, the weighted sample demographics now match the population demographics. But I can't adjust for the probability of selection because I don't know the sampling design. My best guess is that it was either a convenience sample or an attempt to survey the entire population (i.e., a census).
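
For concreteness, here is a minimal sketch of the kind of post-stratification weight described above: each respondent's weight is their group's population share divided by that group's share of the sample. The group labels, the 50/50 population split, and the function name `poststrat_weights` are made up for illustration and are not the actual survey data.

```python
from collections import Counter

def poststrat_weights(sample_groups, population_shares):
    """One weight per respondent: population share / sample share of their group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]

# Hypothetical sample of four respondents: three women, one man.
groups = ["f", "f", "f", "m"]
# Hypothetical known population shares.
shares = {"f": 0.5, "m": 0.5}

weights = poststrat_weights(groups, shares)
# Overrepresented respondents get weights below 1, underrepresented ones above 1.
for g, w in zip(groups, weights):
    print(g, round(w, 3))
```

Weights built this way sum to the sample size, so the weighted group shares match the population shares. They say nothing about how respondents were selected in the first place.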

My hunch is that it's better to use the post-stratification weights than none at all so that the sample is demographically representative of the population. But I haven't been able to find a source to verify this. How problematic is it if the weights don't adjust for the probability of selection?

If it matters, the analyses I plan to use are pretty straightforward: descriptive statistics, t-tests, correlations, and maybe some chi-square tests.
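
As a rough illustration of what the weights do to a descriptive statistic, the sketch below compares an unweighted and a weighted mean, and computes Kish's approximate effective sample size, (sum of weights)^2 / (sum of squared weights), which is one common way to gauge how much precision the weighting costs. The scores and weights are invented, and the helper names are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)

def kish_effective_n(weights):
    """Kish's approximation: (sum of weights)^2 / sum of squared weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical outcome scores for four respondents.
scores = [1.0, 1.0, 1.0, 5.0]
# Hypothetical post-stratification weights (last respondent's group is underrepresented).
weights = [2 / 3, 2 / 3, 2 / 3, 2.0]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)
n_eff = kish_effective_n(weights)

print("unweighted mean:", unweighted)
print("weighted mean:", round(weighted, 3))
print("effective n:", round(n_eff, 3))
```

The further the weights are from equal, the smaller the effective sample size, which matters for t-tests and chi-square tests that assume the nominal n.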
posted by oiseau to Grab Bag (2 answers total)
 
Is there a reason you need to do anything but describe the sample you have? As someone who does stats in a field which may not be yours, I would rather have accurate descriptions of your sample than hand-wavy post-stratification. Can you add more information about how and why you adjusted for non-response? Saying "unit non-response" makes me think that this is clustered. Is that correct?

If you are in fact an epi (just looked in your history!), I'm happy to discuss further!
posted by quadrilaterals at 5:29 PM on October 12, 2017


Response by poster: "Can you add more information about how and why you adjusted for non-response?"

I adjusted for non-response because the means for several key outcome variables differed notably between demographic groups (e.g., gender) that were over- or underrepresented in the sample.

"Saying 'unit non-response' makes me think that this is clustered - is that correct?"

The data is not clustered.
posted by oiseau at 5:51 PM on October 12, 2017

