# How do I combine survey data collected in two stages?
July 29, 2009 8:51 PM

So I collected some firm-level survey data in two stages, from two different geographic locations, although from the same general populations (i.e. same SIC code in both locations). How do I go about making sure it's okay to combine them into one data set? My Google-fu, as well as my advisor and research methods books fail me. Not that it really matters, but I'm doing the analysis in Stata 9.
posted by pcward to Education (5 answers total) 1 user marked this as a favorite

Best answer: Basically, it isn't ever okay to combine groups. You'll need to perform all of your analyses with both your pooled and split samples and compare the results. In the end, if the results were always comparable, you just throw a line into your article that says, "All analyses were conducted with both split and pooled samples. No significant differences were found"... and that'll be good enough for most people.

Without knowing more about what you're trying to do, my first step would be to t-test the hell out of everything. Respondent demographics and every question. Whenever you do end up with statistically significantly different responses, you'll at least need to make a note of it and explain it to the best of your ability. Of course, when you're carpet bombing like that, you'll inevitably have statistically significant differences even if none really exist. If you're using a .05 threshold and end up with around 5% of tests being significant, then the two groups' responses are plausibly similar enough to be combined. Of course, that 5% estimate is from a theoretical distribution of its own...
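To put a rough number on the "about 5% will be significant by chance" point, here is a minimal sketch of the binomial arithmetic behind it. It assumes the tests are independent, which survey items usually are not, so treat the output as a ballpark only. The function name and the figure of 40 tests are made up for illustration.

```python
from math import comb


def prob_at_least(k, m, alpha=0.05):
    """P(X >= k) for X ~ Binomial(m, alpha).

    Chance of seeing k or more 'significant' results out of m tests
    when no real difference exists and the tests are independent.
    """
    return sum(
        comb(m, i) * alpha**i * (1 - alpha) ** (m - i)
        for i in range(k, m + 1)
    )


m = 40                   # hypothetical number of t-tests run
expected = m * 0.05      # false positives expected by chance alone
print(f"Expected false positives out of {m} tests: {expected:.1f}")
for k in (1, 3, 5, 8):
    print(f"P({k} or more significant by chance) = {prob_at_least(k, m):.3f}")
```

If the number of significant tests you actually observe would be very unlikely under this calculation, that is a hint the two groups really do differ on something.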

Also, do some visual exploration of the two groups. Plot some histograms and box-and-whisker plots (depending on your preference). Can you visually distinguish the groups in an important way? Then you've got a problem. A problem which is actually another study waiting to happen, and so not really a problem at all.

I don't see this kind of thing get questioned too often, but being able to unleash a torrent of histograms, t-tests and split/pooled analyses on your doubters will really strengthen your position if someone does ask.
posted by McBearclaw at 10:20 PM on July 29, 2009

Sheesh. I hope somebody else posts so you don't have to rely entirely on me. That's a frightening prospect.
posted by McBearclaw at 5:58 AM on July 30, 2009

Best answer: I'm just a tad more liberal on this point than McBearclaw. His advice is sound, for sure, but may be infeasible if neither sample is really large enough to conduct meaningful analyses.

The first step, as he suggests, is to do an in-depth comparison of the two samples. Assuming you don't find much difference between them, then I would consider it acceptable to combine them. However, I would absolutely designate a variable that distinguishes the location and include it in your subsequent analyses, just to ensure there aren't some more subtle differences between the two samples.
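As a rough illustration of the location-variable idea, the sketch below pools two samples, tags each row with a 0/1 location indicator, and fits an ordinary least squares line that includes that indicator. This is the same idea as adding a location dummy to a regression in any stats package. It is a simplification (one predictor, plain OLS rather than PLS), and the function names and data are invented for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_with_location(sample_a, sample_b):
    """Pool two samples of (x, y) pairs and fit y = b0 + b1*x + b2*location.

    location is 0 for sample_a and 1 for sample_b, so b2 estimates the
    shift in y associated with being in the second location.
    """
    rows = [(1.0, x, 0.0, y) for x, y in sample_a]
    rows += [(1.0, x, 1.0, y) for x, y in sample_b]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * r[3] for r in rows) for i in range(3)]
    return solve(xtx, xty)


# Made-up data purely for illustration: (predictor, outcome) per firm.
location_a = [(1.0, 5.1), (2.0, 7.9), (3.0, 11.2), (4.0, 13.8)]
location_b = [(1.5, 8.0), (2.5, 11.1), (3.5, 14.2), (4.5, 16.9), (5.0, 18.6)]
b0, b1, b2 = ols_with_location(location_a, location_b)
print(f"intercept={b0:.2f}  slope={b1:.2f}  location shift={b2:.2f}")
```

A location coefficient near zero is consistent with pooling being harmless for that relationship; a large one suggests the samples differ in a way the pooled analysis should account for. Judging "near zero" properly needs a standard error, which this sketch does not compute.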
posted by DrGail at 6:23 AM on July 30, 2009

Response by poster: Thank you both! I figured this was the general approach, but wanted to check my gut feeling on this. Hadn't thought to plot things first, which I should have remembered to do.

Sample A is too small to get much meaning out of on its own (N=36)
Sample B is adequate (N=113), but considering the analysis I'm ultimately trying to run (PLS) it really needs to be larger.

I'll carpet bomb the data and cross my fingers that I don't have to go back for more survey responses. Getting 113 data points was like pulling wisdom teeth.
posted by pcward at 10:34 AM on July 30, 2009

I agree with the good Dr. on this one. I didn't make the connection that firm-level data often means smaller sample sizes than I usually deal with; it would definitely be difficult to do (meaningful) split analyses with one group at 36.
posted by McBearclaw at 12:49 PM on July 30, 2009
