So I collected some firm-level survey data in two stages, from two different geographic locations, though from the same general population (i.e., the same SIC code in both locations). How do I go about making sure it's okay to combine them into one data set? My Google-fu, my advisor, and my research methods books have all failed me. Not that it really matters, but I'm doing the analysis in Stata 9.
Best answer: Basically, it isn't ever okay to combine groups. You'll need to perform all of your analyses with both your pooled and split samples and compare the results. In the end, if the results were always comparable, you just throw a line into your article that says, "All analyses were conducted with both split and pooled samples. No significant differences were found"... and that'll be good enough for most people.
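A minimal sketch of the split/pooled comparison, assuming a simple OLS slope of one outcome on one predictor stands in for whatever model you actually run (in Stata you'd typically rerun the same command with an `if` qualifier on a location variable). The function names and toy data here are hypothetical, for illustration only:

```python
# Sketch: fit the same simple model to sample A, sample B, and the pooled data,
# then compare the estimates side by side. Toy data only, not real survey results.


def ols_slope_intercept(x, y):
    """Return (slope, intercept) of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def split_vs_pooled(sample_a, sample_b):
    """Fit the same model to A, B, and A+B; each sample is a list of (x, y) pairs."""
    results = {}
    for label, rows in (("A", sample_a), ("B", sample_b), ("pooled", sample_a + sample_b)):
        xs = [r[0] for r in rows]
        ys = [r[1] for r in rows]
        results[label] = ols_slope_intercept(xs, ys)
    return results


if __name__ == "__main__":
    # Hypothetical toy data for two locations.
    sample_a = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)]
    sample_b = [(1, 1.8), (2, 4.1), (3, 5.9), (4, 8.1), (5, 10.2)]
    for label, (slope, intercept) in split_vs_pooled(sample_a, sample_b).items():
        print(f"{label:>6}: slope={slope:.3f}, intercept={intercept:.3f}")
```

If the split estimates land close to each other and to the pooled one across all your models, that's the "no significant differences were found" line; if they diverge, the pooled estimate is averaging over two different relationships.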

Without knowing more about what you're trying to do, my first step would be to t-test the hell out of everything: respondent demographics and every question. Whenever you do end up with statistically significant differences in responses, you'll at least need to note them and explain them as best you can. Of course, when you're carpet bombing like that, you'll inevitably get some statistically significant differences even if none really exist. If you're using a .05 threshold and around 5% of your tests come out significant, then the two groups' responses are plausibly similar enough to be combined. Then again, that 5% is just the expected value of a theoretical distribution of its own (roughly binomial, if the tests were independent), so the actual count will wobble around it...
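Here's a rough sketch of the carpet-bombing step plus the "is 5% surprising?" check. In Stata the per-question test would be something like `ttest var, by(location) unequal`; to keep this sketch to the Python standard library it swaps in a permutation test on the difference in means instead of a t-test. The binomial tail assumes the tests are independent, which correlated survey items usually aren't, so treat it as a ballpark. All names (`carpet_bomb`, the variable dictionaries) are hypothetical:

```python
import math
import random


def perm_test_mean_diff(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign responses to the two locations
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


def binomial_tail(k, m, alpha=0.05):
    """P(at least k of m independent tests significant) when no true differences exist."""
    return sum(math.comb(m, j) * alpha**j * (1 - alpha) ** (m - j) for j in range(k, m + 1))


def carpet_bomb(group_a, group_b, alpha=0.05):
    """group_a / group_b map variable name -> list of responses.

    Returns the variables flagged at `alpha` and the chance of flagging
    at least that many purely by luck (assuming independent tests).
    """
    pvals = {name: perm_test_mean_diff(group_a[name], group_b[name]) for name in group_a}
    flagged = sorted(name for name, p in pvals.items() if p < alpha)
    return flagged, binomial_tail(len(flagged), len(pvals), alpha)
```

For example, with 40 questions you'd expect about 2 flags by chance; `binomial_tail(5, 40)` tells you roughly how unusual it would be to see 5 or more.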

Also, do some visual exploration of the two groups. Plot some histograms or box-and-whisker plots (depending on your preference). Can you visually distinguish the groups in an important way? Then you've got a problem. A problem which is actually another study waiting to happen, and so not really a problem at all.
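The plots themselves are one-liners in Stata (`graph box var, over(location)` or `histogram var, by(location)`). As a quick numeric companion, not a substitute for actually looking, here's a sketch that computes the five numbers a box plot draws for each group and checks whether the two interquartile boxes overlap. The function names are hypothetical:

```python
import statistics


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a box-and-whisker plot draws."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def compare_groups(group_a, group_b):
    """Side-by-side five-number summaries plus whether the interquartile boxes overlap."""
    a = five_number_summary(group_a)
    b = five_number_summary(group_b)
    # Boxes overlap if each group's Q1 sits at or below the other group's Q3.
    boxes_overlap = a[1] <= b[3] and b[1] <= a[3]
    return a, b, boxes_overlap
```

Boxes that don't overlap at all on a key question are the numeric version of "I can visually distinguish the groups."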

I don't see this kind of thing get questioned too often, but being able to unleash a torrent of histograms, t-tests and split/pooled analyses on your doubters will really strengthen your position if someone does ask.
Sheesh. I hope somebody else posts so you don't have to rely entirely on me. That's a frightening prospect.
Best answer: I'm just a tad more liberal on this point than McBearclaw. His advice is sound, for sure, but may be infeasible if neither sample is large enough on its own to conduct meaningful analyses.

The first step, as he suggests, is to do an in-depth comparison of the two samples. Assuming you don't find much difference between them, I would consider it acceptable to combine them. However, I would absolutely create a variable that distinguishes the location and include it in your subsequent analyses, just to ensure there aren't some more subtle differences between the two samples.
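A sketch of what "include the location variable" can look like, assuming an ordinary regression rather than the PLS model the poster ultimately wants. It fits y on x, a 0/1 location dummy, and an x-by-location interaction on the pooled data; the interaction term is my addition, as one way to catch a difference in slope rather than just in level. In Stata 9 you'd `generate` the product term yourself and pass it to `regress`. Everything here (names, the 0/1 coding) is hypothetical and stdlib-only:

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def pooled_with_location(x, location, y):
    """Fit y = b0 + b1*x + b2*loc + b3*x*loc on the pooled data.

    location is 0 for one site and 1 for the other (hypothetical coding).
    b2 is the level shift at site 1; b3 is the change in the slope of x at site 1.
    """
    rows = [[1.0, xi, li, xi * li] for xi, li in zip(x, location)]
    b0, b1, b2, b3 = ols(rows, y)
    return {"const": b0, "x": b1, "loc": b2, "x_by_loc": b3}
```

If `loc` and `x_by_loc` come out small and non-significant (the significance tests aren't computed here), that supports treating the pooled sample as one population.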
Response by poster: Thank you both! I figured this was the general approach, but wanted to check my gut feeling on this. Hadn't thought to plot things first, which I should have remembered to do.

Sample A is too small to get much meaning out of on its own (N=36).
Sample B is adequate (N=113), but considering the analysis I'm ultimately trying to run (PLS, i.e., partial least squares), it really needs to be larger.

I'll carpet bomb the data and cross my fingers that I don't have to go back for more survey responses. Getting 113 data points was like pulling wisdom teeth.
I agree with the good Dr. on this one. I didn't make the connection that firm-level data often means smaller sample sizes than I usually deal with; it would definitely be difficult to do (meaningful) split analyses with one group at 36.
