# Does chi-square work for this?

October 11, 2009 3:23 PM

StatisticAnalysisFilter: I took (pretty close to) scientific observations of the general populace in a neighborhood for a few months (personal project, long story). I measured the number of people who had trait X (or did not have trait X) in two locations, A and B. Now, I want to test the statistical significance of these results. Is the chi-square test sufficient for this? Or is there a better option?

Effectively, I've observed that Trait X is much more common in location A than location B. But I want to let the numbers do the talking, of course, and see if this is a statistically significant difference, or if it is due to chance. Standard science stuff, but I want to make sure I'm doing this correctly, since I haven't done this in a while!

Best answer: I was coming in here to suggest the Fisher's Exact Test (especially if the number of individuals in one of your groups - XA, for instance - is small), but a robot made of meat beat me to it. Instead, I offer you this website that will calculate both the Fisher's Exact Test and chi square values from your 2x2 contingency table.
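For anyone who would rather run both tests locally than rely on a calculator page, here is a minimal sketch using only the Python standard library. The counts are made up purely for illustration (they are not the asker's data), and the function names are hypothetical. The chi-square version is the plain Pearson statistic without a continuity correction, and the Fisher version is the usual two-sided test that sums the probabilities of all tables no more likely than the observed one.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    # Hypergeometric probability of every table that shares the observed margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the tables that are no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))


# HYPOTHETICAL counts, for illustration only:
#               has X   no X
# location A      30     70
# location B      15     85
chi2, p_chi2 = chi_square_2x2(30, 70, 15, 85)
p_fisher = fisher_exact_2x2(30, 70, 15, 85)
print(f"chi-square = {chi2:.3f}, p = {p_chi2:.4f}")
print(f"Fisher's exact p = {p_fisher:.4f}")
```

With counts this large the two p-values should land close together; the difference matters mostly when one of the cells is small.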

posted by pemberkins at 3:47 PM on October 11, 2009

It sounds like you didn't take a random sample of any sort. Is there some other source of randomness in your observations?

If not, the whole concept of "statistical significance" is inapplicable.

posted by mikeand1 at 5:07 PM on October 11, 2009 [1 favorite]

It depends on why you're doing this. You say it's a personal project, which I assume means you're just curious about it, you're not sending the results anywhere, and no decisions of any importance hang on the result.

In which case: sure, if you know how to do a chi2, that's fine. Or a two-sample t-test would be fine. Whatever.

If this is academic/policy work and you're sending the results somewhere for review, then do whatever your discipline does with this sort of data.

If any important decision hangs on it, then hire a market research firm to do it right.

Depending on what the trait is, you're probably better off hitting the American Factfinder and pulling some related census data for Place A and Place B. Then you'd know whether more people in Place A had Trait X, as of 2000.

posted by ROU_Xenophobe at 6:43 PM on October 11, 2009

This is just gonna be a fun party trick or something, right? Because as the good people above have mentioned, your sampling techniques are not exactly rigorous. And knowing your sample population (not just what you have but what you're missing as well) is a large part of the battle in coming up with results that mean something. Perhaps you've heard of the expression garbage in, garbage out? And just because two things are more likely to occur together won't mean that one directly affects the other necessarily, so you're hopefully not gonna make any such claims.

posted by mandymanwasregistered at 8:11 PM on October 11, 2009

Best answer: If you just want to compare the observed group A to the observed group B you don't need statistical significance testing at all. Simple descriptive stats should do.

If you want to extrapolate from the observed groups to all members of said groups you need significance testing. Look here.

Problems to be aware of:

Sampling issues - as has been pointed out, your observations are probably far from random. Be aware that people do not stay in their own areas, so your observations are of people who merely happened to be in area A or area B when you observed them, and who may not be from either area. A case in point: you yourself were in both area A and area B when making your observations.

Measurement issues - you are relying on your judgment of a trait. If it is clear cut this is not a big deal - your measurement errors should be small. If it is harder to judge then there is a strong chance you have a lot of measurement error and it is probably non-random.

A lot of people make rash decisions based on their observations that even seem to have statistical support for claims like Area A is full of fatties and Area B is full of fitties. A little more investigation reveals that the observations of Area A were made at a strip mall that had a Jenny Craig outlet and Area B was a strip mall with a Planet Superfitness and that fatties from area B went to the Jenny Craig store in area A while fitties from Area A went to the gym in Area B.

Be careful about putting any weight at all in your observations. Statistics alone does not make good science.

posted by srboisvert at 8:05 AM on October 12, 2009

Response by poster: Thanks for the help, all! For those of you assuming my sampling was bad, well, I don't know where you got that from, but you'll get a chance to poke holes in my unified theory of whatever when I write this up and post it to projects!

posted by MoreForMad at 4:35 PM on October 20, 2009

This thread is closed to new comments.

People nearby one another tend to be more similar than you'd expect; if you're trying to generalize to larger groups of people the effective sample size is smaller than you'd think. That may also be true with respect to how you observed people; they're still somehow a sample of the larger locations. How seriously I'd think about that depends on how seriously you intend to use the results.
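To put a rough number on the "effective sample size" point, one common approximation is the design effect for clustered observations, `1 + (m - 1) * rho`, where `m` is the average cluster size and `rho` is the intraclass correlation. The sketch below assumes that approximation applies here; the function name and the example values (200 people seen in groups of about 10, with a correlation of 0.1) are hypothetical and only meant to show the direction of the effect.

```python
def effective_sample_size(n, cluster_size, icc):
    """Approximate effective sample size under the design-effect formula.

    n            -- number of people actually observed
    cluster_size -- average number of people per cluster (assumed value)
    icc          -- intraclass correlation within clusters (assumed value)
    """
    design_effect = 1 + (cluster_size - 1) * icc
    return n / design_effect


# HYPOTHETICAL numbers, for illustration only.
n_eff = effective_sample_size(n=200, cluster_size=10, icc=0.1)
print(f"200 observations behave roughly like {n_eff:.0f} independent ones")
```

If the people observed really are independent of one another (`icc` of 0), the formula returns the raw count unchanged.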

Finally, if the sample isn't that big, use Fisher's exact test instead.

posted by a robot made out of meat at 3:37 PM on October 11, 2009