May 4, 2009 8:53 PM Subscribe

Statisticians, help me work some census cross-referencing magic.

I would like to know how to cross-reference towns using percentage of atheists and annual rainfall. I want to know the method so that I can substitute either aspect with real estate prices or violent crime or other statistics. How do you do this? I'm handy with math and a spreadsheet, just wasn't sure as to the most efficient way to make it happen. I'm also not as handy with the census website as with math and a spreadsheet.

Thanks, you darling mefites, you.
posted by letahl to Society & Culture (9 answers total) 3 users marked this as a favorite

I would like to know how to cross-reference towns using percentage of atheists and annual rainfall. I want to know the method so that I can substitute either aspect with real estate prices or violent crime or other statistics. How do you do this? I'm handy with math and a spreadsheet, just wasn't sure as to the most efficient way to make it happen. I'm also not as handy with the census website as with math and a spreadsheet.

Thanks, you darling mefites, you.

Gnumeric can do most of what Excel can do.

+1 CORREL.

Statistics for people who (think they) hate statistics: Excel Edition is a great book. I think you'd appreciate it.

posted by MesoFilter at 11:46 PM on May 4, 2009

+1 CORREL.

Statistics for people who (think they) hate statistics: Excel Edition is a great book. I think you'd appreciate it.

posted by MesoFilter at 11:46 PM on May 4, 2009

Promise me that if you don't already understand the concept of an ecological fallacy, that you will have some understanding of it before you start interpreting your results from aggregate data.

posted by mandymanwasregistered at 11:57 PM on May 4, 2009

posted by mandymanwasregistered at 11:57 PM on May 4, 2009

mandymanwasregistered makes a valid point, something I think is worth noting again: please be ethical with your statistics. It makes it harder for the rest of us when you draw ridiculous conclusions. Perhaps you should do some reading on Hypothesis Testing too (maybe not, this could all be for some personal project that doesn't need it)?

posted by teabag at 5:33 AM on May 5, 2009

posted by teabag at 5:33 AM on May 5, 2009

Do you already have the data sorted by "town" or is that the problem?

posted by a robot made out of meat at 5:39 AM on May 5, 2009

posted by a robot made out of meat at 5:39 AM on May 5, 2009

I think a bigger problem is how you are going to get information about religious preference at the level of small cities and towns. Generally the gold standard within the United States for making demographic comparisons on the basis of geography is the U.S. Census. Although alternative surveys exist that do question religious preference, they use random sampling across larger regions.

posted by KirkJobSluder at 7:09 AM on May 5, 2009

posted by KirkJobSluder at 7:09 AM on May 5, 2009

Seconding Kirk - getting this data isn't half the challenge of getting it all in comparable units, and then matching those units. Census data isn't so bad, because you can get it at a variety of levels using the American FactFinder tool (once you get to the table, there's a Print/Download option at the top).

And yeah, sure, Excel is a fine way to make this happen. But it is my duty to tout R at any possible chance. The learning curve is probably too steep for just basic correlation coefficients, but if you're getting into stats, it might be something to consider (e.g. it's free and used by real statisticians). Once you've got your data all together (let's say columns are city, heathens, rain,reale,violentcrime,nonviolentcrime), it's a breeze to do correlations (and to plot them to check for the non-linearities discussed above). Basically, this:

data<> attach(data)

cor(heathens,rain)

plot(heathens,rain)

cor(heathens,reale)

plot(heathens,reale)

cor(reale,violentcrime)

plot(violentcrime,nonviolentcrime)

...gets you a large number of correlations and plots, a lot faster than Excel.

On top of that, it's easy to move into regression. If you wanted to sort out the effects of violent and non-violent crime on real estate prices, for example, you'd use regression.

Okay, okay, that's crazy overkill for your question. I'll stop now. Please do look into the myriad ways that analyses like this can be totally, utterly messed up, starting with what other posters have mentioned. They are wise. They're suggest caution, I'm handing you a statistical shotgun... ah, well.

posted by McBearclaw at 9:06 AM on May 5, 2009

And yeah, sure, Excel is a fine way to make this happen. But it is my duty to tout R at any possible chance. The learning curve is probably too steep for just basic correlation coefficients, but if you're getting into stats, it might be something to consider (e.g. it's free and used by real statisticians). Once you've got your data all together (let's say columns are city, heathens, rain,reale,violentcrime,nonviolentcrime), it's a breeze to do correlations (and to plot them to check for the non-linearities discussed above). Basically, this:

data<> attach(data)

cor(heathens,rain)

plot(heathens,rain)

cor(heathens,reale)

plot(heathens,reale)

cor(reale,violentcrime)

plot(violentcrime,nonviolentcrime)

...gets you a large number of correlations and plots, a lot faster than Excel.

On top of that, it's easy to move into regression. If you wanted to sort out the effects of violent and non-violent crime on real estate prices, for example, you'd use regression.

Okay, okay, that's crazy overkill for your question. I'll stop now. Please do look into the myriad ways that analyses like this can be totally, utterly messed up, starting with what other posters have mentioned. They are wise. They're suggest caution, I'm handing you a statistical shotgun... ah, well.

posted by McBearclaw at 9:06 AM on May 5, 2009

Just to amplify what Kirk said, the US Census doesn't ask any questions about religion and almost certainly never will any time soon. You'd need to work with something like the Pew Religious Landscape Survey.

posted by DavidNYC at 10:37 PM on May 5, 2009

posted by DavidNYC at 10:37 PM on May 5, 2009

Thank you for the answers. The correlation method is complicated, but you might be right that's where I need to go. I was just trying to find nice towns with plenty of atheists and good gardening conditions. :) I will try what you guys say for like ten towns and see how it goes.

posted by letahl at 8:08 AM on May 6, 2009

posted by letahl at 8:08 AM on May 6, 2009

This thread is closed to new comments.

If so, and if your spreadsheet is Excel, check out the "@CORREL" function. Given two columns of data, it will calculate a correlation coefficient between them. This will be a number ranging somewhere from -1 to 1.

A value of 1 means that they are perfectly linearly correlated with each other, and they increase together or decrease together. For example, "1, 2, 3" and "10, 20, 30" have a correlation coefficient of 1.

A value of -1 means that they are perfectly linearly correlated with each other, and one increases while the other decreases. For example, "1, 2, 3" and "30, 20, 10" have a correlation coefficient of -1.

A value of zero means that they are not linearly correlated. "1, 2, 3" and "30, 20, 30".

Other values are... other. Things that are close to linearly correlated will have a c.c. of something close to 1 or -1: "1, 2, 3" and "10, 20, 29" have a c.c. of about 0.9995. Similarly, things that are close to completely uncorrelated will have a c.c. near zero: "1, 2, 3" and "29, 20, 30" have a c.c. of about 0.091.

However, there's an important caveat: This

onlymeasures linear correlation. Two sets of numbers with a c.c. of zero may be strongly related to each other; just not linearly.And there's an even more important caveat: Correlation does not imply causation. I assume, from the tenor of your question, that you're already well aware of this, but I figure it can't hurt to point it out.

Excel-related note: I think that @CORREL is not installed by default. You may have to install some extra feature of Excel; check the help file for details if you run into this problem.

If your spreadsheet is not Excel, it almost certainly supports some equivalent function. And in the unlikely event that it doesn't, the Wikipedia page I linked to gives enough information that you could roll your own.

posted by Flunkie at 9:18 PM on May 4, 2009