Programs to work with Census data?
October 30, 2009 8:51 AM

How do I create a cross-tab from Census data (SF3)?

I would like to create a cross-tab that shows disability status by household (or family) income at the census tract level of detail. I cannot get that table from American Factfinder. Saw the link that said "download the data and you can generate your own cross tabs". Downloaded data - got big old text (ASCII) file that does not easily import into Excel or Access (the only two potential programs I have now). Now I have big old text file and shattered hopes.

I think I am looking for a (preferably free or shareware) program that will open and analyze this data with a minimum of fussing on my part.

Or, maybe I am looking for something completely different. You'd probably know better than me.

Thanks in advance.
posted by qldaddy to Computers & Internet (14 answers total)
 
Would Statcrunch fit the bill? I'm not familiar with it but it might be worth a try, since (I'm assuming) you don't have access to SPSS or SAS or anything...non free.
posted by chesty_a_arthur at 9:07 AM on October 30, 2009


FYI: Not free, looks like $12.
posted by chesty_a_arthur at 9:08 AM on October 30, 2009


Best answer: SPSS (the software is now called PASW) has a 21 day free trial.
posted by desjardins at 9:11 AM on October 30, 2009


What's the link that tells you you can create your own crosstabs?

I don't think they mean what you think they mean.

To generate the crosstab of disability status by income within a census tract, you'd need to have access to the individual-level data. Which you will never get.

What you could do with the SF3 data is do a crosstab by tract, showing that tracts with higher average family income had a bigger/smaller/the same number of disabled people as tracts with lower average family income. But that would not tell you much of anything about how disability status and family income are related at the individual level.

You can do that from the American Factfinder. Pick all census tracts within some county, then select a family/household income table and one of the disability status tables, then use that information to generate an average family/hh income for the tract and put that on one axis with # of disabled people on the other.
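A rough sketch of that tract-level calculation in Python, in case it helps. The bracket bounds, tract names, and counts below are all made up for illustration; real tables have more brackets and an open-ended top bracket that needs its own assumption. It treats every family as sitting at the midpoint of its income bracket, which is only an approximation.

```python
# Hypothetical income brackets (lower bound, upper bound); not the real table layout.
BRACKETS = [(0, 10_000), (10_000, 25_000), (25_000, 50_000), (50_000, 100_000)]

# Hypothetical tract rows: family counts per bracket plus a disabled-person count.
tracts = {
    "tract_A": {"families": [50, 100, 200, 50], "disabled": 300},
    "tract_B": {"families": [10, 40, 150, 200], "disabled": 120},
}


def approx_mean_income(counts, brackets=BRACKETS):
    """Estimate mean income by placing every family at its bracket midpoint."""
    total = sum(counts)
    if total == 0:
        return 0.0
    weighted = sum(n * (lo + hi) / 2 for n, (lo, hi) in zip(counts, brackets))
    return weighted / total


def tract_points(tracts):
    """Return (tract id, approximate mean income, disabled count), sorted by income."""
    rows = [
        (name, approx_mean_income(t["families"]), t["disabled"])
        for name, t in tracts.items()
    ]
    return sorted(rows, key=lambda row: row[1])


for name, income, disabled in tract_points(tracts):
    print(name, round(income), disabled)
```

Each output row is one point for the plot: income on one axis, number of disabled people on the other. As noted above, this describes tracts, not individuals.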
posted by ROU_Xenophobe at 9:23 AM on October 30, 2009


I don't know of any free way to do this, but I think you might be able to do it using kexi and the spatial part in QGIS. I'm no expert on either of these, though.

You can use MS Access to do this, if you are interested in a non-free method. I started this, and stopped as I found an easier way (my university has access to Geolytics, which has a pre-built database). If you just need this, send me the exact codes (Pxxxx etc) and headings, and I can run the query for you and send you the file.

This is not a huge amount of use without the spatial component though. Are you using a GIS program? I have some other tips if you are.
posted by a womble is an active kind of sloth at 9:26 AM on October 30, 2009


Sorry, in my haste I saw that you had MS Access. The Census people do this weird thing with files, where they leave off the .txt. So you can change the filename to be xxxxx.txt and then import it into Access.

(Should have read question more carefully!)
posted by a womble is an active kind of sloth at 9:32 AM on October 30, 2009


Response by poster: "To generate the crosstab of disability status by income within a census tract, you'd need to have access to the individual-level data. Which you will never get."

I am looking at the 1% and 5% Public Use Microdata Sample (PUMS). On re-reading, I see it says "These files enable users to produce their own tabulations within the limits of the data provided." So I may be being overoptimistic as to what I can do although I would still like to try.

"Pick all census tracts within some county, then select a family/household income table and one of the disability status tables, then use that information to generate an average family/hh income for the tract and put that on one axis with # of disabled people on the other."

I did grab this data but could not think of how to use it. Thanks for this suggestion.

Tried importing into Access (and Excel), but the files are not CSV. I have the technical documentation, but it looks to me like inserting the data breaks by hand will be a long and likely error-prone process. I was hoping for a less labor-intensive process utilizing a platform that already knows what to do with census data.
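For reference, the column breaks for a fixed-width file like this can be applied by a short script instead of by hand. Below is a minimal sketch in Python using only the standard library. The field names, start columns, widths, and sample lines are hypothetical; the real positions have to be copied from the data dictionary in the technical documentation.

```python
import csv
import io

# Hypothetical layout: (field name, start column counted from 1, width).
# Replace with the positions listed in the technical documentation.
LAYOUT = [
    ("rectype", 1, 1),
    ("serialno", 2, 7),
    ("disability", 9, 1),
    ("income", 10, 7),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of stripped strings."""
    return {
        name: line[start - 1:start - 1 + width].strip()
        for name, start, width in layout
    }


def parse_lines(lines, layout=LAYOUT):
    """Parse every non-blank line from an iterable of text lines."""
    return [parse_record(line.rstrip("\n"), layout) for line in lines if line.strip()]


def to_csv(records, layout=LAYOUT):
    """Render parsed records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[name for name, _, _ in layout])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Made-up sample lines standing in for the downloaded file.
sample = ["P000012310045000\n", "P000012420012500\n"]
records = parse_lines(sample)
print(to_csv(records))
```

With a real file, `parse_lines(open(path))` would take the place of the sample list, and the CSV text could be written to disk and imported into Access or Excel in the usual way.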

Census site says that one can purchase data on CD packaged with Beyond 20/20 software (although my local Census office has never heard of this option). Anyone have any experience with this software?
posted by qldaddy at 9:54 AM on October 30, 2009


Could you link to this file somehow? You could link to the Factfinder query that produced it, or just throw the file itself up on something like drop.io.
posted by McBearclaw at 10:56 AM on October 30, 2009


Response by poster: File that I'd like to play with is located on the Census site, here.

The zip file has all the data.
posted by qldaddy at 11:03 AM on October 30, 2009


Best answer: "I am looking at the 1% and 5% Public Use Microdata Sample (PUMS)."

Right; I forgot those exist.

You could generate what you want out of those, if you could get the data working.

BUT

The confidence intervals around the income categories for disabled people are going to be VERY wide. Here's why. Your average census tract has *google* 43300 people, so a 5% sample is about 2200, which is pretty good. You can get nice tight CIs around your point estimates with that.

The thing is, you want to make inferences about the income breakdowns for disabled people and nondisabled people. I don't know the proportion of people with disabilities in PA; let's assume it's 10%. That means that your inferences for the income breakdowns of nondisabled people will be based on a sample of about 2000, which is good. But your inferences about the incomes of disabled people will be based on a sample of about 200, which is not so good. 200 puts wide CIs around your estimates.*
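To put rough numbers on "wide", here is a small Python sketch of the usual margin-of-error formula for a proportion, 1.96 * sqrt(p(1-p)/n). It assumes simple random sampling, which a census microdata sample only approximates, and uses p = 0.5 as the worst case; the sample sizes 2000 and 200 are the ones from the paragraph above.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n cases.

    Assumes simple random sampling; real survey designs usually need a correction.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (2000, 200):
    # Worst case p = 0.5 gives the widest interval for a given n.
    print(n, round(margin_of_error(0.5, n), 3))
```

Under those assumptions the margin is about 2 percentage points either way for n = 2000 and about 7 points for n = 200. Cutting the sample by a factor of 10 widens the interval by a factor of sqrt(10), a bit over 3.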

You're very unlikely to find anything available for free that will take in a Census .txt file and translate it into something directly usable.
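That said, once the person records are readable, the tabulation itself is not much code. A minimal sketch in Python, assuming each record has already been reduced to an income, a disability flag, and a sample weight; the field names, income bands, and all values below are hypothetical.

```python
from collections import Counter

# Hypothetical person records; field names, flags, and weights are made up.
people = [
    {"income": 8000, "disabled": True, "weight": 20},
    {"income": 30000, "disabled": False, "weight": 20},
    {"income": 60000, "disabled": False, "weight": 25},
    {"income": 12000, "disabled": True, "weight": 18},
    {"income": 45000, "disabled": True, "weight": 22},
]


def income_band(income):
    """Assign an income to one of three illustrative bands."""
    if income < 25_000:
        return "under 25k"
    if income < 50_000:
        return "25k to 50k"
    return "50k and over"


def crosstab(records):
    """Sum weights into cells keyed by (income band, disability flag)."""
    table = Counter()
    for rec in records:
        table[(income_band(rec["income"]), rec["disabled"])] += rec["weight"]
    return table


for (band, disabled), count in sorted(crosstab(people).items()):
    print(band, "disabled" if disabled else "not disabled", count)
```

Summing weights rather than counting rows gives estimated population counts, not sample counts, so the cells say nothing about how many actual records sit behind each estimate.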

*It would actually make more sense to think of the problem as "Out of this income range, what proportion of people are disabled?" and this would give you even smaller sample sizes.
posted by ROU_Xenophobe at 12:14 PM on October 30, 2009


You might want to try the IPUMS website. It's got some other data access options that might enable you to get your data in a more useful format. Alternatively, Penn has a population research center, if you'd like some local help.

ROU Xenophobe makes very good points about your inferences, by the way.
posted by McBearclaw at 12:31 PM on October 30, 2009


Response by poster: ROU Xenophobe - I had gotten a sense of the CI issues from the technical documentation (not that I understand it in the way that you stats folk do) but I am not undertaking a scholarly kind of thing -- just trying to get an estimate that is better than a guess. Thank you for pointing it out, though.

McBearclaw - I will go check out the IPUMS site, thanks. Will also go to Penn, but they often restrict access to their on-line resources (think they'd give a Quaker a break, but nooooo).
posted by qldaddy at 12:57 PM on October 30, 2009


Response by poster: Wanted to get back and thank you all for your answers. Sounds like I will not be able to do what I was hoping to do, so I will have to think of another way to get at an approximation of the answer.

Sorry to have let the thread wither away like that - I was called away from this project by some other work-y stuff.
posted by qldaddy at 11:46 AM on November 13, 2009


Thanks for checking back in, though - that is all too rare and really nice.
posted by McBearclaw at 2:20 PM on November 13, 2009

