Help me analyze data from a survey
June 1, 2006 11:29 AM

I conducted an anonymous survey of all the grad students in my department and had them tell me a bunch of info. How would I analyze this data?

I conducted an anonymous survey of all the grad students in my department and had them tell me basic stuff [like years in school, status (single/committed), type of car, sex, where their research was (in CA, outside CA, outside USA), etc.] and also had them check off things that apply to their research from a long list (duct tape, iPod, poison oak, snow, sunscreen, etc.).

So how would I go about analyzing this? The second part (where I had them check stuff) is all binary. Can I do a cluster analysis with binary data?

Is there a set of things that I can do with SAS, R or JMP (preferred)?

Are there other tools available online that would help?
posted by special-k to Computers & Internet (13 answers total)
 
I don't mean to be snarky, but did you have specific questions in mind to answer when you collected this data? It's a lot easier to determine the best way to do analysis when you have some sense of what you're trying to analyze.
posted by j-dawg at 11:39 AM on June 1, 2006


j-dawg: I was hoping to see if there were any trends in the data. The list of items I had people check off was to see if certain groups of people (say for example people > 28 yrs old & single) would cluster together.
The question here is simply how would one analyze data from a survey. I bet this is done a lot in the social sciences and I was hoping someone would give me advice on how to go about it.
posted by special-k at 11:43 AM on June 1, 2006


You didn't say exactly what/where you are in school, but your best bet is to go to your marketing or psychology department and ask someone to help you with SPSS. That's the most commonly-used program for analyzing data. You'd probably be best with a cross-tab analysis.
posted by radioamy at 11:49 AM on June 1, 2006
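
A cross-tab is just a table of counts for every combination of two categorical answers. Here is a minimal sketch in plain Python (R's `table()` or SAS's `PROC FREQ` would do the same job). The respondents and category labels are made up for illustration and are not taken from the survey:

```python
from collections import Counter

# Hypothetical responses: (relationship status, research location).
responses = [
    ("single", "in CA"),
    ("single", "outside USA"),
    ("committed", "in CA"),
    ("committed", "in CA"),
    ("single", "in CA"),
    ("committed", "outside CA"),
]

def crosstab(pairs):
    """Count how often each (row, column) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

rows, cols, table = crosstab(responses)

# Print a simple fixed-width table: one row per status, one column per location.
print("".ljust(12) + "".join(c.ljust(14) for c in cols))
for r, row_counts in zip(rows, table):
    print(r.ljust(12) + "".join(str(n).ljust(14) for n in row_counts))
```

Each cell shows how many respondents gave that pair of answers, which is usually the first thing worth looking at before running any test.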


The usual procedure for statistical testing involves:

1. Setting up a null hypothesis H0 (e.g.: "Grad students have as much sex as undergraduates")

2. Setting up an alternative hypothesis HA (e.g.: "Grad students have more/less sex than undergrads")

3. Choosing a statistical test that can compare the probability of H0 with the alternatives, e.g. MANOVA

4. Collecting data

5. Running the statistical test

You have done step 4 and want to do step 5 before having done steps 1, 2 and 3. This will not only call your results into question but is the wrong way to analyse your data.
posted by Mr. Six at 11:49 AM on June 1, 2006


Not sure what you want. Do you want instructions on doing cluster analysis (harder) or a piece of software that will do it for you (easier)? If so, the standard software is SPSS, but I'm no great fan of it.

Wikipedia has a list of statistical packages which includes some free alternatives.

Say more :)
posted by Hildago at 11:54 AM on June 1, 2006


yup. crosstabs are what you need. spss can handle them, so can sas, or any decent reporting tool off of a database package.

but mr. six is right: you did this backwards--and so your project will be hopelessly muddled. that's ok for some school project, but in real life, you would be fired.
posted by lester at 12:03 PM on June 1, 2006


radio-amy: Thanks. You've had the most helpful answer so far. I'll look up how to do a cross-tab analysis.

Hidalgo: I could use either/both. I mention R/SAS/JMP because I have access to those packages. I could track down SPSS if you (or anyone else) can tell me what tools it has that would help me.
posted by special-k at 12:12 PM on June 1, 2006


Cluster analysis works well on binary data. It readily lends itself to a taxonomic approach. You can do a hierarchical cluster analysis in either SAS or R (I'm not familiar with JMP).
posted by bonehead at 12:18 PM on June 1, 2006
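
To make the idea of hierarchical clustering on binary answers concrete, here is a rough sketch in plain Python. It assumes Jaccard distance (which ignores items that neither person checked) and average linkage; both are choices made for this illustration, not something the thread specifies. In R, `hclust()` on `dist(x, method = "binary")` should give a comparable analysis. The four respondents and their answers are invented:

```python
from itertools import combinations

# Hypothetical checklist answers: 1 = item applies to this person's research.
# Columns: duct tape, iPod, poison oak, snow, sunscreen
answers = {
    "A": [1, 0, 1, 0, 1],
    "B": [1, 0, 1, 0, 0],
    "C": [0, 1, 0, 1, 0],
    "D": [0, 1, 0, 1, 1],
}

def jaccard_distance(x, y):
    """1 - (items both checked) / (items either checked)."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either

def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    dists = [jaccard_distance(answers[p], answers[q]) for p in c1 for q in c2]
    return sum(dists) / len(dists)

def cluster(names, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [(name,) for name in names]
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return clusters

groups = cluster(sorted(answers), 2)
print(groups)
```

With real data you would look at the whole merge tree rather than fixing the number of clusters up front; the cutoff of two groups here is only to keep the example short.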


You're doing data-mining, and should be using data-mining tools. These will probably be more marketing tools than social science tools.

The question here is simply how would one analyze data from a survey. I bet this is done a lot in the social sciences and I was hoping someone would give me advice on how to go about it.

Survey research is indeed done a lot in social science. But the first step in social science survey research is "Develop a theory." You have no theory, you just want to see what relates to what. With no theory, you're going to turn up a lot of spurious relationships.

You could analyze individual binary responses using logit or probit. Or you could analyze sets of responses with multinomial / polychotomous logit. But really, these (or other regression-style tools) are better used when you have a theory to test. Throwing everything and the kitchen sink in isn't really analysis.

If you're doing anything remotely serious with this, I hope you remembered to get a human-subjects approval or waiver.
posted by ROU_Xenophobe at 1:02 PM on June 1, 2006
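
For anyone wondering what "logit" looks like in practice, this is a bare-bones sketch in plain Python that fits a one-predictor logistic regression by gradient ascent. Real packages (R's `glm(..., family = binomial)`, for instance) use better-behaved fitting routines; the learning rate, step count, and the data below are all assumptions made for the example:

```python
import math

# Hypothetical data: years in school, and whether "duct tape" was checked.
years = [1, 2, 3, 4, 5, 6, 7, 8]
duct_tape = [0, 0, 1, 0, 1, 0, 1, 1]

def sigmoid(z):
    """Map any real number to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, learning_rate=0.1, steps=20000):
    """Fit P(y=1) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals between observed 0/1 answers and current fitted probabilities.
        errors = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        a += learning_rate * sum(errors) / n
        b += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return a, b

intercept, slope = fit_logit(years, duct_tape)

# Fitted probability of checking "duct tape" at a few values of years in school.
for x in (1, 4, 8):
    print(x, round(sigmoid(intercept + slope * x), 2))
```

Whatever the coefficients turn out to be, the fitted values always stay between 0 and 1, which is the main reason to prefer this over a straight-line fit for yes/no answers.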


You have no theory, you just want to see what relates to what. With no theory, you're going to turn up a lot of spurious relationships.

Sadly, that's what I'm looking for. This is anything but serious (I would never do this with my own research!). This is meant to be a spoof article in our grad newsletter (so no human-subject approvals etc.). But I still need to analyze the responses and/or make up (funny) results.
posted by special-k at 1:50 PM on June 1, 2006


As the above said, you'll be doing crosstab analysis, primarily to generate chi-squares.

You'll probably want to convert variables that are continuous or have a lot of values to something more discrete. For example, years in school should be converted to non-overlapping bins, something like 1, 2, 3, 4, 5-6, 7-10, more than 10. Type of car you may want to break down by new car value ($10,000-$15,000, $15,000-$20,000, etc.) or size (hatch, coupe, sedan, van), depending on what you want to study. This gives you categorical values to use in the crosstab.

I learned stats informally so I apologize for any errors in terminology.

This should be easily accomplished in SAS or R - no need to hunt down SPSS. SPSS is just a bit easier to learn than SAS starting out cuz it has a point-and-click interface.
posted by junesix at 1:50 PM on June 1, 2006
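
The binning-then-chi-square idea can be sketched in a few lines of plain Python. The bin edges, respondents, and answers below are invented for the example (and coarser than the bins suggested above, to keep the table small). In R, `chisq.test()` on a table of counts computes the same statistic plus a p-value:

```python
from collections import Counter

def bin_years(years):
    """Collapse years in school into a few non-overlapping categories."""
    if years <= 2:
        return "1-2"
    if years <= 4:
        return "3-4"
    return "5+"

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical respondents: (years in school, checked "duct tape"? 1 = yes).
respondents = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1),
               (4, 1), (5, 1), (6, 1), (7, 1), (1, 0)]

counts = Counter((bin_years(y), used) for y, used in respondents)
bins = ["1-2", "3-4", "5+"]
table = [[counts[(b, 0)], counts[(b, 1)]] for b in bins]  # columns: no, yes

stat, dof = chi_square(table)
print(table, round(stat, 2), dof)
```

The statistic is then compared against a chi-square distribution with `dof` degrees of freedom (the 5% critical value for 2 degrees of freedom is about 5.99). With cell counts as tiny as in this toy table the approximation is rough, so treat the number as illustrative only.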


This is meant to be a spoof article in our grad newsletter

In that case, I put it to you that you have several partly-contradictory goals:

(1) Uncover the most stupid, obviously spurious results possible. You can do this by using a regression-based method to analyze your variables, with everything and the kitchen sink thrown in as IVs.

(2) Get the analysis spectacularly and obviously wrong. You *should* analyze binary variables with logit or probit, so use OLS as a linear-probability model instead. With luck, this will tell you that people who drive Hondas have a 153% probability of using duct tape, or some other logically impossible result that you should describe in great detail.

(3) Use the most absurdly complex methods possible. Throw in interactive effects that make no sense. Why analyze the binary variables one at a time with logit when you could analyze them all at once with multinomial logit and get *really* incomprehensible results?

You can do all this in SAS, R, Stata, SPSS.
posted by ROU_Xenophobe at 2:18 PM on June 1, 2006
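
For goal (2), here is a small sketch of how a linear-probability model goes off the rails. It fits ordinary least squares to a 0/1 answer in plain Python and then predicts outside the range of the data. The numbers are invented, so the exact percentages mean nothing beyond showing the effect:

```python
# Hypothetical data: years in school, and whether "duct tape" was checked.
years = [1, 2, 3, 4, 5, 6, 7, 8]
duct_tape = [0, 0, 1, 0, 1, 0, 1, 1]

def fit_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_ols(years, duct_tape)

def predicted_probability(x):
    """The straight-line 'probability', which is not confined to 0..1."""
    return intercept + slope * x

for x in (0, 4, 12):
    print(f"{x} years: {predicted_probability(x):.0%}")
```

Because the fitted line keeps going in both directions, a first-week student gets a negative chance of using duct tape and a twelfth-year student gets more than 100%, which is exactly the kind of result a spoof article wants.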


it's not serious? well, that makes it easy: make it all up. your imagination will be a lot more interesting and amusing if you take creative license with the data. to try to generate 'real' results from data that you will ultimately not use for any study is distracting you from your real mission: writing up the results that amuse or entertain.
posted by lester at 2:21 PM on June 1, 2006

