Likert nightmares
November 16, 2010 5:36 AM

I have a Likert-scale survey with 3 matrices, each examining a different factor through 8 questions rated on the scale. I have the 110 responses I wanted, broken down by several demographics. Now what?

What kind of statistical analysis should I run on this categorical/nominal data from the survey? How do I relate any results from the data to my two hypotheses? What type of coefficient am I looking for, and what does it mean?

I've completely blacked out on this, apparently.

Pretend I know absolutely nothing and walk me through how you would analyze the results from this survey.
posted by ttyn to Computers & Internet (9 answers total)
This is likely to be way, way, way beyond the scope of what you can get here. We don't know the nature of the data, the nature of the questions, or your hypotheses. The simple answer is to do t-tests for demographic differences when a demographic has only two categories (e.g., sex) and ANOVAs when it has more than two (e.g., ethnicity, race, whatever). You could also run a regression with covariates to see which characteristics predict responses on certain items. The main thing you are looking for (to put it very simply) is a p-value of less than .05 from your tests. If you don't know what that means either, you need to get some outside help on this. Is this for work? School? Who conducted the survey? Can they help you?
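A minimal sketch of the two tests named above, assuming each Likert item (or a composite of items) is treated as an interval score, which later replies in this thread dispute. It uses only Python's standard library and computes the test statistics and degrees of freedom; the p-values need the t and F distributions, which a library such as SciPy provides (`scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.f_oneway`). The scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    se2_a = variance(a) / len(a)  # squared standard error of group a's mean
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


def one_way_anova_f(*groups):
    """F statistic with (between, within) degrees of freedom for two or more groups."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    k, n = len(groups), len(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical answers to one 1-5 item, split by a two-level demographic.
group_a = [4, 5, 3, 4, 5]
group_b = [2, 3, 3, 2, 4]
t, df = welch_t(group_a, group_b)
f, df_b, df_w = one_way_anova_f(group_a, group_b)
print(f"t = {t:.3f} on {df:.1f} df; F = {f:.3f} on ({df_b}, {df_w}) df")
```

With only two groups the ANOVA F equals the square of the equal-variance t statistic; in this toy example the groups have the same size, so Welch's t gives the same value. For ethnicity or any other demographic with more than two levels, pass every group to `one_way_anova_f`.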
posted by proj at 6:10 AM on November 16, 2010

You should probably start by checking whether the questions nominally addressing each factor actually hold together, using confirmatory factor analysis, and by running a PCA to see if anything unexpected shows up.
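A full confirmatory factor analysis needs a dedicated package (for example lavaan in R). A quicker, cruder first check that one matrix's items hang together is Cronbach's alpha; note that this is a swapped-in internal-consistency statistic, not CFA itself. A minimal standard-library sketch, assuming complete responses stored as one row per respondent:

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one set of items (at least two items).

    rows: one list of item scores per respondent, all in the same item order,
    with no missing values.
    """
    k = len(rows[0])                               # number of items
    items = list(zip(*rows))                       # transpose: one tuple per item
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])   # variance of respondents' totals
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical data: respondents who answer every item identically give alpha = 1.
consistent = [[1, 1, 1], [3, 3, 3], [5, 5, 5]]
# Two items that are unrelated to each other give alpha = 0.
unrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]
print(cronbach_alpha(consistent), cronbach_alpha(unrelated))
```

A common rule of thumb treats an alpha of about 0.7 or higher as acceptable, but alpha does not show whether the items form one factor or several; that is what the CFA/PCA step is for.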

What kind of statistical analysis should I run on this categorical/nominal data from the survey? How do I relate any results from the data to my two hypotheses? What type of coefficient am I looking for, and what does it mean?

That's going to depend on what those hypotheses are. Your last question suggests that you are currently a student, so perhaps ask a grad student, an advisor, or teaching faculty in your department for clarification.
posted by a robot made out of meat at 6:18 AM on November 16, 2010

And for the googlebot's sake (I'm sure you know this): if the project is school-related and you show up with "I have no analysis plan, can you write one?", the answer will probably be 'no.' If you show up with "I have data like $my_data, and was going to use $method to answer question $hypothesis; is that valid?", having thought a bit and done an hour's work to come up with a plan better than random Stata menu commands, then the professor or TA will likely make an effort to improve your analysis plan. Especially if you had specific questions (can I use t-tests on ordinal variables?), I would try hard to make the answer clear.
posted by a robot made out of meat at 7:01 AM on November 16, 2010

Response by poster: It's not for school and it's not for work. I was supposed to help a friend with a data visualization project that turned into more than I can chew, given that said friend figured she could make the survey however she pleased and leave me with the bulk of the work.

The plan was to use multiple regression, until said friend informed me that, rather than giving each question a continuous sliding scale on which respondents could select a value, she had decided it was easier to use a 5-point Likert scale. I've never worked with categorical or ordinal variables in hypothesis testing. And there you have the reason for my black hole in statistics.

I was going to tell my friend to find someone else to torture with her last-minute flashes of brilliance, but the data is kind of interesting and I thought this would be a good learning opportunity for me as well. I've googled chi-square tests up and down, but I'm not quite convinced that's the way I ought to go.
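For reference, a chi-square test of independence between one demographic (rows) and the answers to one item (columns) compares the observed counts with the counts expected if the two were unrelated. A minimal standard-library sketch of the statistic and its degrees of freedom; for the p-value, SciPy's `scipy.stats.chi2_contingency` computes the same statistic (pass `correction=False` for a 2x2 table, since it applies Yates' continuity correction by default when there is one degree of freedom). The counts are invented.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for a contingency table of counts.

    Returns (statistic, degrees of freedom, smallest expected count).
    Rows and columns whose total is zero must be dropped beforehand.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic, min_expected = 0.0, float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            min_expected = min(min_expected, expected)
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, min_expected


# Hypothetical counts: two demographic groups (rows) by agree/disagree (columns).
stat, df, smallest = chi_square_independence([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, df = {df}, smallest expected count = {smallest}")
```

Spreading 110 responses over 5 answer categories and several demographic groups can easily leave expected counts below 5, a common rule-of-thumb threshold for trusting the chi-square approximation; collapsing categories (e.g., agree vs. not) is one workaround. A chi-square test also ignores the ordering of the Likert categories.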
posted by ttyn at 7:09 AM on November 16, 2010

Great. Be sure to do the usual survey due diligence: look at patterns of missing data and of bogus responses (e.g., the same answer to every item after a certain point).
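A minimal sketch of that screening, assuming each respondent's answers are stored in item order with `None` for a skipped item. The run-length cut-off of 8 (one full matrix of identical answers, if each matrix has 8 items) is a hypothetical choice, and flagged respondents are meant for a closer look, not automatic deletion.

```python
def screen_responses(responses, run_length=8):
    """Flag respondents with missing answers or a long run of identical answers.

    responses: dict mapping respondent id -> list of item scores (None = missing).
    run_length: hypothetical cut-off for a suspicious straight-line run.
    Returns a dict mapping flagged respondent ids -> list of reasons.
    """
    flagged = {}
    for rid, answers in responses.items():
        reasons = []
        missing = sum(a is None for a in answers)
        if missing:
            reasons.append(f"{missing} missing")
        longest = run = 1 if answers else 0
        for prev, cur in zip(answers, answers[1:]):
            # Extend the run only when two consecutive answers are present and equal.
            run = run + 1 if cur is not None and cur == prev else 1
            longest = max(longest, run)
        if longest >= run_length:
            reasons.append(f"run of {longest} identical answers")
        if reasons:
            flagged[rid] = reasons
    return flagged


# Hypothetical respondents, assuming 24 items (3 matrices x 8 items).
data = {
    "r1": [3] * 24,                               # same answer throughout
    "r2": [1, 2, 3, 4, 5] * 4 + [1, 2, 3, None],  # one skipped item
    "r3": [1, 2, 3, 4, 5, 4, 3, 2] * 3,           # varied answers
}
print(screen_responses(data))
```

A genuine respondent may really agree with all 8 items in a matrix, which is why the output lists reasons rather than dropping anyone.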

If a single item is the outcome of interest, it may be worthwhile to use ordinal logistic regression instead of regular linear regression (which would amount to assuming that the difference between "3" and "4" is the same as the difference between "4" and "5"). If you want to use one of the factors that the three matrices are getting at as the outcome, then a regression with a composite or a PCA/FA score as the outcome would be fine. Breaking out the item response theory toolbox is overkill for a side project, but if it's something you want to learn about, go ahead. If you're just interested in the patterns of correlation in the responses, then (sparse) FA is the way to go.
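To see why an ordinal model avoids the equal-spacing assumption, here is a minimal sketch of how a proportional-odds (cumulative logit) model, the most common form of ordinal logistic regression, turns one respondent's linear predictor into probabilities for the five answers. It assumes the convention P(Y <= k) = logistic(threshold_k - eta), which is the one used by, for example, R's MASS::polr and statsmodels' OrderedModel; the threshold values are hypothetical, and estimating them from data is the job of such a fitting routine.

```python
from math import exp


def logistic(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + exp(-z))


def category_probabilities(thresholds, eta):
    """P(Y = 1), ..., P(Y = K) under a cumulative logit model.

    thresholds: increasing cut-points theta_1 < ... < theta_{K-1}.
    eta: linear predictor (x . beta) for one respondent.
    """
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]  # P(Y <= k)
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs


# Hypothetical, unevenly spaced cut-points for a 5-point item: the model does
# not assume the gap between "3" and "4" equals the gap between "4" and "5".
cuts = [-2.0, -0.5, 0.3, 2.5]
for eta in (0.0, 1.0):
    print(eta, [round(p, 3) for p in category_probabilities(cuts, eta)])
```

Raising eta (a positive coefficient times a higher predictor value) moves probability toward the top answers at every cut-point at once, which is the "proportional odds" part of the name.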

Chi-square tests are fine, but if you're interested in a relationship that may be confounded by demographics (check the summary statistics first), then a regression-based method is good because it allows you to adjust for them.
posted by a robot made out of meat at 7:26 AM on November 16, 2010

I was bothering my stats professor about a similar question (I'm a grad student in the social sciences): how to run tests on categorical variables that are more complex than a chi-square test can deal with (as far as I, too, can tell). His response: "Categorical variables are terrible. This is why psychologists do everything on rating scales." In other words, people often treat rating-scale data as an approximation to interval (and therefore ANOVA/t-test/regression-able) data. Probably a statistician is about to show up and clobber me for this suggestion, but for your purposes, this might be the best way to go rather than torturing yourself over technically ordinal data.

Sadly for me, three-year-olds are not very good at 5-point rating scales :(
posted by heyforfour at 7:33 AM on November 16, 2010

Response by poster: heyforfour, I may not know much of anything, but I can't, with a clean conscience, pretend that "strongly disagree" + "strongly agree" averages out to "neither disagree nor agree". I've seen quite a few instances where ordinal data is used as if it were continuous, and it drives me insane. Unfortunately, I am still struggling with anything much more complex than that... Sucks to be me, eh?
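The objection above can be made concrete: a polarized item and a uniformly neutral item can have the same mean, and here even the same median, while their frequency tables look nothing alike. A minimal standard-library sketch with invented answers:

```python
from collections import Counter
from statistics import mean, median

LABELS = {1: "strongly disagree", 2: "disagree", 3: "neither", 4: "agree", 5: "strongly agree"}


def describe(item):
    """Mean, median, and per-label counts for one 5-point item."""
    counts = Counter(item)
    return {
        "mean": mean(item),
        "median": median(item),
        "counts": {label: counts[value] for value, label in LABELS.items()},
    }


polarized = [1] * 5 + [5] * 5   # half strongly disagree, half strongly agree
neutral = [3] * 10              # everyone neither agrees nor disagrees
for name, item in (("polarized", polarized), ("neutral", neutral)):
    print(name, describe(item))
```

Reporting the count or percentage for each answer alongside any average keeps this difference visible.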

a robot made out of meat: thanks! I think that's the direction I need to head in. This will either help me or make my fun-time Google goose chase more entertaining for the rest of the day.
posted by ttyn at 7:46 AM on November 16, 2010

The regression technique you're looking for is either multinomial logistic regression or ordered probit (this is pretty much the standard text for that). The output from these analyses is pretty lengthy, and the interpretation is never straightforward, since it is always relative to a referent answer and in terms of probabilities rather than predicted scores (e.g., males are 1.2 times as likely to answer B rather than A, but 1.5 times as likely to answer C rather than A, etc.). You're in over your head here, and I recommend you step away. If this is a school project for your friend, she needs to get instructor or TA help. If it's just for shits and giggles, have fun, but if you're serious about interpreting these data in any way, I'd recommend some outside help.
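The kind of ratio described above can be computed directly from counts in the simplest case. With a single two-level predictor (say, male vs. female) and no other covariates, a multinomial logit fits the observed proportions exactly, so the exponentiated coefficient for answer j (what Stata's `mlogit` reports as a relative-risk ratio with the `rrr` option) equals count_j / count_referent in one group divided by the same ratio in the other. A minimal sketch with hypothetical counts, using answer A as the referent:

```python
def relative_risk_ratios(group, baseline, reference=0):
    """exp(coefficient) of a saturated multinomial logit with one binary predictor.

    group, baseline: answer counts (same category order) for the two groups.
    reference: index of the referent answer. All counts must be positive.
    """
    return [
        (g / group[reference]) / (b / baseline[reference])
        for g, b in zip(group, baseline)
    ]


# Hypothetical counts of answers A, B, C.
males = [10, 12, 15]
females = [20, 20, 20]
print(relative_risk_ratios(males, females))  # referent A, so the first entry is 1
```

Here males' odds of answering B rather than A are 1.2 times the females' odds, and for C rather than A 1.5 times. Once covariates enter the model the coefficients no longer come straight from a table, but they keep this "relative to the referent answer" reading.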
posted by proj at 8:06 AM on November 16, 2010

Mostly just repeating stuff others have said.

If you're looking at the factor outputs, not the original scales, you can just do OLS on them.
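A minimal sketch of that route, assuming a factor is summarized by the mean of its items (a simple composite rather than true factor scores) and regressed on a single 0/1 demographic dummy. With several predictors you would use a full regression routine such as statsmodels' OLS instead of this one-predictor formula; the data are invented.

```python
from statistics import mean


def composite_scores(rows):
    """One score per respondent: the mean of that respondent's item answers."""
    return [mean(r) for r in rows]


def ols_one_predictor(x, y):
    """Intercept and slope of y = a + b * x fitted by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical: four respondents' answers to one factor's items (3 shown for brevity).
rows = [[4, 5, 4], [5, 5, 4], [2, 3, 2], [3, 2, 2]]
dummy = [1, 1, 0, 0]  # 1 = one demographic group, 0 = the other
intercept, slope = ols_one_predictor(dummy, composite_scores(rows))
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

With a 0/1 dummy the slope is simply the difference between the two groups' mean composite scores, which is why this is equivalent to the t-test comparison mentioned earlier in the thread. A composite averaged over several items also takes many more distinct values than a single 5-point item.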

If you're looking at an original ordinal scale, you can do ordered logit or ordered probit. The most basic interpretation of sign and significance is the same as with OLS; it only gets hard once you want to make substantive interpretations.

You would only want to use the multinomial logit proj suggests for *unordered*, purely categorical dependent variables. Interpretation of MNL is indeed a pain, and you cannot even make basic sign-and-significance inferences from the raw output.

For ordinal data that's not for anything really critical, you also have the option of just saying "Fuck it" and doing OLS anyway. It's not likely to return strongly different results than the more proper ordered logit/probit. If you need a bullshit excuse for using it, it was to make interpretation simpler for the reader/user.
posted by ROU_Xenophobe at 9:25 AM on November 16, 2010
