# Statistics for approval correlating demographic variable?

November 25, 2013 7:53 PM

How do I use SPSS to analyze a range of approval ratings which vary by participant and correlate the skew to one demographic variable?

I am trying to analyze the data for an experiment. I am getting really frustrated, and I keep reading papers trying to figure out what they did, and I can't.

Here's my set-up:

As variable x (vowel backness) increases, approval should go down. Thing is, people are interpreting the scale in different ways: some aren't using the extremes, some are. I need to normalize this. Then, as variable y (use of English) increases, the tolerance for variable x should increase. I need to make this skew of approval ratings into one number per participant, and then correlate it to variable y. Then, I have to put this on a graph of some kind. I'd imagine I would need some sort of 3-D graph.

I am using SPSS. What statistical tests do I need to do to accomplish my aims?

Linear regression would probably be a good place to start. A couple of things you'd want to consider if going down that path:

1) How many observations do you have per participant for each experiment?

2) What kinds of scales (ordinal, categorical...) are your variables on? How are they distributed? Would you expect there to be a linear relationship between vowel backness and approval? (It's OK if not; you would just have to tweak things a bit.)

3) What do you mean by people interpreting the scale in different ways? Same thing with "tolerance for variable x" -- does this refer to something you're measuring or, as cupcake1337 put it, some statistical property of x that you expect it to have?
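For concreteness, here is one possible reading of the pipeline described in the question (normalize each participant's ratings, reduce them to one number per participant, correlate that number with English use), sketched in plain Python rather than SPSS. The data, the participant IDs, and the names `ratings`, `english_use`, `zscores`, `slope` and `pearson_r` are all made up for illustration, and within-participant z-scoring is only one of several ways to normalize.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev

# Hypothetical long-format data: (participant, vowel backness, raw approval).
ratings = [
    ("p1", 1, 7), ("p1", 2, 6), ("p1", 3, 4), ("p1", 4, 2), ("p1", 5, 1),
    ("p2", 1, 5), ("p2", 2, 5), ("p2", 3, 4), ("p2", 4, 3), ("p2", 5, 3),
    ("p3", 1, 6), ("p3", 2, 5), ("p3", 3, 6), ("p3", 4, 5), ("p3", 5, 5),
]
# Hypothetical demographic score per participant (share of English use).
english_use = {"p1": 0.1, "p2": 0.5, "p3": 0.9}


def zscores(values):
    """Rescale one participant's ratings to mean 0 and SD 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:  # participant gave the same rating every time
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Group the rows by participant.
by_participant = defaultdict(list)
for pid, backness, approval in ratings:
    by_participant[pid].append((backness, approval))

# One number per participant: slope of z-scored approval on backness.
slopes = {}
for pid, rows in by_participant.items():
    xs = [b for b, _ in rows]
    zs = zscores([a for _, a in rows])
    slopes[pid] = slope(xs, zs)

# Correlate that per-participant number with the demographic variable.
pids = sorted(slopes)
r = pearson_r([english_use[p] for p in pids], [slopes[p] for p in pids])
print(slopes)
print(r)
```

One caveat on this design: once ratings are z-scored within a participant, the slope is just the within-participant correlation between backness and approval divided by the SD of backness, so it no longer reflects how much of the scale someone used. With one slope per participant, a plain 2-D scatter of slope against English use may be enough for the graph.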

If you want, me-mail me a description of what you're trying to do and whatever other materials you might have. I (a statistician) can look it over in the next couple of days and let you know my thoughts.

posted by un petit cadeau at 8:26 PM on November 25, 2013

I'm going to approach this as if you are totally new to statistics, apologies if that's wrong. I don't mean to condescend, I just want to make sure that this makes sense regardless of what your knowledge level is.

I'm not clear on exactly what your data looks like and I don't know anything about SPSS (I'm an R guy). SPSS is a capable package though and should be able to do what you need if you can figure out how to make it do it. That said, here are your main options:

- If your predictor variable (the demographic one) is categorical (e.g. ethnicity, income bracket, etc) and you have a single continuous response variable (approval ratings, which are probably continuous since I assume they are percentages that can fall at any point along a continuous range from 0 to 100) then the analysis you want to run is probably an ANOVA.
- If your predictor variable is continuous (e.g. exact income, age) and you have a single continuous response variable, then you probably want to run a linear regression.
- If you have *multiple* predictor variables and/or response variables then you probably want a MANOVA or multiple linear regression, depending on whether the predictors are categorical or continuous.
- If your dataset includes categorical response variables or a mixture of categorical and continuous predictor variables, things get more complicated and that's probably outside the scope of this answer, although sometimes it's possible to reinterpret categorical data as continuous in a statistically valid way.

It's also very important that you research the assumptions of the analyses you are running and make sure that your dataset satisfies them, so that your analysis is going to be valid. A few important ones off the top of my head are sufficient sample size, normally distributed response data (strictly speaking, normally distributed residuals), and linearity of relationships. There are other important ones as well, and they vary based on your choice of analysis.

If your dataset doesn't satisfy the assumptions then you have the options of either throwing out data, attempting to transform the data to make it fit the assumptions, or choosing a more advanced analysis with fewer assumptions. These strategies should not be employed blindly as applying them improperly can certainly bias your analysis or reduce its statistical power in serious ways. They need to be employed judiciously and appropriately, in a logically-defensible manner.

You also need to make sure that the analyses themselves are being applied appropriately. For instance if you are looking at the effect of a single continuous predictor on a variety of continuous response variables you *could* just run a series of linear regressions for each response variable, but this would greatly inflate your Type I error rate (chance of false positives). You could then choose to correct for this in one of several ways, or (depending on the kind of resolution you need in your results) you could do a multiple regression instead.

This stuff gets complicated and varies a lot from dataset to dataset and question to question, and you might be better off consulting a mentor and/or a textbook rather than asking here. It's unlikely that we will be able to give you complete and accurate advice here, based on the amount of information we have to go on. Good luck, though!
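To put rough numbers on the Type I error point above, here is a small Python sketch. It shows how the chance of at least one false positive grows with the number of tests (assuming the tests are independent, which is a simplification), and two common corrections, Bonferroni and Holm. The p-values are hypothetical and the function names are just illustrative.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: compare sorted p-values to alpha/m, alpha/(m-1), ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first p-value that fails its threshold
    return reject


# Hypothetical p-values from four separate regressions.
p_values = [0.004, 0.030, 0.015, 0.200]

print(familywise_error(0.05, len(p_values)))
print(bonferroni(p_values))
print(holm(p_values))
```

Holm never rejects fewer tests than Bonferroni at the same alpha, which is why it is often preferred when a simple correction is all that's needed.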

posted by Scientist at 12:06 PM on November 26, 2013 [1 favorite]

This thread is closed to new comments.

for your first question, i think you can use an OLS (ordinary least-squares) regression. there are many ways to re-categorize your independent variable; the *best* depends on your interpretation of the data.

it's not clear what you mean by "tolerance." do you mean that variance of the error term should increase? how do you set tolerance? or, do you mean, controlling for y, you expect x to not be significant?
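One further reading of "tolerance", offered only as an assumption about what the asker might mean: the effect of x on approval gets weaker as y goes up, which in an OLS model is an x-by-y interaction term. The sketch below fits approval = b0 + b1*x + b2*y + b3*x*y in plain Python on made-up, noise-free data; the coefficients used to generate the data and the helper names `solve` and `ols` are hypothetical. It also treats every rating as independent, which ignores that each participant contributes several ratings.

```python
def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (aug[i][n] - tail) / aug[i][i]
    return coef


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: backness x in 1..5, English use y in {0.1, 0.5, 0.9}.
design, approval = [], []
for y_val in (0.1, 0.5, 0.9):
    for x_val in range(1, 6):
        design.append([1.0, float(x_val), y_val, x_val * y_val])
        # Made-up generating model with a positive interaction.
        approval.append(6.0 - 1.0 * x_val + 0.5 * y_val + 0.8 * x_val * y_val)

b0, b1, b2, b3 = ols(design, approval)
print(b0, b1, b2, b3)
```

Under this reading, a negative b1 together with a positive b3 would mean approval drops with backness, but less steeply for people who use more English. Because the toy data has no noise, the fit simply recovers the made-up coefficients; real data would also need standard errors before calling anything significant.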

posted by cupcake1337 at 8:02 PM on November 25, 2013