Help me with SPSS
July 27, 2005 6:06 AM

I have an SPSS question about linear vs. curvilinear models

I have data and do not know how to determine whether it is linear or curvilinear. I did a quick Google search that told me to plot the residuals and look at "normality." However, I don't know if this is all I need to do, and even if it is, I don't know how to graphically interpret the residual plot to determine if it is linear or curvilinear.

The plot I have looks like this:


Any help is much appreciated.
posted by thewittyname to Science & Nature (2 answers total)
 
Plotting the residuals will show you their distribution. A normal distribution would mean that they're occurring in a way that's consistent with the linear model (a few way out on either extreme, but most pretty close to the line). What you need is a Q-Q plot. For each residual, it plots the actual value on one axis against the value you'd expect (if the residuals were normal) on the other. So, if they're perfectly normal, the points fall on a 45-degree line ((1,1), (2,2), etc.). If you don't see that, they're not normally distributed.
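
To make the Q-Q idea concrete, here is a minimal sketch in Python rather than SPSS (the language, the residual values, and the helper name qq_points are all assumptions for illustration). It pairs each sorted, standardized residual with the value a standard normal would put at the same rank; plotting the pairs gives the Q-Q plot.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (expected, observed) pairs for a normal Q-Q plot."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    # Standardize and sort the observed residuals.
    observed = sorted((r - m) / s for r in residuals)
    # Plotting positions (i + 0.5) / n are one common convention (an assumption here).
    expected = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))

residuals = [-1.8, -0.9, -0.4, -0.1, 0.0, 0.2, 0.5, 0.8, 1.7]  # made-up values
for exp, obs in qq_points(residuals):
    print(f"expected={exp:6.2f}  observed={obs:6.2f}")
```

If the residuals are close to normal, the two columns track each other and the points sit near the 45-degree line.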

I'm not a statistician, in fact I just finished a college stats course and might be forgetting something important. But that's how you determine normality in SPSS, at any rate.
posted by electric_counterpoint at 6:49 AM on July 27, 2005


So your model has just one explanatory / predictive / independent variable? Plus the constant?

The first thing I'd do in that case is just plot the dependent variable versus the independent.

You can also do useful diagnostics by plotting residuals. I don't know the specifics of how you do this in SPSS -- I use Stata and R -- but in principle it's easy: run the regression, save the residuals, and then plot them versus the independent variable. You can do this with qq-plots too, but I find looking at the raw residuals more intuitive.

What you want to see is random crap. A bunch of points with no discernible order, on both sides of zero.
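
The residual-versus-predictor check described above can be sketched in a few lines of Python (not SPSS, Stata, or R; the data and the helper fit_line are made up for illustration). It fits a straight line by ordinary least squares and lists each residual next to its x value, which is the table you would then plot.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]  # made-up, roughly linear

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
for xi, r in zip(x, residuals):
    print(f"x={xi}  residual={r:+.3f}")
```

With data this close to a line, the residuals are small and flip sign with no obvious order, which is the "no pattern" picture.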

If the residuals show a pattern, you can have a problem. There are four classic problems you can spot with residuals.

First, the residuals might be hill-shaped. This is nonlinearity, and it means that your independent variable has a diminishing marginal effect -- that your first LADDER has a bigger effect than your tenth LADDER. This is easy to deal with as a first stab by creating a new variable that's LADDER squared and including it in the model alongside LADDER, but it does make interpreting your results trickier.
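
As a rough illustration of the hill shape and the squared-term fix, here is a Python sketch on made-up, noise-free data (the numbers and the helpers solve and ols are assumptions, not anything from SPSS). A straight-line fit leaves residuals that are negative at both ends and positive in the middle; adding the squared variable removes the pattern.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def ols(columns, y):
    """Least squares of y on the given predictor columns plus a constant."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

ladder = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
outcome = [1 + 4 * v - 0.3 * v ** 2 for v in ladder]  # made-up curved relationship

# Straight-line fit: residuals come out hill-shaped.
a, b = ols([ladder], outcome)
lin_resid = [yi - (a + b * v) for v, yi in zip(ladder, outcome)]

# Add the squared term as a second predictor.
ladder_sq = [v ** 2 for v in ladder]
c0, c1, c2 = ols([ladder, ladder_sq], outcome)
quad_resid = [yi - (c0 + c1 * v + c2 * v ** 2) for v, yi in zip(ladder, outcome)]

print([round(r, 2) for r in lin_resid])
print([round(r, 2) for r in quad_resid])
```

The negative coefficient on the squared term is what encodes the diminishing effect, which is also why the results get trickier to read: the effect of one more LADDER now depends on where you start.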

Second, the residuals might be trumpet-shaped. This means you have heteroskedasticity, and need to go talk to a prof about it (assuming you're in school). There are corrections, but... ew.
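
A crude numeric stand-in for eyeballing the trumpet shape, sketched in Python with made-up residuals: compare the spread of the residuals in the upper half of the independent variable with the spread in the lower half. The helper spread_ratio is hypothetical and is not a formal heteroskedasticity test, just a quick way to see whether the spread changes.

```python
from statistics import pstdev

def spread_ratio(x, residuals):
    """Residual spread in the upper half of x divided by spread in the lower half."""
    pairs = sorted(zip(x, residuals))  # order residuals by x
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return pstdev(high) / pstdev(low)

x = list(range(1, 13))
signs = [1, -1] * 6
trumpet = [s * 0.2 * xi for s, xi in zip(signs, x)]  # spread grows with x
even = [s * 1.0 for s in signs]                      # same spread everywhere

print(spread_ratio(x, trumpet))
print(spread_ratio(x, even))
```

A ratio far from 1 in either direction is a hint that the spread is not constant and that the plot deserves a closer look.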

Third, they might be sort of L-shaped, or look like a combination of U-shaped and trumpet-shaped. This might mean that you need to look at log-linear or log-log relationships instead of linear. This can happen in growth processes. It can also happen where the relationship between variables is in percentages instead of numbers -- it might be that a one-unit increase in LADDER causes a 3% increase in AGE (log-linear: regress the log of AGE on LADDER), or it might be that a 1% increase in LADDER causes a 3% increase in AGE (log-log: regress the log of AGE on the log of LADDER).
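
The two percentage readings can be sketched in Python with made-up, noise-free data (the variable values and the helper fit_line are assumptions for illustration). Taking the log of the dependent variable turns "each extra unit adds about 3%" into a slope of 0.03; taking logs of both variables turns "a 1% increase gives about a 3% increase" into a slope of 3.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

ladder = [1, 2, 3, 4, 5, 6, 7, 8]

# Log-linear: each extra unit of LADDER multiplies AGE by exp(0.03), roughly +3%.
age_growth = [20 * math.exp(0.03 * v) for v in ladder]
_, b_loglin = fit_line(ladder, [math.log(v) for v in age_growth])

# Log-log: AGE is proportional to LADDER cubed, so +1% LADDER gives roughly +3% AGE.
age_power = [20 * v ** 3 for v in ladder]
_, b_loglog = fit_line([math.log(v) for v in ladder],
                       [math.log(v) for v in age_power])

print(round(b_loglin, 4), round(b_loglog, 4))
```

Both fits are straight lines after the transformation, so the usual residual plots apply to the transformed variables.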

The fourth can happen when it makes sense to think about time or space. Then, if you plot the residuals versus time or space, you want to see random shit. If you see a pattern that looks like a pathway -- if a high residual tends to be followed by another high one -- then you have serial correlation and need to go do time series stuff.
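
One common numeric companion to that time-ordered plot is the Durbin-Watson statistic, sketched here in Python on made-up residuals (the two series are assumptions for illustration). It divides the sum of squared changes between consecutive residuals by the sum of squared residuals; values well below 2 suggest that a high residual tends to be followed by another high one.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(r ** 2 for r in residuals)
    return num / den

# Residuals in time order. The first series drifts like a pathway;
# the second jumps around with no runs of same-sign values.
pathway = [1.0, 1.2, 1.1, 0.8, 0.3, -0.2, -0.7, -1.1, -1.2, -0.9, -0.4, 0.1]
jumpy = [1.0, -0.8, 0.3, 0.9, -1.1, 0.2, -0.4, 1.2, -0.9, 0.1, 0.6, -1.0]

print(round(durbin_watson(pathway), 2))
print(round(durbin_watson(jumpy), 2))
```

The statistic only looks at neighbours one step apart, so the plot is still worth making for patterns at longer lags.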
posted by ROU_Xenophobe at 7:20 AM on July 27, 2005

