StatsFilter: What's the slope? A linear regression problem the textbooks can't (won't) answer!
August 1, 2007 9:23 PM

Say I have a linear regression of the form: Y = a + b1 X + b2 X^2.
Let's also say that b1 > 0 and b2 < 0. What is the proper way to test the null hypothesis dY / dX > 0?

NOTE: dY / dX = b1 + (2 * b2 * X). I don't think the simple t-test that the stats package spits out will answer the question correctly.

NOTE: If it helps, the estimate of b1 is approximately 10 times larger in magnitude than the estimate of b2.

No, this isn't a homework assignment--this is real life!

Bonus points for citing a source.
posted by GarageWine to science & nature (11 comments total)
dY / dX = b1 + (2 * b2 * X)

Yeah, but this means that either the slope is negative and then positive, or positive and then negative. Somewhere, it's significantly positive, and somewhere else it's significantly negative, irrespective of what your coefficients are and irrespective of what your standard errors are, since you can run the slope out to ~ + or - infinity if you feel like it.

If you want to test whether the slope of the relationship is positive or negative within your data or in the general neighborhood of your data, why not just run the regression without the polynomial term?

If you're after the more general question of "Does this variable have any effect?" then you probably want to run a joint F-test over b1 and b2.

(NB: my general approach to all this tends to value practicality of inference over methodological purity)
posted by ROU_Xenophobe at 10:14 PM on August 1, 2007


Y = a + b1 X + b2 X^2 isn't a linear regression--it's a quadratic equation--so none of the formulae for a linear regression are going to work. How you proceed from here depends on what you want to do.
posted by cardboard at 10:15 PM on August 1, 2007


Specifying it with a quadratic term means that you have reason to think that the rate of change (dy/dx) is itself a function of x. So there's not a single answer. You need to specify values of x and run the t-test for those values. Hopefully you know whatever package you're using well enough to do this (in Stata, for instance, you'd use the lincom command with a multiplier on b2). It will give you a different p-value for each x you put in, which is as it should be. It might be the case that, e.g., x has no effect on y when x = 4, but a large effect when x = 10.

What I'd do in this situation is calculate dy/dx for a few different values of x, and plot the results. You can, with some trial and error, also find the point or points at which you switch from being able to reject the null to being unable, or vice versa.
posted by shadow vector at 10:29 PM on August 1, 2007
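
For readers without Stata, the "pick some values of x and test the slope at each" idea above might be sketched in Python roughly as follows. This is only an illustration under assumptions: numpy is available, the data are simulated, and the helper names fit_quadratic and slope_at are made up here. The standard error treats b1 + 2*b2*x0 as a linear combination of the coefficients, so it includes the covariance between b1 and b2.

```python
import numpy as np

def fit_quadratic(x, y):
    """OLS fit of y = a + b1*x + b2*x^2.

    Returns the coefficient vector (a, b1, b2), its estimated
    covariance matrix, and the residual degrees of freedom.
    """
    X = np.column_stack([np.ones_like(x), x, x**2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients
    return beta, cov, dof

def slope_at(x0, beta, cov):
    """Estimate and standard error of dy/dx = b1 + 2*b2*x0."""
    c = np.array([0.0, 1.0, 2.0 * x0])        # weights on (a, b1, b2)
    est = c @ beta
    se = np.sqrt(c @ cov @ c)
    return est, se

# Simulated data (an assumption for illustration): true slope is 2 - 0.4*x
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 1.0 + 2.0 * x - 0.2 * x**2 + rng.normal(scale=1.0, size=x.size)

beta, cov, dof = fit_quadratic(x, y)
for x0 in (0.0, 2.5, 5.0, 7.5, 10.0):
    est, se = slope_at(x0, beta, cov)
    print(f"x = {x0:4.1f}  slope = {est:6.3f}  se = {se:.3f}  t = {est / se:6.2f}")
```

Each t value would be compared against a t distribution with dof degrees of freedom, one-sided if the hypothesis is specifically that the slope is positive. Plotting the estimate and interval against x0 shows where the sign of the slope can and cannot be pinned down.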


Disregard what I said. It's wrong.
posted by cardboard at 10:47 PM on August 1, 2007


This is an authoritative reference on nonlinear regression. A bit technical, though. Reed appears to have a copy of an older edition.

You can get to some of the key pages on Google Book Search, though. It gives a t-statistic in equation 5.3. See equation 2.6 and its predecessors to aid in the interpretation of one of the terms in 5.3.
posted by epugachev at 11:43 PM on August 1, 2007


I should add that you should be careful with this stuff if it is for something important, as IIRC doing this properly requires some sophistication. Hopefully an expert will come across this thread.
posted by epugachev at 11:47 PM on August 1, 2007


epugachev, while the relationship between x and y specified in the question is nonlinear, the equation is still linear with respect to the coefficients b1 and b2. The book you linked to covers regression models that are nonlinear in the coefficients.

ROU_Xenophobe is correct that the overall Model F test will give you a test of whether the linear combination of x and x^2 shows a significant relationship (that is, the slope is not 0) with y.
posted by naturesgreatestmiracle at 12:01 AM on August 2, 2007


You are asking whether Y varies with X. If Y does not vary with X then the slope is zero. In your model the slope varies with X because of the quadratic term. I guess what you want to know is whether your complex model is better than a simple model where the slope is 0.

There are 2 approaches to getting to grips with this:
1) You can take 2 models, your one (Y = a + b1 X + b2 X^2), and one where the slope is fixed at zero (i.e. where Y does not vary with X). This would simply be a straight line fitted through your data (Y=a) so you are just estimating the intercept.

Then you can check whether the model is improved by addition of your terms for X and X^2 in turn. This is normally done by adding the term to the model and checking the change in residual deviance. This will decrease, indicating a better fit, but what you need to know is whether the improvement in the model fit is good enough to warrant the increased complexity of estimating more coefficients (principle of parsimony/Occam's razor). Most stats packages allow this comparison by doing a test to compare the 2 models, for example an anova/F-test. If the models are significantly different then the more complex one is the winner.

2) You can examine the Akaike information criterion (AIC) value for your model and compare it to that of a model where the slope is fixed at zero. AIC basically takes the log likelihood and penalises it with the number of coefficients you are estimating. The model with the lowest AIC is the best model. Most stats packages allow the calculation of AIC, or you can calculate it from first principles.
posted by jonesor at 3:03 AM on August 2, 2007
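
The two comparisons above (an F-test between nested models, and AIC) could be sketched in Python along these lines. It is a rough illustration only: numpy is assumed, the data are simulated, the function names are invented for this example, and the AIC is the Gaussian-error version computed up to an additive constant that is the same for both models.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def f_statistic(rss_small, rss_big, p_small, p_big, n):
    """F statistic for a nested-model comparison (p = number of coefficients)."""
    numerator = (rss_small - rss_big) / (p_big - p_small)
    denominator = rss_big / (n - p_big)
    return numerator / denominator

def gaussian_aic(rss_value, n, p):
    """AIC for a Gaussian linear model, dropping constants shared by all models.

    The parameter count is p + 1 because the error variance is also estimated.
    """
    return n * np.log(rss_value / n) + 2 * (p + 1)

# Simulated data (an assumption for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 1.0 + 2.0 * x - 0.2 * x**2 + rng.normal(scale=1.0, size=x.size)
n = len(y)

X_null = np.ones((n, 1))                               # Y = a
X_full = np.column_stack([np.ones(n), x, x**2])        # Y = a + b1 X + b2 X^2

rss_null, rss_full = rss(X_null, y), rss(X_full, y)
F = f_statistic(rss_null, rss_full, p_small=1, p_big=3, n=n)
aic_null = gaussian_aic(rss_null, n, p=1)
aic_full = gaussian_aic(rss_full, n, p=3)

print(f"F({3 - 1}, {n - 3}) = {F:.1f}")
print(f"AIC null = {aic_null:.1f}, AIC full = {aic_full:.1f}")
```

The F value is referred to an F distribution with 2 and n - 3 degrees of freedom. Note that both comparisons ask whether X matters at all, which is a different question from whether the slope is positive at a given X.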


You want a point estimate of b1 + 2*b2*x for some particular value of x? Your coefficients are functions of the data, and hence random variables. Your stats book should tell you about the distribution of regression coefficients (they're normal in the standard setup, and so is any linear combination of them). So b1 + 2*b2*x is a normal variable whose variance comes from the estimated variances and covariance of b1 and b2, which gives you a point estimate and confidence interval on (b1 + ...). If the confidence interval doesn't overlap 0, you're done.

Personally I'd use a bootstrap technique for the confidence interval to make sure it was right. This requires some software or significant extra work.

If you want a confidence band to exclude the possibility that the slope is negative on a continuous range of data, that's also a more complicated topic. You can email me and I'll point you the right way.
posted by a robot made out of meat at 6:12 AM on August 2, 2007
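
One way the bootstrap suggestion above might look in Python is sketched below. The specifics are assumptions, not something stated in the comment: it resamples (x, y) pairs with replacement and takes a plain percentile interval, uses numpy, runs on simulated data, and the function names are invented for the example.

```python
import numpy as np

def slope_estimate(x, y, x0):
    """Fit y = a + b1*x + b2*x^2 and return the slope b1 + 2*b2*x0."""
    b2, b1, a = np.polyfit(x, y, deg=2)   # polyfit returns highest power first
    return b1 + 2.0 * b2 * x0

def bootstrap_slope_ci(x, y, x0, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the slope at x0, resampling pairs."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        draws[i] = slope_estimate(x[idx], y[idx], x0)
    alpha = 1.0 - level
    lo, hi = np.quantile(draws, [alpha / 2, 1.0 - alpha / 2])
    return lo, hi

# Simulated data (an assumption for illustration): true slope is 2 - 0.4*x
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 1.0 + 2.0 * x - 0.2 * x**2 + rng.normal(scale=1.0, size=x.size)

for x0 in (0.0, 5.0, 10.0):
    lo, hi = bootstrap_slope_ci(x, y, x0)
    print(f"x = {x0:4.1f}  95% interval for slope: [{lo:.3f}, {hi:.3f}]")
```

These are pointwise intervals, one x0 at a time. They do not amount to the simultaneous confidence band over a range of x mentioned above, which needs a wider band.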


ROU_Xenophobe is correct that the overall Model F test will give you a test of whether the linear combination of x and x^2 shows a significant relationship (that is, the slope is not 0) with y.

You can also run F-tests for arbitrary subsets of the model, which is what I was suggesting. Doing so is particularly useful for sets of dummies that are one concept (race/ethnicity, for example). It can also be useful in diagnosing collinearity. The Stata postregression command is just "test b1 b2".

Putting together other comments, if you happen to be using Stata, the command for the point estimate the fleshy robut describes is "lincom b1+2*xvalue*b2", which will spit back a 95% CI by default.
posted by ROU_Xenophobe at 7:37 AM on August 2, 2007


naturesgreatestmiracle is totally right and I am wrong. Disregard my answers, as I imagine you have done already.
posted by epugachev at 9:36 AM on August 2, 2007

