Regression question....
February 22, 2010 9:21 PM
How do I estimate the coefficients of this curve using linear regression?
I have a relationship between two variables that is approximated using y = Ax^b. How do I test this is valid using linear regression?
I've confused myself thoroughly about this, unfortunately. For previous relationships that were exponential (y = A exp(bx)), I was able to take the natural log of both sides, and then use the slope and intercept from the resulting regression to estimate the parameters. For this relationship, I presume I need to take the log of both sides, but I can't figure out how to estimate what the log base is.
Thank you in advance - I was sort of tempted to ask this anonymously as I'm sure I'm missing something obvious here but I'm drawing a blank.
Best answer: If you have access to R, the lm (linear model), coef and summary functions will give you what you need:
> x <- c(0,1,2,3,4,5)
> y <- c(2,6,10,14,18,22)
> linearRegression <- lm(y ~ x)
> coef(linearRegression)
(Intercept) x
2 4
> summary(linearRegression)
Call:
lm(formula = y ~ x)
Residuals:
1 2 3 4 5 6
-3.343e-16 3.201e-15 -2.578e-15 -1.800e-15 1.997e-16 1.311e-15
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.000e+00 1.697e-15 1.178e+15 <2e-16 ***
x 4.000e+00 5.606e-16 7.135e+15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.345e-15 on 4 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 5.09e+31 on 1 and 4 DF, p-value: < 2.2e-16
posted by Blazecock Pileon at 10:10 PM on February 22, 2010
Response by poster: Thank you rancidchickn and Blazecock Pileon - I do have access to R, and have figured out what was confusing me and how to right it. I was getting confused about what the various parts of the transformed equation were, and had spent quite a while trying to sort it out.
posted by a womble is an active kind of sloth at 10:23 PM on February 22, 2010
Box-Cox might get you the transformation you need, if I understand the question correctly.
The Minitab Box-Cox is friendlier than R, if you have access.
posted by degrees_of_freedom at 5:54 AM on February 23, 2010
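For anyone staying in R, the Box-Cox suggestion above can be tried roughly like this. This is only a sketch: it assumes the MASS package (which provides boxcox) is installed, and the simulated data and variable names are made up for illustration. Box-Cox only transforms y, so x is logged by hand here; an estimated lambda near 0 points to the log transform.

```r
library(MASS)  # provides boxcox(); assumed to be installed

# Hypothetical power-law data (A = 2, b = 1.5), not taken from the thread
set.seed(7)
x <- 1:20
y <- 2 * x^1.5 * exp(rnorm(length(x), sd = 0.05))

# Profile log-likelihood of the Box-Cox parameter over a grid of lambda values
bc <- boxcox(lm(y ~ log(x)), lambda = seq(-1, 1, by = 0.05), plotit = FALSE)

# Grid value with the highest log-likelihood; near 0 suggests log(y)
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```

Setting plotit = TRUE instead draws the likelihood curve with a confidence interval for lambda.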
In R, you can specify some transformations directly in the formula, for example lm(log(y) ~ log(x), data = mydata), where mydata is a data frame with your x and y values in it.
Another question is whether you should transform the data, or use a link function (a generalized linear model). If the error or noise is multiplicative (things that make y grow faster or slower), use a transformation. If the noise is additive (many kinds of measurement error are), then you should use a log link. A question that you can ask to determine which scenario you're in is "could I observe negative values of y?". If so, you at least have an additive component. You'd fit that with
glm(y~log(x), family=quasi(link=log,variance="constant"), data=mydata).
You can get more information about your choices of link and variance functions with ?family.
posted by a robot made out of meat at 7:49 AM on February 23, 2010 [1 favorite]
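To make the transformation versus log-link distinction above concrete, here is a rough sketch on simulated data with additive noise, fitting the same curve both ways. The true values (A = 2, b = 1.5), the noise level, and the contents of mydata are assumptions for illustration only; the start argument just seeds the glm iterations from the log-log fit.

```r
# Hypothetical data: y = 2 * x^1.5 plus additive measurement noise
set.seed(42)
mydata <- data.frame(x = 1:20)
mydata$y <- 2 * mydata$x^1.5 + rnorm(nrow(mydata), sd = 0.3)

# Option 1: transform both sides, then ordinary least squares
fit_lm <- lm(log(y) ~ log(x), data = mydata)

# Option 2: leave y alone and model log(E[y]) = log(A) + b * log(x)
fit_glm <- glm(y ~ log(x),
               family = quasi(link = "log", variance = "constant"),
               data = mydata, start = coef(fit_lm))

# In both fits the intercept is log(A) and the slope is b
est <- rbind(lm = coef(fit_lm), glm = coef(fit_glm))
est[, 1] <- exp(est[, 1])
colnames(est) <- c("A", "b")
print(est)
```

On data like this the two fits should land close together; they tend to diverge when the noise is large relative to the small values of y.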
Response by poster: I was interested to see that so many people favorited this question. I thought I would share a useful pdf I found that answered what I was trying to do as well.
Hope it is of assistance, and I did really appreciate the answers. I was working late on trying to get some work finished and tied myself in a knot.
posted by a womble is an active kind of sloth at 9:36 AM on February 27, 2010 [1 favorite]
This thread is closed to new comments.
ln(y) = ln(A) + b ln(x)
If you plot this with both x and y being on a logarithmic scale (log-log), it should yield a straight line, where b is the slope and ln(A) is the intercept.
posted by rancidchickn at 10:02 PM on February 22, 2010
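A minimal R sketch of the log-log fit described above, on made-up data: the true values A = 2 and b = 1.5, the noise level, and all variable names are assumptions for illustration only. It also suggests the log base does not need to be estimated: any base gives the same slope b, and only the intercept changes (ln(A) with natural logs, log10(A) with base 10).

```r
# Simulated power-law data (hypothetical values, not from the thread)
set.seed(1)
A_true <- 2
b_true <- 1.5
x <- 1:20
y <- A_true * x^b_true * exp(rnorm(length(x), sd = 0.05))  # multiplicative noise

# Regress ln(y) on ln(x): slope estimates b, intercept estimates ln(A)
fit <- lm(log(y) ~ log(x))
b_hat <- coef(fit)[["log(x)"]]
A_hat <- exp(coef(fit)[["(Intercept)"]])

# Same fit in base 10: same slope, intercept is log10(A)
fit10 <- lm(log10(y) ~ log10(x))
b_hat10 <- coef(fit10)[["log10(x)"]]
A_hat10 <- 10^coef(fit10)[["(Intercept)"]]

print(c(A = A_hat, b = b_hat))
print(c(A = A_hat10, b = b_hat10))
```

Calling summary(fit) then gives standard errors and R-squared for the straight line on the log-log scale, and plot(log(x), resid(fit)) is a quick check for leftover curvature.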