Weighted regression?
November 30, 2011 8:05 AM
Excel statistics question (linear regression with weighted data points)
I have a set of data that I'd like to fit a regression line to. However, some of the data points should have more weight to them. Is there any way to handle this in excel using the SLOPE() and INTERCEPT() functions, or some other way to tackle this?
Sample data here:
Known x's: 10 20 30 40
Known y's: 5 12 25 50
Importance: 50 100 10 1
So how would I fit a line that puts most emphasis on the (20,12) point and least emphasis on the (40,50) point?
I will actually be fitting a curve to the data using ln(y) if that further complicates things.
Thanks!
I have a set of data that I'd like to fit a regression line to. However, some of the data points should have more weight to them. Is there any way to handle this in excel using the SLOPE() and INTERCEPT() functions, or some other way to tackle this?
Sample data here:
Known x's: 10 20 30 40
Known y's: 5 12 25 50
Importance: 50 100 10 1
So how would I fit a line that puts most emphasis on the (20,12) point and least emphasis on the (40,50) point?
I will actually be fitting a curve to the data using ln(y) if that further complicates things.
Thanks!
Unless someone else is forcing you to do this in Excel, I would move elsewhere. In R, this would be a simple matter of
outputobject<>
In Stata, you'd just specify an aweight=importance.>
posted by ROU_Xenophobe at 8:36 AM on November 30, 2011
outputobject<>
In Stata, you'd just specify an aweight=importance.>
posted by ROU_Xenophobe at 8:36 AM on November 30, 2011
You can do matrix multiplication in Excel, so you can do weighted regression. If X is the matrix of X values, W is a diagonal matrix with the weights, and Y is a column of ln(y), the formula would be something like:
MMULT(MINVERSE(MMULT(TRANSPOSE(X),W*X)),MMULT(TRANSPOSE(X),W*Y))
posted by mikeand1 at 8:49 AM on November 30, 2011
MMULT(MINVERSE(MMULT(TRANSPOSE(X),W*X)),MMULT(TRANSPOSE(X),W*Y))
posted by mikeand1 at 8:49 AM on November 30, 2011
Yeah, you could use MMULT for this, but be aware that it is an array formula. More info here.
posted by dfriedman at 9:26 AM on November 30, 2011
posted by dfriedman at 9:26 AM on November 30, 2011
I wouldn't do the normal-equations approach above. It is unstable, which is why stat software doesn't use it; even excel since 2003 uses QR. The knowledge base has an example of how you would do that if you were crazy and wanted to do QR decompositions by hand.
posted by a robot made out of meat at 10:29 AM on November 30, 2011
posted by a robot made out of meat at 10:29 AM on November 30, 2011
Since ROU's answer got html-bashed, in R after importing the data into vectors y and x
my.output<-lm(log(y)~x,weights=myweights)
summary(my.output)
where myweights is a variable with the numeric weights you want.
posted by a robot made out of meat at 10:32 AM on November 30, 2011
my.output<-lm(log(y)~x,weights=myweights)
summary(my.output)
where myweights is a variable with the numeric weights you want.
posted by a robot made out of meat at 10:32 AM on November 30, 2011
"I wouldn't do the normal-equations approach above. It is unstable..."
That's only an issue if there's a high degree of collinearity in the columns of X. Otherwise, it's not a problem.
posted by mikeand1 at 10:36 AM on November 30, 2011
That's only an issue if there's a high degree of collinearity in the columns of X. Otherwise, it's not a problem.
posted by mikeand1 at 10:36 AM on November 30, 2011
Thanks robot!
posted by ROU_Xenophobe at 10:46 AM on November 30, 2011
posted by ROU_Xenophobe at 10:46 AM on November 30, 2011
« Older Affordable, high-quality HEPA vacuums? | Any recommendations for a 3rd party .pst repair... Newer »
This thread is closed to new comments.
posted by supercres at 8:10 AM on November 30, 2011