Weighted regression?
November 30, 2011 8:05 AM   Subscribe

Excel statistics question (linear regression with weighted data points)

I have a set of data that I'd like to fit a regression line to. However, some of the data points should have more weight to them. Is there any way to handle this in excel using the SLOPE() and INTERCEPT() functions, or some other way to tackle this?

Sample data here:
Known x's: 10 20 30 40
Known y's: 5 12 25 50
Importance: 50 100 10 1

So how would I fit a line that puts most emphasis on the (20,12) point and least emphasis on the (40,50) point?

I will actually be fitting a curve to the data using ln(y) if that further complicates things.

Thanks!
posted by dabug to Computers & Internet (9 answers total) 2 users marked this as a favorite
 
Best answer: The hacky way to do it would be to create 50 identical points at (10,5), 100 at (20,12), etc. Excel isn't great for stats.
posted by supercres at 8:10 AM on November 30, 2011


Unless someone else is forcing you to do this in Excel, I would move elsewhere. In R, this would be a simple matter of

outputobject<>
In Stata, you'd just specify an aweight=importance.
posted by ROU_Xenophobe at 8:36 AM on November 30, 2011


Best answer: You can do matrix multiplication in Excel, so you can do weighted regression. If X is the matrix of X values, W is a diagonal matrix with the weights, and Y is a column of ln(y), the formula would be something like:

MMULT(MINVERSE(MMULT(TRANSPOSE(X),W*X)),MMULT(TRANSPOSE(X),W*Y))
posted by mikeand1 at 8:49 AM on November 30, 2011


Yeah, you could use MMULT for this, but be aware that it is an array formula. More info here.
posted by dfriedman at 9:26 AM on November 30, 2011


I wouldn't do the normal-equations approach above. It is unstable, which is why stat software doesn't use it; even excel since 2003 uses QR. The knowledge base has an example of how you would do that if you were crazy and wanted to do QR decompositions by hand.
posted by a robot made out of meat at 10:29 AM on November 30, 2011


Since ROU's answer got html-bashed, in R after importing the data into vectors y and x
my.output<-lm(log(y)~x,weights=myweights)
summary(my.output)
where myweights is a variable with the numeric weights you want.
posted by a robot made out of meat at 10:32 AM on November 30, 2011


"I wouldn't do the normal-equations approach above. It is unstable..."


That's only an issue if there's a high degree of collinearity in the columns of X. Otherwise, it's not a problem.
posted by mikeand1 at 10:36 AM on November 30, 2011


Thanks robot!
posted by ROU_Xenophobe at 10:46 AM on November 30, 2011


Response by poster: Thanks all.
posted by dabug at 10:52 AM on November 30, 2011


« Older Affordable, high-quality HEPA vacuums?   |   Any recommendations for a 3rd party .pst repair... Newer »
This thread is closed to new comments.