Walk me through a multivariate regression
March 18, 2012 7:36 AM
Stats filter. I am doing multivariate regression for the first time and I want to understand what I'm doing, having gone beyond my formal training. I have many possible ways I could formulate the regression (different variables to include) and I want to find a model that gives the best possible fit while using the fewest variables. How?
posted by PercussivePaul to Science & Nature (12 answers total) 5 users marked this as a favorite
I have what seems like a straightforward regression setup. A given object i has inputs x_1 to x_n with coefficients A_1 to A_n, and output y_i. (I think this is multivariate regression, rather than multiple regression, because there are multiple outputs. Correct me if I'm wrong.)
In the first formulation of this problem the output is almost a direct linear combination of some of the inputs (x), and I can use average coefficients which have some empirical basis. The problem I want to solve is that the inputs x_1 to x_n have to be measured empirically, and some of these are much easier to collect than others. Some of them are also related to others: they are not entirely independent, and in fact some of them may be almost fully correlated. I want to show a good way to approximate y_i using only a few of the inputs. In particular, I want to determine analytically the best such model out of several possible models.
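To make the "almost fully correlated" concern concrete, here is a minimal sketch in Python with NumPy (not the Matlab I have been using) that flags pairs of inputs whose pairwise correlation is very high. The function name `correlated_pairs`, the 0.9 threshold, and the synthetic data are all assumptions for illustration and are not taken from my real data set.

```python
import numpy as np

def correlated_pairs(X, threshold=0.9):
    """Return (j, k, r) for every pair of columns of X with |Pearson r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are variables, rows are objects
    n_vars = corr.shape[0]
    return [
        (j, k, float(corr[j, k]))
        for j in range(n_vars)
        for k in range(j + 1, n_vars)
        if abs(corr[j, k]) >= threshold
    ]

# Synthetic stand-in data (hypothetical): x2 is nearly a rescaled copy of x0.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = 2.0 * x0 + 0.01 * rng.normal(size=200)
X_demo = np.column_stack([x0, x1, x2])

pairs = correlated_pairs(X_demo)
print(pairs)  # pairs of column indices that carry nearly the same information
```

A flagged pair suggests that one of the two inputs could probably stand in for the other, so the harder-to-measure one is a candidate to drop. Pairwise correlation will not catch an input that is a combination of several others, so this is only a first screen.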
The answer has something to do with correlation coefficients and residuals and r-squared. I've been playing around in Matlab and am getting somewhere, but I don't have a high-level procedure in mind - just fiddling around and not converging on an answer. Can you walk me through the procedure you would use to test different models, identify which variables matter and which ones can be thrown out, and demonstrate that you have arrived at a good answer?
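As a starting point for the kind of procedure I am asking about, here is a rough sketch, again in Python with NumPy, that fits an ordinary least-squares model to every small subset of inputs and ranks the subsets by adjusted R-squared. Adjusted R-squared is only one possible criterion (cross-validation error or AIC are common alternatives), exhaustive search is only practical for a handful of inputs, and the helper names `fit_ols` and `rank_subsets` and the synthetic data are assumptions for illustration.

```python
import itertools
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2, adjusted R^2)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)                      # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())        # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)  # penalize extra variables
    return beta, r2, adj_r2

def rank_subsets(X, y, max_size):
    """Fit every subset of up to max_size columns; sort by adjusted R^2, best first."""
    results = []
    for size in range(1, max_size + 1):
        for cols in itertools.combinations(range(X.shape[1]), size):
            _, r2, adj_r2 = fit_ols(X[:, list(cols)], y)
            results.append((cols, r2, adj_r2))
    return sorted(results, key=lambda item: item[2], reverse=True)

# Synthetic stand-in data (hypothetical): y depends on x0 and x1 only;
# x2 is pure noise and x3 is nearly a copy of x0.
rng = np.random.default_rng(1)
n_obj = 200
x0, x1, x2 = (rng.normal(size=n_obj) for _ in range(3))
x3 = x0 + 0.05 * rng.normal(size=n_obj)
X_all = np.column_stack([x0, x1, x2, x3])
y_obs = 3.0 * x0 - 2.0 * x1 + 0.1 * rng.normal(size=n_obj)

ranked = rank_subsets(X_all, y_obs, max_size=4)
for cols, r2, adj_r2 in ranked[:5]:
    print(cols, round(r2, 4), round(adj_r2, 4))
```

Plain R-squared never decreases when a variable is added, which is why the ranking uses a penalized score. With nearly duplicate inputs, several subsets can score almost identically, and in that case the choice between them can reasonably come down to which input is cheaper to measure.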