# Based on my eyes and a hunch I conclude these two variables are statistically different?

September 13, 2011 10:37 AM

How can I test if one predictor variable in a regression is significantly better than another predictor variable?
i.e. If I regress X against A I get an r^2 of .99 and when I regress X against B I get .98. I need a test to see whether this difference is statistically significant.

In case extra details are important:

X is the daily returns of a mutual fund, and A and B are the daily returns of two market indices. I'd like to know if I can support the claim that X tracks A more closely than it tracks B. (I would guess that the difference is not significant, but I need a formal test.)

I have all the data/software to run alternative variations on the models if that is needed.

Same answer. Comparing non-nested hypotheses is always tricky; I know of the J-test and its modifications. I don't know that anyone has thought about time-series applications.

If this is a *practical* application, I'd probably obtain confidence intervals and compare them. Note that if the two overlap, you have to go through some additional hoops to obtain a p-value, but if they don't overlap then the coverage of the confidence intervals is an upper bound on the p-value (i.e., non-overlapping 95% intervals guarantee p < .05).

posted by a robot made out of meat at 11:15 AM on September 13, 2011

I am possibly being naive here, but if you're just comparing predictor variables, why can't you put them both in the same model and compare standardized coefficients? There is also a standard test for comparing the coefficients of two variables in the same linear model to see if they differ significantly (Stata will do an F-test of this using the test command). You will only get partial correlations for each, but that should be OK, right?

posted by zipadee at 11:34 AM on September 13, 2011

A usual approach to estimating the distribution of a statistic with a complicated or unknown distribution is bootstrapping. I expect it could be useful here.

posted by zxcv at 11:35 AM on September 13, 2011
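The bootstrap idea might look like this for comparing the two R-squared values (a minimal sketch on made-up data; the function names are my own, and plain iid resampling ignores the serial dependence in daily returns, where a block bootstrap would be more appropriate):

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a simple regression of x on y (squared correlation)."""
    return np.corrcoef(x, y)[0, 1] ** 2

def bootstrap_r2_diff(x, a, b, n_boot=2000, seed=0):
    """Percentile CI for R^2(x~a) - R^2(x~b), resampling whole days
    with replacement so x, a, b stay paired."""
    rng = np.random.default_rng(seed)
    n = len(x)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        diffs[i] = r_squared(x[idx], a[idx]) - r_squared(x[idx], b[idx])
    return np.percentile(diffs, [2.5, 97.5])

# toy data: x tracks a a bit more closely than it tracks b
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = 0.95 * a + 0.3 * rng.normal(size=500)
x = a + 0.1 * rng.normal(size=500)
lo, hi = bootstrap_r2_diff(x, a, b)
print(lo, hi)  # an interval excluding 0 suggests a real difference
```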

Can you put them all into a structural equation model?

posted by k8t at 12:11 PM on September 13, 2011

*I am possibly being naive here, but if you're just comparing predictor variables why can't you put them both in the same model and compare standardized coefficients?*

Collinearity.

Also, even if collinearity weren't a problem, a bigger (standardized) coefficient doesn't mean it's a better predictor.

posted by ROU_Xenophobe at 12:46 PM on September 13, 2011

Another method worth looking up is AIC (the Akaike information criterion), which is a model-comparison approach. It will tell you which of the models in a model set best fits the data. Most statistics programs can easily calculate the values, and it's helpful because it doesn't just give a yes/no answer; it can also tell you "none of these top models is much better than the others."

posted by hydrobatidae at 12:48 PM on September 13, 2011

zipadee: think causally. If A and B share causal influences, then (covariance(X,A) | B) averaged over observed levels of B isn't answering the right question. Also, the standardized coefficient is a function of both the effect and its uncertainty; you want to separate those.

The problem with the various information criteria is that they have no interpretation in terms of magnitude, just sign. What's a big difference in AIC? What's the probability of getting it by chance? With Bayes factors and p-values there is at least a standard of communication and some kind of probabilistic interpretation. Also, since the models have the same number of parameters, they should all come down to just the difference of log-likelihoods.

I promise I am not ROU's sockpuppet

posted by a robot made out of meat at 12:55 PM on September 13, 2011

Cut the last 10% of your sample and forecast the held-out portion using each of your two variables. Then compare the forecast errors, with a t-test to see whether the difference is significant.

Otherwise, any difference in your Durbin Watson?

posted by eyeofthetiger at 1:05 PM on September 13, 2011
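The holdout idea above can be sketched as follows (a rough sketch on toy data; the function names are my own, and with serially dependent daily returns a Diebold-Mariano-style correction would be safer than a plain paired t-test):

```python
import numpy as np

def holdout_sq_errors(x, pred, frac=0.1):
    """Fit x ~ pred on the first (1 - frac) of the sample; return squared
    forecast errors on the held-out tail."""
    n = len(x)
    cut = int(n * (1 - frac))
    X_tr = np.column_stack([np.ones(cut), pred[:cut]])
    beta, *_ = np.linalg.lstsq(X_tr, x[:cut], rcond=None)
    forecast = beta[0] + beta[1] * pred[cut:]
    return (x[cut:] - forecast) ** 2

def paired_t(e1, e2):
    """Paired t-statistic on the per-day difference in squared errors."""
    d = e1 - e2
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# toy data where x genuinely tracks a better than b
rng = np.random.default_rng(3)
a = rng.normal(size=500)
b = 0.9 * a + 0.45 * rng.normal(size=500)
x = a + 0.2 * rng.normal(size=500)
t = paired_t(holdout_sq_errors(x, a), holdout_sq_errors(x, b))
print(t)  # negative values favor a; for a large holdout compare |t| to ~1.96
```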

eyeofthetiger: what's a big difference in cross-validation loss? What kind of differences in cross-validation loss occur by chance? I'm aware of some papers examining that, but it's complicated and I think the ones I know of are all standard linear models.

posted by a robot made out of meat at 1:13 PM on September 13, 2011

Thanks everyone for your help so far. Getting the correct name for the problem (non-nested model selection) should help a lot in pointing me in the right direction.

I'm still hoping to find a generally accepted test that I can use to show that neither index is a substantially better fit than the other. It is definitely the case that A and B are extremely correlated.

One thought I had was that if I could determine the distribution of the R-squared values, then I'd be able to do a t-test comparison of them.

posted by vegetableagony at 2:15 PM on September 13, 2011

If I understand your question correctly, you would like to know whether A or B predicts Y better. Lots of people who try to answer this question will fit the model

Y = b0 + b1*A + b2*B

and then conduct some post-hoc test to compare b1 and b2. There isn't a great test for this, especially since A and B may be correlated with each other. You could probably use bootstraps or simulations to test the size of the coefficients, but there is a much easier way, especially if A, B, and Y are highly correlated.

Instead of trying to compare b1 and b2 directly, we're going to compare how well two models fit the data.

For the first model, fit:

Y = b0 + b1*(A + B) (with A + B as a single variable)

For the second model, fit:

Y = b0 + b1*(A + B) + b2*B

Once you've fit both models, you can directly compare how well the two models predict Y (use a chi-squared test), because Model 1 and Model 2 are nested models. If Model 2 is a significantly better fit than Model 1, the two coefficients predict Y differently.

The reason this works is that we're essentially comparing a model where the coefficients for A and B are constrained to be equal to one another (that's what happens by modeling only the sum in the first model) against a second model where the coefficients are allowed to differ. If the coefficients are, in fact, equal, the second model won't provide a better fit for the data. This technique works regardless of how correlated the data may be.

Also: yay for nerdy stats questions on metafilter!

posted by eisenkr at 2:45 PM on September 13, 2011
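The constrained-vs-unconstrained comparison can be sketched with plain least squares (a minimal numpy sketch on toy data; function names and synthetic data are my own, and note that for ordinary linear regression the nested comparison is usually run as an F-test rather than a chi-squared test):

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def equal_coef_f(y, a, b):
    """F statistic for H0: b1 == b2 in y = b0 + b1*a + b2*b.
    The restricted model uses the sum a + b as a single regressor."""
    n = len(y)
    ones = np.ones(n)
    rss_r = ols_rss(np.column_stack([ones, a + b]), y)  # coefficients tied
    rss_u = ols_rss(np.column_stack([ones, a, b]), y)   # coefficients free
    df = n - 3                                          # 3 params in full model
    return (rss_r - rss_u) / (rss_u / df)               # 1 restriction

# toy data with genuinely equal coefficients, so F should usually be small
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = 0.9 * a + 0.4 * rng.normal(size=300)
y = a + b + 0.5 * rng.normal(size=300)
f = equal_coef_f(y, a, b)
print(f)  # compare to an F(1, n-3) critical value, roughly 3.87 at the 5% level
```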

The Vuong test sounds like almost exactly what you're looking for... it's related to AIC but actually tests for significant differences between models. However, it assumes iid (and not serially correlated) values and I don't have enough experience with this test/data to know how big a deal that would be for you.

posted by en forme de poire at 4:01 PM on September 13, 2011
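For concreteness, here is what the Vuong statistic computes for two Gaussian linear models with the same number of parameters (a hedged sketch; the function names and toy data are my own, and note again the iid assumption the comment mentions):

```python
import numpy as np
from math import erf, sqrt

def gaussian_loglik_per_obs(y, X):
    """Pointwise Gaussian log-likelihoods of an OLS fit (MLE error variance)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)
    return -0.5 * (np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)

def vuong_test(y, X1, X2):
    """Vuong z for two non-nested models with equal numbers of parameters."""
    d = gaussian_loglik_per_obs(y, X1) - gaussian_loglik_per_obs(y, X2)
    z = sqrt(len(y)) * d.mean() / d.std(ddof=1)
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal p-value
    return z, p

# toy data: x tracks a more closely than it tracks b
rng = np.random.default_rng(2)
a = rng.normal(size=400)
b = 0.9 * a + 0.45 * rng.normal(size=400)
x = a + 0.2 * rng.normal(size=400)
ones = np.ones(len(x))
z, p = vuong_test(x, np.column_stack([ones, a]), np.column_stack([ones, b]))
print(z, p)  # large positive z favors the first model
```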

eisenkr: same problem. That tests if there is evidence that the coefficients are different, not the marginal predictive value. The latter depends on the coefficient and the distribution of the predictors.

Wow, the econometrics / finance people really did think about this. The modern citation trail for Vuong for predictions is very active, including these highly cited papers.

posted by a robot made out of meat at 5:13 PM on September 13, 2011

Based on some further searching I think I have narrowed it down to two basic tests:

The Vuong test, which will tell you how much better one model is than another in terms of a p-value.

The Steiger Test, which does a z-transformation of the correlations, so that you can do a direct z-test comparison of them.

I need to look into some of the more modern implementations cited by a robot made out of meat, to check if the basic tests require any assumptions I can't make.

Thanks all!

posted by vegetableagony at 7:06 AM on September 14, 2011
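For the correlation-comparison route, one widely used variant of Steiger's idea is the Meng-Rosenthal-Rubin z-test for dependent correlations. A sketch (function name and toy data are my own; the formula follows Meng, Rosenthal & Rubin 1992, and like the Vuong test it assumes iid observations, which daily returns may violate):

```python
import numpy as np
from math import atanh, erf, sqrt

def dependent_corr_z(x, a, b):
    """Meng-Rosenthal-Rubin z-test for H0: corr(x,a) == corr(x,b),
    accounting for the correlation between a and b."""
    n = len(x)
    r1 = np.corrcoef(x, a)[0, 1]
    r2 = np.corrcoef(x, b)[0, 1]
    rab = np.corrcoef(a, b)[0, 1]
    rbar2 = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - rab) / (2 * (1 - rbar2)), 1.0)
    h = (1 - f * rbar2) / (1 - rbar2)
    z = (atanh(r1) - atanh(r2)) * sqrt((n - 3) / (2 * (1 - rab) * h))
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal p-value
    return z, p

# toy data: x is genuinely closer to a than to b
rng = np.random.default_rng(4)
a = rng.normal(size=400)
b = 0.9 * a + 0.45 * rng.normal(size=400)
x = a + 0.2 * rng.normal(size=400)
z, p = dependent_corr_z(x, a, b)
print(z, p)
```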

I would expect IID to be a big problem.

posted by ROU_Xenophobe at 8:34 AM on September 14, 2011

This thread is closed to new comments.
