Any statisticians, psychometricians, econometricians, etc. in the house?
May 30, 2009 1:39 PM

Any statisticians, psychometricians, econometricians, etc. in the house? I've got a question about survival models.....

I'm running some survival models in Stata and have specified a model that includes an interaction term, which is the product of two continuous variables. Within the OLS family of models, I know that the standard way to interpret this type of interaction effect is like so. However, I'm pretty new to the world of survival (hazard, event history, etc.) models and am not sure if the procedure outlined in the linked page is still appropriate.

If so, awesome. If not, if somebody can point me in the right direction, I would appreciate it. Something in the back of my head vaguely thinks that I maybe could accomplish this using some permutation of marginal effects, but again I'm at a loss as to the nuts and bolts.

I can meet with my friendly neighborhood stats guru on Monday, but I'm not sure I can wait that long to interpret these results (yes, I do realize that admitting this brands me an uber-nerd).
posted by jtfowl0 to Science & Nature (7 answers total) 3 users marked this as a favorite
 
I'm not sure about the answer to your question, but you didn't mention Stata's help, which is, in most cases, a great resource for issues like this. That's where I would look first.
posted by B-squared at 3:59 PM on May 30, 2009


I don't do much with EHA... I'm not sure I've ever used it in anger, actually.

But anyway, since you're trying to figure things out for yourself before your meeting on Monday, I would just turn one of the variables you're interacting into either a dummy or a set of dummies for ranges and re-run the model. Dummy/continuous interactions are a whole lot easier to interpret.
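A minimal sketch of the dummy-coding idea, in Python rather than Stata so that it is self-contained. The cutpoints, the variable names (`age`, `income`), and the helper functions are hypothetical illustrations, not anything from the poster's data; the lowest range is treated as the reference category.

```python
from bisect import bisect_right

def range_dummies(value, cutpoints):
    """Return one 0/1 indicator per range above the lowest (reference) range.

    `cutpoints` must be sorted ascending. With cutpoints [c0, c1] the ranges
    are (-inf, c0), [c0, c1), [c1, +inf); the first one is the reference.
    """
    k = bisect_right(cutpoints, value)  # index of the range `value` falls in
    return [1 if k == j else 0 for j in range(1, len(cutpoints) + 1)]

def dummy_by_continuous(dummies, x):
    """Interaction terms: each range dummy multiplied by the continuous variable."""
    return [d * x for d in dummies]

# Hypothetical example: cut `age` at 30 and 50, interact with `income`.
cutpoints = [30, 50]
for age, income in [(25, 40.0), (30, 55.0), (62, 48.0)]:
    d = range_dummies(age, cutpoints)
    print(age, d, dummy_by_continuous(d, income))
```

Each dummy-by-continuous term then gets its own coefficient, which reads as "how much the effect of the continuous variable differs in this range compared with the reference range".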

If you were trying to get this published, I'd actually suggest something similar -- display and describe a model with a dummy/continuous interaction and footnote that you get the same results with a continuous/continuous interaction model available on request.
posted by ROU_Xenophobe at 4:35 PM on May 30, 2009


Presumably if you're doing survival analysis you are using Cox proportional hazards model (as opposed to the linear regression with a continuous outcome demonstrated in your link). In such a model the hazard function can be written as follows:

H(t) = H0(t) * e^(b1*X1) * e^(b2*X2) * ... * e^(bn*Xn)

where H(t) is the hazard function, H0(t) is a baseline hazard, {b1,b2,...,bn} are the regression coefficients, and {e^b1, e^b2,...,e^bn} are the relative hazards or hazard ratios. Hazard ratios can be interpreted as follows for continuous variables without interaction terms: for every unit increase in the predictor variable Xn, the hazard rate changes by a factor of e^bn.

Simplified survival models might use a Weibull or other distribution for H0(t) but modern computers make such simplifying assumptions about the shape of the baseline hazard largely unnecessary.

On to interaction terms:

A simple model with two variables {X1, X2} and a first-order interaction could be written:

H(t) = H0(t) * e^(b1*X1) * e^(b2*X2) * e^(b3*X1*X2)

Now what is the interpretation of the coefficients...

The first thing to note is that if the interaction term is not statistically significant, it usually gets dropped at this point and the model is rerun without it to simplify interpretation. If the interaction term appears important in the model it can't be ignored, but the interpretation becomes somewhat complex. In such a scenario, for every unit increase in X1, the hazard rate is multiplied by (e^b1) * e^(b3*X2), and for every unit increase in X2 the hazard rate is multiplied by (e^b2) * e^(b3*X1). In other words, the hazard ratio for one variable depends on the value of the other.
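To make that concrete, here is a small Python sketch that evaluates the hazard ratio for a one-unit increase in X1 at several values of X2. The coefficient values are made up for illustration, and the function name is hypothetical; in practice b1 and b3 would come from the fitted model.

```python
import math

def conditional_hazard_ratio(b_main, b_inter, other_value, delta=1.0):
    """Hazard ratio for a `delta`-unit increase in one variable,
    holding the other variable fixed at `other_value`.

    From H(t) = H0(t) * exp(b1*X1 + b2*X2 + b3*X1*X2):
    H(X1 + delta) / H(X1) = exp(delta * (b1 + b3*X2)).
    """
    return math.exp(delta * (b_main + b_inter * other_value))

# Hypothetical coefficients, for illustration only.
b1, b2, b3 = 0.10, -0.20, 0.05

for x2 in (0.0, 1.0, 2.0):
    hr = conditional_hazard_ratio(b1, b3, x2)
    print(f"X2 = {x2:.1f}: hazard ratio per unit of X1 = {hr:.3f}")
```

With a positive b3 the ratio grows as X2 grows; with a negative b3 it shrinks. Swapping in b2 for `b_main` and a value of X1 for `other_value` gives the corresponding ratio for X2.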

One way to help in interpreting the results in such a scenario is to consider recentering the two variables about their respective means (in other words, let Xa = X1 - X1m and Xb = X2 - X2m, and use Xa and Xb in the model). In such a scenario, one can simplify the interpretation by concluding that for a subject with average predictor X2, the relative hazard for every unit increase in X1 is e^b1, and for a subject with average predictor X1, the relative hazard for every unit increase in X2 is e^b2. You are still obligated to note the presence of an interaction, though (i.e., for increasing values of X2, the relative hazard for every unit increase in X1 becomes more and more pronounced, or the opposite, depending on the sign of b3).
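A short Python sketch of why recentering changes the main-effect coefficients but not the model itself. The coefficients and means below are invented for illustration, and both function names are hypothetical. Expanding b1*(Xa + X1m) + b2*(Xb + X2m) + b3*(Xa + X1m)*(Xb + X2m) gives the centered coefficients; the leftover constant is absorbed into the baseline hazard.

```python
def centered_coefficients(b1, b2, b3, x1_mean, x2_mean):
    """Coefficients of the same model rewritten in Xa = X1 - X1m, Xb = X2 - X2m.

    Coefficient on Xa:    b1 + b3 * X2m
    Coefficient on Xb:    b2 + b3 * X1m
    Coefficient on Xa*Xb: b3 (unchanged)
    """
    return (b1 + b3 * x2_mean, b2 + b3 * x1_mean, b3)

def log_hazard_ratio(x1, x2, ref1, ref2, b1, b2, b3):
    """Log hazard of a subject at (x1, x2) relative to one at (ref1, ref2)."""
    def linear_predictor(u, v):
        return b1 * u + b2 * v + b3 * u * v
    return linear_predictor(x1, x2) - linear_predictor(ref1, ref2)

# Hypothetical raw-scale coefficients and sample means.
b1, b2, b3 = 0.10, -0.20, 0.05
x1m, x2m = 4.0, 10.0
a1, a2, a3 = centered_coefficients(b1, b2, b3, x1m, x2m)

# Compare one subject with the "average" subject under both parameterizations.
raw = log_hazard_ratio(6.0, 13.0, x1m, x2m, b1, b2, b3)
cen = log_hazard_ratio(6.0 - x1m, 13.0 - x2m, 0.0, 0.0, a1, a2, a3)
print(a1, a2, a3)
print(raw, cen)  # the two log hazard ratios agree up to rounding
```

So e^a1 is the hazard ratio per unit of X1 for a subject at the mean of X2, which is exactly the simplified reading described above.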

The long and short of it is that when you go beyond linear models with continuous outcomes, models with continuous variables can be quite annoying to interpret, particularly with interaction terms.

One other note: I would strongly suggest that you test the linearity assumption (that each continuous variable has a linear effect on the log hazard) before you blindly throw continuous variables into a model, whether it's simple linear regression or a Cox model. It often doesn't hold in survival analysis, and you might find that you would be better served by categorizing the continuous variable (which also helps with interpretation of interaction terms).
posted by drpynchon at 5:03 PM on May 30, 2009 [1 favorite]


Apologies for any confusion with my choice of variable nomenclature or whatever you want to call it. Hope that was helpful.
posted by drpynchon at 5:07 PM on May 30, 2009


Response by poster: All supplied answers were helpful--I'll just code some dummies for now. Basically, I'm just trying to figure out if the directional effect of the interaction term supports or opposes my research hypotheses. The dummies will certainly convey that, albeit perhaps a bit more crudely than the continuous variables would permit.

ROU, I do hope to publish this sometime in the not-too-distant future, so your comments about simplifying the interpretation are spot on. drpynchon, I'm actually estimating using a Weibull distribution currently--I know that the Cox makes no assumptions about the underlying probability distributions, but the fact that it assumes that there is no unmeasured heterogeneity makes me a tad uncomfortable given the nature of the data I'm using. I've run the model as a Weibull, Gompertz, Cox, etc. and the results I'm getting have been largely robust to these alternative specifications, which makes me worry a little less about this choice.
posted by jtfowl0 at 7:10 PM on May 30, 2009


Well, if you like the Weibull, it's easy enough to see how closely your Kaplan-Meier curve matches the survival curve implied by the Weibull distribution, and to go with it if it looks like a reasonable approximation, I suppose.
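A rough Python sketch of that comparison, assuming a small made-up dataset and made-up Weibull parameters. The parameterization S(t) = exp(-(t/scale)^shape) is an assumption chosen for readability and may not match the one your software reports, so fitted values would need converting first.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) pairs,
    one per distinct event time. `events`: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def weibull_survival(t, scale, shape):
    """Weibull survival function, S(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

# Hypothetical data and parameters, for illustration only.
times = [2, 3, 3, 5, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 1]
scale, shape = 8.0, 1.3

for t, s_km in kaplan_meier(times, events):
    print(f"t = {t:>2}: KM = {s_km:.3f}, Weibull = {weibull_survival(t, scale, shape):.3f}")
```

A related visual check: under a Weibull model, ln(-ln S(t)) plotted against ln(t) is a straight line, so strong curvature in that plot of the Kaplan-Meier estimate suggests the Weibull is a poor fit.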

I'm not sure how the Weibull does anything to address unmeasured heterogeneity better than the Cox. I always thought that with the former you are simply imposing more parametric restriction on what is otherwise the same type of model by constraining the baseline hazard function to the Weibull. In either model the potential exists for residual confounding, but I always assumed that Cox was more robust than Weibull unless one has a priori reason to believe the Weibull is "really" how the study population behaves. Admittedly, this issue may be beyond the limits of what I know about these models so take what I say on that with a grain of salt.
posted by drpynchon at 7:25 PM on May 30, 2009


Remember that Cox regression assumes proportional hazards: each covariate multiplies the hazard by a factor that stays constant over time. It would be wise to test that assumption using part of your data.

The interaction term you have here is essentially saying that the log hazard ratio for each explanatory variable is linearly related to the value of the other.
posted by rasputin98 at 2:19 PM on June 29, 2009

