What statistical model should I choose?
May 24, 2013 7:41 AM

I'm about to begin a new project that looks at the outcomes of specific events, and would like to query the hivemind to see what kinds of approaches I can take to it. I'm always impressed with the wide variety of approaches to statistical problems I see on here.

We are looking at historical data of individuals over a period of two years, during which they underwent a series of intervention efforts. Each event was either successful or not (0/1). The intervention action itself is the same for all events, but the conditions surrounding the event (e.g. time since last intervention, number of previous interventions, age, income, health factors) are different. Once an event succeeds, the individual drops out of the program.

We want to create a model that best predicts the chance that a given intervention will be successful. We are interested in both between- and within-individual characteristics. This is a completely observational analysis.

My background is in science, I am not a statistician by training, but I'm familiar with applied statistics. So speak slowly, but not too slowly. I'm rather good at researching and kludging together models, but I feel like there should be a Real Accepted Statistical (c) answer to this question that my Google-fu isn't finding. I've done repeated measures analyses before, but not a logit-type model with events like this.

I may even be making this too difficult; maybe the best way is to do a logistic regression specifying repeated subjects, like here. I'm working primarily in SAS, so references to specific procs would be helpful, but are not necessary in any way.

Thanks!
posted by Tooty McTootsalot to Science & Nature (9 answers total) 1 user marked this as a favorite
 
Have you looked at survival analysis (also sometimes referred to as event history modeling)? Applied Longitudinal Data Analysis is an excellent reference, and the UCLA stats consulting site has sample SAS code. There's also the SAS-specific Survival Analysis Using SAS: A Practical Guide.
posted by research monkey at 7:54 AM on May 24, 2013


Yes, survival analysis may be a good place to start. Since the number of previous interventions will affect things, you'll want to look at survival analysis with time-dependent covariates.
posted by matildatakesovertheworld at 8:06 AM on May 24, 2013


Response by poster: I looked into that, but wasn't sure if that was the right approach. From my (limited) reading, it looks like survival analysis is mostly concerned with predicting the time until an event occurs.

Here, the time until the event isn't as big of a concern, we just want to predict the success of an intervention. The timing of the events is completely controlled by us (think: 'researcher called on a non-random date and asked a yes/no question' rather than 'person undergoing regular treatments has a heart attack').

Does that make sense? Or am I missing something about survival analysis?
posted by Tooty McTootsalot at 9:04 AM on May 24, 2013


Best answer: Unless I'm misunderstanding something about your problem, I'd probably do this using a multilevel logistic regression. In SAS, PROC GLIMMIX should do this for you.
posted by plantbot at 9:45 AM on May 24, 2013 [2 favorites]


Best answer: I was also going to suggest multilevel modeling, where one level is the individual and another level is the repeated measurements nested within the individual. As plantbot says, use PROC GLIMMIX. Sometimes PROC NLMIXED is used (this paper compares the two).
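
For anyone who wants to see what a two-level (random-intercept) logistic model actually computes, here is a minimal Python sketch. It is not SAS and it does not fit anything: the intercept, the slopes, the covariate names, and the random-intercept standard deviation are all made-up assumptions, standing in for what a procedure like PROC GLIMMIX would estimate.

```python
import math
import random


def inv_logit(x):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def success_probability(intercept, slopes, covariates, person_effect=0.0):
    """Probability that one intervention succeeds for one person.

    person_effect is that person's random intercept (0 = a typical person).
    """
    linear = intercept + person_effect
    for name, slope in slopes.items():
        linear += slope * covariates[name]
    return inv_logit(linear)


# Hypothetical values, for illustration only (not estimates from any data).
intercept = -1.0
slopes = {"n_previous": -0.2, "months_since_last": 0.05}
event = {"n_previous": 2, "months_since_last": 4}
person_sd = 1.5  # assumed spread of the person-level intercepts

# Within-person view: a person whose random intercept is exactly zero.
typical = success_probability(intercept, slopes, event)

# Between-person view: average over simulated people (Monte Carlo).
rng = random.Random(1)
draws = [
    success_probability(intercept, slopes, event, rng.gauss(0.0, person_sd))
    for _ in range(20000)
]
population_avg = sum(draws) / len(draws)

print(f"typical person: {typical:.3f}")
print(f"population average: {population_avg:.3f}")
```

The two printed numbers differ because the inverse logit is nonlinear, which is worth keeping in mind when deciding whether the prediction you report is for a typical individual or an average over individuals.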
posted by research monkey at 10:20 AM on May 24, 2013 [1 favorite]


Does everybody end up succeeding once? Is it plausible that past interventions have lingering beneficial effects (versus past non-success merely indicates that a person is bad off)? If you just waited, would people end up succeeding on their own at some rate? Are successes judged immediately?

It's true that underneath survival models is a time to event (although you can include a cured fraction), but you can use the base rate and hazard ratios to compute failure probabilities (events in the biostat literature are usually called failures for historical reasons) in a specified time frame.
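
As a rough sketch of that last step, assuming a proportional hazards model: if S0(t) is the baseline probability of not yet having had the event by time t, then a person with hazard ratio HR has event probability 1 - S0(t)^HR by time t. The function name and the numbers below are hypothetical, chosen only to show the arithmetic.

```python
def event_probability(baseline_survival, hazard_ratio):
    """P(event by time t) under proportional hazards: 1 - S0(t) ** HR."""
    if not 0.0 <= baseline_survival <= 1.0:
        raise ValueError("baseline_survival must be between 0 and 1")
    if hazard_ratio < 0.0:
        raise ValueError("hazard_ratio must be non-negative")
    return 1.0 - baseline_survival ** hazard_ratio


# Hypothetical inputs: 70% of the baseline group has had no event by
# six months, and this person's covariates give a hazard ratio of 1.5.
baseline_survival_6mo = 0.70
hazard_ratio = 1.5

p_event = event_probability(baseline_survival_6mo, hazard_ratio)
print(f"P(event within six months) = {p_event:.3f}")
```

With a hazard ratio of 1 this reduces to the baseline event probability, 1 - S0(t).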

I am hesitant to recommend mixed-model procedures given the design of your data.

Given that you keep hitting the participants who don't succeed, you may be interested in the techniques behind N-of-1 trials (I am not terribly familiar with those).
posted by a robot made out of meat at 7:18 AM on May 25, 2013


Response by poster: It is plausible, and likely, that there are lingering benefits to past events. However, not everybody ends up succeeding, and we don't expect them to. Yes, successes are judged immediately.

I will look up those types of models! Thanks!
posted by Tooty McTootsalot at 11:24 AM on May 27, 2013


In that case mixed / random effect models are probably a good bet.
posted by a robot made out of meat at 12:53 PM on May 27, 2013


In case it comes up later, here's what had me confused. Your setup is a little odd in that you are not estimating the effect of treatment on an outcome; outcomes only exist when people are treated. If there were a 0/1 status out there that changed over time, then survival models would make sense. Also, because people keep getting treated and stop on success, it will inevitably look like getting more treatment is associated with worse disease.
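
A small simulation makes the stop-on-success point concrete. This is a sketch under invented assumptions: half the people have a 10% chance of success per attempt and half have 50%, and earlier attempts have no effect at all on later ones. Even so, the observed success rate drops with attempt number, because the people still in the program are increasingly the hard cases.

```python
import random


def success_rate_by_attempt(n_people=20000, max_attempts=5, seed=0):
    """Observed success rate at each attempt number when people stop on success."""
    rng = random.Random(seed)
    attempts = [0] * max_attempts
    successes = [0] * max_attempts
    for _ in range(n_people):
        # Hypothetical per-person success chance, fixed across attempts,
        # so prior interventions have no causal effect here.
        p = rng.choice([0.1, 0.5])
        for k in range(max_attempts):
            attempts[k] += 1
            if rng.random() < p:
                successes[k] += 1
                break  # the person leaves the program after a success
    return [s / a for s, a in zip(successes, attempts)]


rates = success_rate_by_attempt()
for attempt_number, rate in enumerate(rates, start=1):
    print(f"attempt {attempt_number}: observed success rate {rate:.3f}")
```

A model that includes "number of previous interventions" as a predictor will pick up this selection pattern, so a negative coefficient there should not be read as the intervention making things worse.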
posted by a robot made out of meat at 3:49 PM on May 27, 2013

