Non-parametric models for a non-normal outcome with multiple explanatory variables?
August 8, 2012 3:25 PM

StatisticsFilter, non-parametric edition: I'd like to test if my non-normally distributed outcome is significantly different between two groups adjusting for a third variable.

My dataset contains three variables on approximately 750 patients with multiple observations per patient (approximately 2500 observations in total). My outcome is non-normally distributed and, for clinical and scientific reasons, I cannot transform it from its current distribution. The two explanatory variables are a time-updated binary factor (i.e., yes/no) and a time-updated, approximately normally distributed integer.

At baseline, a Mann-Whitney U test indicates the outcome is significantly different between the yes/no strata (p = 0.01).

However, for this finding to have any clinical relevance, I need to adjust for the third variable. To do this I presume I need to build a non-parametric model including both explanatory variables. Also, it would be even better if I could model the entire dataset, i.e., use all observations and account for the correlated observations within patients. (In past analyses with a binomial outcome I've used generalized linear mixed-effects models.)

Any help? I'm using R, if that's of any help...
posted by docgonzo to Science & Nature (5 answers total)
 
Response by poster: Thanks for any help!
posted by docgonzo at 3:25 PM on August 8, 2012


Is your outcome continuous? Categorical? Dichotomous? If continuous, what's the range? It's not normally distributed, but does it look like any other distribution (binomial, Poisson, etc.)? Give us more information on your outcome and whether you are assessing it at multiple time points (i.e. outcome was x at time 1, y at time 2) or whether it's more like a survival analysis (x months until outcome for patient 1, y months until outcome for patient 2, etc.).
posted by k96sc01 at 4:00 PM on August 8, 2012


Response by poster: Outcome is positive, continuous, bimodal, with one peak at 1.5 and another at 5. The outcome is assessed for every patient at the first interview and at every six-month interview thereafter. Won't work as a survival model, unfortunately.
posted by docgonzo at 6:26 PM on August 8, 2012


If you have a firm grasp on the data-generating process absent the binomial variable of interest, you could just simulate the null using the distribution of the integer variable that you actually have.
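
A rough sketch of what that could look like, in plain Python rather than R so that it needs no packages. Everything specific here is an assumption made up for illustration: the `simulate_outcome` function stands in for whatever data-generating process you actually believe in, its mixture weights and spread are invented, and the difference in group means is just one possible test statistic. It also treats every observation as independent, so it does not deal with the repeated measurements within patients.

```python
import random
import statistics


def simulate_outcome(z, rng):
    """Hypothetical null process: a two-peak mixture (near 1.5 and 5)
    whose mixing weight depends only on the integer covariate z,
    never on the yes/no variable. All numbers are invented."""
    p_high = min(max(0.3 + 0.02 * (z - 10), 0.05), 0.95)
    centre = 5.0 if rng.random() < p_high else 1.5
    return max(rng.gauss(centre, 0.4), 0.01)  # keep the outcome positive


def mean_gap(outcome, group):
    """Mean outcome in the 'yes' group minus mean in the 'no' group."""
    yes = [y for y, g in zip(outcome, group) if g == 1]
    no = [y for y, g in zip(outcome, group) if g == 0]
    return statistics.mean(yes) - statistics.mean(no)


def null_p_value(outcome, group, z, n_sim=2000, seed=1):
    """Monte Carlo p-value: how often data simulated under the null,
    using the observed z values, show a gap at least as large as
    the one actually observed."""
    rng = random.Random(seed)
    observed = abs(mean_gap(outcome, group))
    hits = 0
    for _ in range(n_sim):
        fake = [simulate_outcome(zi, rng) for zi in z]
        if abs(mean_gap(fake, group)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)
```

You would call `null_p_value(outcome, group, z)` with three equal-length lists, where `group` holds 0/1 and `z` holds the integer covariate. The answer is only as good as the assumed process in `simulate_outcome`.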
posted by ROU_Xenophobe at 6:29 PM on August 8, 2012


In terms of clinical relevance, is 1.5 meaningfully different from 5? Would dichotomizing and doing logistic regression be a possibility?
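
If dichotomizing is defensible, the idea might be sketched roughly like this, in plain Python with no packages. The cutoff of 3.0 (somewhere between the two peaks) is purely an assumption, as are the function names, and the fitting routine is a bare-bones gradient ascent rather than what a statistics package would use. It also ignores the correlation between repeated observations on the same patient, which a mixed-effects logistic model would need to handle.

```python
import math


def dichotomize(outcome, cutoff=3.0):
    """1 if the value falls in the upper peak, else 0.
    The cutoff is an assumed value, not taken from the data."""
    return [1 if y > cutoff else 0 for y in outcome]


def fit_logistic(rows, labels, lr=0.1, n_iter=5000):
    """Fit a logistic regression by gradient ascent on the mean
    log-likelihood. `rows` holds the covariates for each observation
    (no intercept column); centre and scale continuous covariates first.
    Returns [intercept, coef_1, coef_2, ...] on the log-odds scale."""
    n, k = len(rows), len(rows[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            eta = max(min(eta, 30.0), -30.0)  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            err = y - p
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta
```

With `rows` built as `[yes_no, scaled_integer]` per observation and `labels = dichotomize(outcome)`, `math.exp(beta[1])` would be the adjusted odds ratio for the yes/no variable.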

The analytic method you choose will have a big influence on your finished product and you have a number of interesting wrinkles in your dataset. Your questions would probably be better addressed by collaborating with a biostatistician or epidemiologist experienced in time-series analyses. Is that a possibility for you?
posted by k96sc01 at 6:47 AM on August 9, 2012

