# What is the non-parametric version of two-way ANOVA?

October 3, 2007 5:36 PM

How do you examine the interaction of two factors on a non-parametric dependent variable (e.g. nonparametric version of the 2-way ANOVA)?

Asking for a friend who needs some stats help-

He conducted a study in which two groups of individuals (one visually impaired group and one control group) were administered a visual task. Each group has an equal sample size and an equal number of males and females.

He initially conducted a 2x2 ANOVA in order to examine the interaction between group condition and sex on visual task performance. He has been asked to re-analyze the data given that the dependent variable (performance on visual task; # of errors) is not normally distributed. How can he examine the **interaction of sex and group condition on visual task performance using a nonparametric statistical method**?

And specifically, can this analysis be conducted in SPSS and, if so, how would it be done? If not in SPSS, what program?

Perhaps a Friedman two-way analysis of variance? UCLA has a nice table for choosing an appropriate statistical analysis, with links showing how to do each test using SAS, Stata and SPSS.

posted by RichardP at 6:40 PM on October 3, 2007

I'm not an expert, but I would think about doing a series of pairwise comparisons with the Wilcoxon-Mann-Whitney test.

posted by thrako at 6:41 PM on October 3, 2007
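For readers who want to see what one such pairwise comparison computes, here is a minimal sketch of the Mann-Whitney U statistic in plain Python. The function names and the error counts are hypothetical, invented for illustration; the sketch returns only the U statistics, not a p-value, and a set of pairwise tests does not by itself test the sex-by-group interaction.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples x and y."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical error counts (1 to 5), not taken from the study.
impaired_men = [3, 4, 5, 4, 2]
control_men = [1, 2, 1, 3, 1]
u_impaired, u_control = mann_whitney_u(impaired_men, control_men)
```

The smaller of the two U values is what is usually compared against a table or a normal approximation.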

Two standard ways to do this would be

1) transform the data to normality. Try a box-cox transform.

2) transform the data to (global) ranks and do a standard anova. It's very robust.

It's hard to say exactly what the best thing to do is without knowing how the outcomes are distributed. But really, transforming it to approximate normality (even if it's discrete) is a good place to start.

for review:

The Rank Transform Method in Some Two-Factor Designs

Michael G. Akritas

Journal of the American Statistical Association, Vol. 85, No. 409 (Mar., 1990), pp. 73-78

posted by a robot made out of meat at 6:43 PM on October 3, 2007
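As a rough sketch of option 2 above (global ranks followed by a standard ANOVA), the snippet below ranks all observations together and computes the interaction F statistic for a balanced 2x2 design. The function names, cell labels and data are hypothetical; it assumes equal cell sizes and some within-cell variation, and it stops at the F statistic rather than looking up a p-value. Note that later in this thread the rank transform is described as no longer recommended for this situation.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mean(xs):
    return sum(xs) / len(xs)


def interaction_f_on_ranks(cells):
    """cells maps (group, sex) -> list of scores for a balanced 2x2 design.

    Returns the interaction F statistic (1 and 4*(n-1) degrees of freedom)
    computed on the global ranks of the scores.
    """
    keys = sorted(cells)
    groups = sorted({g for g, _ in keys})
    sexes = sorted({s for _, s in keys})
    n = len(cells[keys[0]])
    assert len(keys) == 4 and len(groups) == 2 and len(sexes) == 2
    assert all(len(cells[k]) == n for k in keys), "design must be balanced"

    # Rank every observation together, then split the ranks back into cells.
    ranks = midranks([v for k in keys for v in cells[k]])
    ranked = {k: ranks[i * n:(i + 1) * n] for i, k in enumerate(keys)}

    cell_mean = {k: mean(r) for k, r in ranked.items()}
    grand = mean(ranks)
    group_mean = {g: mean([cell_mean[(g, s)] for s in sexes]) for g in groups}
    sex_mean = {s: mean([cell_mean[(g, s)] for g in groups]) for s in sexes}

    ss_inter = n * sum(
        (cell_mean[(g, s)] - group_mean[g] - sex_mean[s] + grand) ** 2
        for g in groups for s in sexes
    )
    ss_error = sum((r - cell_mean[k]) ** 2 for k in keys for r in ranked[k])
    df_error = 4 * (n - 1)
    return ss_inter / (ss_error / df_error)


# Hypothetical error counts, not taken from the study.
data = {
    ("control", "f"): [1, 2, 1, 1],
    ("control", "m"): [1, 1, 2, 2],
    ("impaired", "f"): [3, 4, 5, 3],
    ("impaired", "m"): [2, 3, 2, 4],
}
f_stat = interaction_f_on_ranks(data)
```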

Response by poster: Thank you so much! He's reviewing and trying to figure out if these are applicable. Definitely appreciated and I'll mark a best as soon as I get the word.

Any replies still welcome.

posted by cashman at 7:31 PM on October 3, 2007

cash: a quick word about what the data are like would help. For example, if the outcomes are ordinal (like 1,2,3,4,5) then the many non-parametric rank tests aren't going to work.

posted by a robot made out of meat at 7:38 PM on October 3, 2007

If you have access to PRIMER it is set up to do this kind of thing.

posted by fshgrl at 7:47 PM on October 3, 2007

Response by poster: Not sure if I'm answering your question correctly, but I think the data might qualify as ordinal since the values range from 1 to 5 errors. It's skewed towards most people getting most of the items correct.

posted by cashman at 7:48 PM on October 3, 2007

Best answer: I am a little rusty on this, but I seem to recall that for these kinds of 2x2 designs with non-parametric data the Mantel-Haenszel chi-square test is the way to go. If you're cramming this into a 2x2 table then you must be stratifying the dependent results (# of errors on visual testing) into PassVisualTest and FailVisualTest categories. This may not be the best way to go about this.

The (Wilcoxon)-Mann-Whitney U test is a special or degenerate case of this test so it may also be appropriate.

It's worth knowing that you have to have a pretty large sample size for either of these tests to have a decent chance of detecting a significant result. If any of the 2x2 cells contain less than 5 subjects, the chi-square test is likely to produce a type II error (failure to detect a true difference between the groups); the Mann-Whitney test likewise requires a fairly large sample, 50-100 or so in each wing to detect reasonable differences.

I would strongly caution you about transforming data to fit a normal distribution, without the help of a trained statistician and a decent theoretical explanation of why your transform is valid. Squinting at the data and saying "let's log-transform this so it looks more bell-shaped" in order to apply a parametric analysis is one of the easiest ways to destroy validity that I know of.

I don't know why I bother with these questions, actually; I suggest just waiting until ROU_Xenophobe weighs in with his answer.

posted by ikkyu2 at 7:48 PM on October 3, 2007
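To make the Mantel-Haenszel suggestion concrete, here is a minimal sketch of the statistic for a set of 2x2 tables, one per stratum. It assumes the error counts have already been dichotomized into fail/pass and that sex is used as the stratifying variable; the function name, the table layout and the counts are all hypothetical. As written it tests the group-by-outcome association while controlling for sex, which is not the same thing as a test of the sex-by-group interaction.

```python
def mantel_haenszel_chi2(strata, continuity=True):
    """Cochran-Mantel-Haenszel chi-square statistic (1 degree of freedom).

    strata is a list of 2x2 tables ((a, b), (c, d)), one per stratum, with
    rows = group (impaired, control) and columns = outcome (fail, pass).
    """
    diff = 0.0  # sum over strata of observed minus expected count in cell a
    var = 0.0   # sum over strata of the hypergeometric variance of cell a
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    # Apply the 0.5 continuity correction only when it cannot flip the sign.
    correction = 0.5 if continuity and abs(diff) >= 0.5 else 0.0
    return (abs(diff) - correction) ** 2 / var


# Hypothetical counts, not taken from the study: one table per sex.
women = ((8, 2), (2, 8))
men = ((7, 3), (3, 7))
chi2 = mantel_haenszel_chi2([women, men])
```

The result would be compared against a chi-square distribution with one degree of freedom.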

OK, so I actually cracked a book here. The case and control groups here have equal *n* and equal number of men and women. Were the cases and controls actually intentionally matched? If they were, you can gain a great deal of power by using a paired analysis, which in this case would be the Wilcoxon matched pairs signed rank sum test. The data doesn't have to be parametric but it does have to be roughly symmetric within groups around the within-group mean.

I think I'd want to eyeball the data before I suggested anything further.

posted by ikkyu2 at 8:10 PM on October 3, 2007

Response by poster: Wow, thanks a lot ikkyu2! The data was not intentionally matched [via my friend], though retrospectively they are well matched on age, race and sex.

I really appreciate all the help, especially since most of this is going over my head [and into his]. Hopefully by tomorrow this will all be figured out.

posted by cashman at 8:25 PM on October 3, 2007

*I don't know why I bother with these questions, actually; I suggest just waiting until ROU_Xenophobe weighs in with his answer.*

Heh. Thanks.

But this sounds like actual science, with experiments and everything, and I live in a different world from that; I almost never have to deal with anova except for the F or Chi2 statistics embedded in regression output.

My first reaction: Is there a strong reason why you can't just run an event count model -- a poisson regression or negative binomial regression? But I come from a world where the first reaction is to find a regression-like tool. The only times I've run event count models were in classes, though, so I can't offer much help on them. Both make assumptions about the dependent variable, but I forget which.

That said, the tactically right answer will come from your friend's discipline. Not much sense in fighting a battle for your results *and* an entirely separate battle to get your methods accepted, at least not unless you're already securely tenured or otherwise safe. Better, in the immediate sense, to use the methods that your discipline accepts.

posted by ROU_Xenophobe at 10:22 PM on October 3, 2007
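A rough sketch of the event count idea, assuming a Poisson model with group, sex and their interaction fitted to the number of errors. With exactly four cells that model is saturated, so its fitted cell means equal the observed cell means and the interaction coefficient can be written in closed form without an iterative fit. The function name, the 0/1 coding and the data are hypothetical. A Poisson model assumes the variance equals the mean, which may not hold for these scores.

```python
import math


def poisson_interaction(cells):
    """cells maps (group, sex), each coded 0 or 1, to a list of error counts.

    Returns (estimate, standard_error, z) for the interaction coefficient of
    a saturated Poisson log-linear model. Every cell needs a positive total.
    """
    total = {k: sum(v) for k, v in cells.items()}
    cell_mean = {k: total[k] / len(cells[k]) for k in cells}
    # Interaction on the log scale: a difference of differences of log means.
    estimate = (math.log(cell_mean[(1, 1)]) - math.log(cell_mean[(1, 0)])
                - math.log(cell_mean[(0, 1)]) + math.log(cell_mean[(0, 0)]))
    # Wald standard error: each log cell mean has variance of about 1 / total.
    standard_error = math.sqrt(sum(1 / t for t in total.values()))
    return estimate, standard_error, estimate / standard_error


# Hypothetical error counts, not taken from the study.
data = {
    (0, 0): [1, 2, 1, 1],   # control women
    (0, 1): [1, 1, 2, 2],   # control men
    (1, 0): [3, 4, 5, 3],   # impaired women
    (1, 1): [2, 3, 2, 4],   # impaired men
}
estimate, standard_error, z = poisson_interaction(data)
```

The z value would be compared against a standard normal distribution; with small cell totals that approximation is crude.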

First eyeball a histogram of the test scores of the cases and a similar histogram of the test scores of the controls. Ideally each histogram would be pretty symmetric (not skewed to the left or right) and the two histograms would be somewhat similar in overall shape, with the blind folks' translated over towards more errors. If they are skewed a transform can be applied to help correct the skew and make them more symmetric about their means. The same transform has to be applied to each group.

Then, I would have someone who's blind (in the statistical sense, not the visual sense) to the outcome data match the cases and controls retrospectively. If you're matching on just age, gender and race, which I agree is reasonable, you might even be able to get a computer program to figure out the optimal match, or just do it by hand.

Next, apply the Wilcoxon signed rank sum test. This test will tell you if there's a statistically significant difference in test scores between cases and controls, and it can generate a confidence interval relating to the magnitude of that difference.

May I put in a plug for Douglas G. Altman's *Practical Statistics for Medical Research*. Chapter 9, sections 4 through 8 deal with these tests.

posted by ikkyu2 at 11:51 PM on October 3, 2007
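For the matched-pairs approach described in this comment, a minimal sketch of the Wilcoxon signed-rank statistic follows. It assumes each case has already been matched to one control and that the two lists are in the same pair order; the function names and scores are hypothetical, and no p-value or confidence interval is computed.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(cases, controls):
    """Return (W_plus, W_minus) for matched pairs.

    Pairs with a zero difference are dropped before ranking.
    """
    diffs = [a - b for a, b in zip(cases, controls) if a != b]
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical matched error counts, not taken from the study.
case_scores = [3, 4, 5, 2, 4]
control_scores = [1, 2, 5, 3, 1]
w_plus, w_minus = wilcoxon_signed_rank(case_scores, control_scores)
```

The smaller of the two sums is the value normally looked up in a table of critical values.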

Which event would you count, ROU_Xenophobe? Missing a test question, or passing a fail threshold?

posted by ikkyu2 at 11:52 PM on October 3, 2007

Ok, since you say pseudo-ordinal, I give you http://www.whitemoldresearch.com/files/022706-01.pdf as a reference; skip to the discussion for a summary. Apparently the rank transform is something that people used to do in this case, but don't recommend anymore (in favor of distributional tests). Someone said Friedman test up above. There might be a way to get that test to do what you want, but it's normally thought of as a complete block design / repeated measures test.

posted by a robot made out of meat at 4:46 AM on October 4, 2007

If ikkyu2 is right and your participants are either passing or failing the test, I'd do a logistic regression with a sex*condition interaction term. But I agree with ROU_Xenophobe that you should do whatever is standard practice in your field.

posted by nixxon at 6:01 AM on October 4, 2007

*Which event would you count, ROU_Xenophobe? Missing a test question, or passing a fail threshold?*

Number of questions missed.

If it's pass/fail, then you have a binary DV and can do logit or probit (or gompit or scobit or...)

*I'd do a logistic regression with a sex*condition interaction term.*

It might be easier to just use dummies for impaired men, impaired women, and unimpaired men (or otherwise any 3 of the 4 categories in your 2x2), and then interpret the differences in the coefficients on each dummy. This is easy enough by hand if you output the variance-covariance matrix, or some software (Stata) can do this for you. And it won't use any more DF than a dummy for sex, a dummy for condition, and their interaction.

But again, you want to reflect practice in your field, not mine. I know nothing about the different kinds of sophistication that get used to assess experimental outcomes, because we can hardly ever run experiments (and I never do).

posted by ROU_Xenophobe at 7:13 AM on October 4, 2007
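The "by hand" step with the variance-covariance matrix can be sketched as a linear contrast. With unimpaired women as the reference cell, the interaction equals the impaired-men coefficient minus the impaired-women and unimpaired-men coefficients, and its standard error is the square root of w'Vw. The function name, the coefficients and the covariance matrix below are made up for illustration and do not come from any fitted model.

```python
import math


def linear_contrast(coefs, cov, weights):
    """Return (estimate, standard_error) for the contrast sum(w_i * b_i).

    cov is the variance-covariance matrix of coefs, given as nested lists.
    """
    estimate = sum(w * b for w, b in zip(weights, coefs))
    size = len(weights)
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(size) for j in range(size))
    return estimate, math.sqrt(variance)


# Made-up coefficients on the dummies for impaired men, impaired women and
# unimpaired men, in that order (reference category: unimpaired women).
coefs = [1.2, 0.9, 0.1]
cov = [
    [0.20, 0.10, 0.10],
    [0.10, 0.20, 0.10],
    [0.10, 0.10, 0.20],
]
# interaction = b_impaired_men - b_impaired_women - b_unimpaired_men
estimate, standard_error = linear_contrast(coefs, cov, [1, -1, -1])
z = estimate / standard_error
```

The same helper works for any other comparison between cells, such as impaired men versus impaired women with weights `[1, -1, 0]`.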

This thread is closed to new comments.

Utest looks relevant.

posted by parudox at 6:40 PM on October 3, 2007