Find the right statistical test
December 16, 2005 7:33 AM
Statistics problem: Suitable statistical test for data containing rankings of preference for men and women for 4 alternatives. (more inside) Help me extract some meaning from my data, please.
I've made a survey and have trouble extracting significance from my data. Men and women have been asked to rank 4 alternatives according to preference (1st, 2nd, 3rd, and 4th). Now, what statistical test should I use to show what, if any, differences there are? There are 40 in one group and 86 in the other, so my sample isn’t tiny. I would really appreciate any help on this one; I’d like to get home for Christmas and my project needs to be in before January.
Friedman is pretty close, but from what I can tell from one of my old stats books, it compares ranks within blocks, not ranks between blocks ("do men prefer one alternative over the others", not "do men prefer a different alternative than women"). You need to be looking for nonparametric rank tests with multiple comparisons.
posted by mbd1mbd1 at 8:53 AM on December 16, 2005
Sounds like you want the Mann-Whitney U test.
from the SPSS manual:
M-W (alias MANN-WHITNEY) tests whether two independent samples defined by a grouping variable are from the same population. The test statistic uses the rank of each case to test whether the groups are drawn from the same population.
posted by jasper411 at 8:59 AM on December 16, 2005
My first reaction would be ordered logit or ordered probit. Especially if you want to put these results into a multivariate context or control for something else.
posted by ROU_Xenophobe at 9:12 AM on December 16, 2005
Response by poster: thanks guys, that was quick, I'll look into your suggestions right away
posted by JeNeSaisQuoi at 9:24 AM on December 16, 2005
Mann-Whitney loses a lot of power; I don't think you'll want to do this if you only have 40 samples in one group. I'll look in my stat textbook tonight to see if I can find a better answer.
What you want to do is compare the distribution of male answers to the distribution of female answers, though, and you have no expectation that those distributions will be Normal distributions; so a non-parametric rank test with multiple comparisons is correct.
(If you could somehow have contrived to have an equal number of men and women, and randomly pair them, you could have maximized your power in analysis. You're going to lose a lot of power because you have 86 in one group and only 40 in another. This is one good reason to design the study *after* you've decided how you're going to analyze the data. )
posted by ikkyu2 at 12:03 PM on December 16, 2005
Best answer: Well. The best way I can see to do this is very ugly, and I don't think you're going to have power to detect anything but a gargantuan difference.
Basically you're going to have 32 population frequencies, and you're going to put them in a giant 2x16 table.
Let's call your 4 alternatives A, B, C, and D.
First, let's look at the men. You're going to need to add up 4 numbers with regard to alternative A:
MA1 = number of men who ranked alternative A #1
MA2 = number of men who ranked alternative A #2
MA3 = number of men who ranked alternative A #3
MA4 = number of men who ranked alternative A #4
Then you're going to generate similar frequencies for men's ranking of alternatives B, C, and D. Now you have 16 numbers, that sum to the total number of men in your sample. Order them along the top row of a 2x16 table.
Now do the same for women, and order those 16 frequencies along the bottom row of a 2x16 table.
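As a rough sketch of the tallying described above, assuming each response is stored as a string such as "BADC" (B ranked 1st, A 2nd, and so on): the storage format, the function name `rank_frequencies`, and the sample responses are all made up for illustration. Note that each respondent adds one count per alternative, so a row sums to four times the group size, as a later reply in this thread points out.

```python
from collections import Counter

ALTERNATIVES = "ABCD"


def rank_frequencies(rankings):
    """Return the 16 counts [A1, A2, A3, A4, B1, ..., D4] for one group.

    Each ranking is a string such as "BADC": B was ranked 1st, A 2nd, etc.
    """
    counts = Counter()
    for ranking in rankings:
        for position, alternative in enumerate(ranking, start=1):
            counts[(alternative, position)] += 1
    return [counts[(alt, rank)] for alt in ALTERNATIVES for rank in range(1, 5)]


# Hypothetical responses, only to show the shape of the table.
men = ["ABCD", "BACD", "ABDC"]
women = ["DABC", "ADBC"]

# Top row: men's 16 frequencies; bottom row: women's 16 frequencies.
table = [rank_frequencies(men), rank_frequencies(women)]
```

With real data, `men` and `women` would hold 40 and 86 rankings respectively.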
You now have 2 options. One is to make 16 comparisons - to compare MA1 to WA1, MB3 to WB3, and so on - and divide your significance threshold by 16 (Bonferroni's correction), so that each comparison has to reach p < 0.05/16, or about 0.003. This is ruinous to power and even if you get a statistically significant result it'll be hard to interpret its meaning.
The other way is to perform a chi-squared test for trend, with one degree of freedom, on your 2x16 table. This is going to test the following hypothesis:
"The observed difference, between the way men ranked alternatives ABCD and the way women ranked ABCD, is due to chance alone."
If your test statistic, Chi(trend) with 1 df, results in a p value of less than 0.05, you can reject the null hypothesis above and say that gender has some influence on ranking.
However - for this chi-square test to be valid, the content of every cell has to be >= 5. Since there are 32 cells and only 126 people in your study, this is not possible. You are not going to be able to make a meaningful conclusion from this data.
Can you tell us more about alternatives A,B,C, and D? If they're non-ordered, you're sunk, but if they can be ordered (ranked) in a mathematically meaningful way, you might be able to salvage this - with the above-mentioned Mann-Whitney test, no less.
(The above poster was confused about rank; the ranks that you had your subjects indicate are not the rankings that the Mann-Whitney test needs. The Mann-Whitney test needs there to be a mathematically meaningful way to rank alternatives A, B, C, and D. For example, alternative A is take NO aspirin; alternative B is take ONE aspirin; alternative C is take TWO aspirin; alternative D is take THREE aspirin. That kind of data yields to the Mann-Whitney analysis.)
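To make the aspirin example concrete, here is a minimal sketch of the Mann-Whitney U statistic for that kind of ordered data. The two samples are invented, and the function only computes U (counting a tie as half a win); it does not produce a p value, for which you would normally rely on a statistics package.

```python
def mann_whitney_u(sample_x, sample_y):
    """U statistic for sample_x: each pair with x > y counts 1, each tie 0.5."""
    u = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical data: number of aspirin (0 to 3) chosen by each respondent.
men_aspirin = [0, 1, 1, 2]
women_aspirin = [1, 2, 2, 3, 3]

u_men = mann_whitney_u(men_aspirin, women_aspirin)
u_women = mann_whitney_u(women_aspirin, men_aspirin)

# The two U values always add up to the number of (man, woman) pairs.
total_pairs = len(men_aspirin) * len(women_aspirin)
```

This only works because "number of aspirin" has a natural order; with non-ordered alternatives there is nothing meaningful to rank.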
Next time, design your test with the end analysis in mind - you would need at least 500 people in each group to have a decent chance of detecting a meaningful difference between them with this way of collecting data.
posted by ikkyu2 at 4:40 PM on December 16, 2005
F***, ignore the above; ordered logit is the right way to do this. At least I learned something.
I'm still skeptical about your power, but this will maximize it. You'll also get some data about the degree of difference with regard to each alternative.
posted by ikkyu2 at 5:08 PM on December 16, 2005
Response by poster: The alternatives are non-ordered I'm afraid. Thx a lot though, ikkyu2. You really went the extra mile in helping me out. Couple of questions: You said:
“Then you're going to generate similar frequencies for men's ranking of alternatives B, C, and D. Now you have 16 numbers, that sum to the total number of men in your sample.”
It sums to 4 times the number of men since they all got to choose first, second, third and fourth alternative, no? Am I missing something? I wish I could show the table in some way but intuitively I thought it would be possible to show a difference somehow. E.g. the fourth alt. was the second most popular choice among women and the last among men, 10% (out of 40) versus 28% (out of 86). Isn’t that enough to show something?
posted by JeNeSaisQuoi at 5:36 PM on December 16, 2005
It sums to 4 times the number of men since they all got to choose first, second, third and fourth alternative, no?
Yes, certainly.
I wish I could show the table in some way but intuitively I thought it would be possible to show a difference somehow. E.g. the fourth alt. was the second most popular choice among women and the last among men, 10% (out of 40) versus 28% (out of 86). Isn’t that enough to show something?
Well, it depends on how you look at it. You not only have to show that it was a meaningful result supporting your mechanistic hypothesis with a difference between men and women; you also have to show that, if there were really no difference between men and women and you repeated your study 100 times, you'd expect to find a difference at least that large in that direction fewer than 5 times (one-tailed).
In other words, you have to reject the null hypothesis (which is: the observed effect is due to random chance alone). That is to say, not only does the difference have to support your theory, but it has to be statistically significant.
Since you have 4 alternatives and 4 ranks = 16 cells, that really opens up the possibilities for finding an association between two cells that *looks* significant but in fact could be expected to occur frequently due to chance alone.
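A quick sketch of that multiple-comparisons problem, using the one cell quoted above. The counts are an assumption: 10% of 40 men is taken as 4 and 28% of 86 women as 24. The test is an ordinary two-sided two-proportion z-test with a normal approximation, chosen here only for illustration; the function name is made up.

```python
import math


def two_proportion_p_value(count_a, n_a, count_b, n_b):
    """Two-sided p value for a pooled two-proportion z-test."""
    pooled = (count_a + count_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (count_a / n_a - count_b / n_b) / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# Assumed counts reconstructed from the quoted percentages.
p_value = two_proportion_p_value(4, 40, 24, 86)

ALPHA = 0.05
N_CELLS = 16  # 4 alternatives x 4 ranks

significant_alone = p_value < ALPHA
significant_after_bonferroni = p_value < ALPHA / N_CELLS
```

Under these assumptions the cell passes the usual 0.05 threshold when looked at in isolation, but not the Bonferroni threshold of 0.05/16 that applies once you admit you could have picked any of the 16 cells.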
This is a decent page about ordered logit, but if you've never used SAS before, you're going to need someone to help you.
posted by ikkyu2 at 5:56 PM on December 16, 2005
You can still do this with ordered logit, sort of. You can use ordered logit (or ordered probit) to see what influences the ranking of CHOICE1. Then you can run a separate analysis for rankings of CHOICE2, etc.
Or, you could run a multinomial logit on which alternative a respondent ranks highest. MNL is canned in most statistical packages and particularly easy to run in Stata. But, MNL can be a bear to interpret, since your coefficients are all about probabilities relative to the excluded choice. This means that you can have a positive coefficient on men for CHOICE 1, but that men are actually less likely to rank CHOICE 1 highest than women are. It's not actually difficult to interpret, though, just a bit fussy and time-consuming.
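Here is a small sketch of that interpretation trap, using invented first-choice counts for 40 men and 86 women. With gender as the only predictor, the multinomial logit coefficient on "male" for an alternative is just the log of the ratio of men's to women's odds of picking that alternative rather than the excluded one, so it can be computed by hand. Alternative D is arbitrarily treated as the excluded choice here.

```python
import math

# Hypothetical counts of which alternative each respondent ranked first.
men_first = {"A": 8, "B": 16, "C": 14, "D": 2}      # 40 men
women_first = {"A": 26, "B": 17, "C": 17, "D": 26}  # 86 women

BASELINE = "D"  # the excluded choice (an arbitrary pick for this sketch)


def male_coefficient(alternative):
    """Log odds ratio (men vs women) of `alternative` relative to BASELINE."""
    men_odds = men_first[alternative] / men_first[BASELINE]
    women_odds = women_first[alternative] / women_first[BASELINE]
    return math.log(men_odds / women_odds)


coef_a = male_coefficient("A")

# Plain probabilities of ranking A first, for comparison.
p_men_a = men_first["A"] / sum(men_first.values())
p_women_a = women_first["A"] / sum(women_first.values())
```

In this made-up table the coefficient on male for A is positive (log 4), yet men are less likely than women to rank A first (20% versus about 30%), because the coefficient only says men favor A relative to D.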
posted by ROU_Xenophobe at 5:58 PM on December 16, 2005
Given a choice, I'd use stata over sas, but use what you're used to. If you haven't used either, you'll likely find stata less infuriating to get to know. Or cut to the chase and start using R! It's free!
If anything, I actually find MNL easier to interpret and less restrictive than ordered logit, since the analogies to normal logit/probit are better.
E.g. the fourth alt. was the second most popular choice among women and the last among men,
Ordered logit deals with just that. The estimation command in stata would look something like
ologit choice4 male [other variables if you got 'em]
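And then you'd get output like in ikkyu2's example. Firm interpretation of all the stuff is harder, but as a rough guide a positive coefficient on male means that men rank CHOICE4 higher. ("Higher" here means a larger value of choice4, so code the variable so that larger numbers mean more preferred, or flip the reading if 1 means first place.)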
posted by ROU_Xenophobe at 6:07 PM on December 16, 2005
This thread is closed to new comments.
posted by milkrate at 8:13 AM on December 16, 2005