# What is normal?

May 4, 2010 11:01 AM Subscribe

Statistics-filter: Two relatively (I think) simple statistics questions.

It's been a while since my last statistics class, and I have a couple of questions I need to get my head around. I'm dealing here with a small sample of participants and data collected through a survey with Likert-scale items.

For a small (n<30) sample like this, what would be a good test for normality to illustrate that the data are appropriately distributed for a t test? Since the sample is so small, it seems that just laying out scores and comparing them to a normal histogram is not the ideal method. What would you recommend? I'm sort of losing myself in reading about the more formal tests right now, and need a quick cue to which ones may or may not be appropriate for me. I don't have my data yet, just planning ahead.

Second question, for the same small-n study: is there a particular test I should be aware of for comparing changes in a single participant's scores across sets of identical surveys administered at different times? One set would be a pre- and post-test; another would be four interstitial surveys with different items than the pre/post.

Thanks in advance.
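One of the "formal tests" the asker alludes to is the Shapiro-Wilk test, which is commonly suggested for small samples. A minimal sketch in Python with scipy (assumed here in place of the asker's unspecified package); the composite scores below are invented purely for illustration:

```python
# Shapiro-Wilk test for normality via scipy; the scores are made up.
from scipy import stats

scores = [2.0, 3.0, 3.5, 4.0, 2.5, 3.0, 4.5, 3.5, 2.0, 3.0,
          4.0, 3.5, 2.5, 3.0, 3.5, 4.0, 3.0, 2.5, 3.5, 3.0]  # n = 20 < 30

w, p = stats.shapiro(scores)
print(f"W = {w:.3f}, p = {p:.3f}")
# A small p (say < 0.05) is evidence against normality;
# a large p just means the test found no evidence against it.
```

Like any significance test on a small sample, it has little power to detect non-normality, so it complements rather than replaces the plots suggested in the answers below.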

After checkin' out some descriptives and plots, I'd do me a nested RM-ANOVA if it didn't look so screwy that I couldn't.

posted by solipsophistocracy at 12:06 PM on May 4, 2010

For question 1, I think what you want is a Normal Quantile Plot. It is tedious to do by hand but quite easy with software. This site gives you a step-by-step method for using Excel to calculate this plot. You can even do a linear regression to get an r value that quantifies how close to normal your sample is.

I don't have a good answer for question 2.

posted by El_Marto at 1:41 PM on May 4, 2010
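El_Marto's Excel recipe can be done in one call with Python's scipy (assumed here in place of Excel): `probplot` pairs the sorted sample with normal quantiles and returns the r value of the fitted line. The scores are invented:

```python
# Normal quantile (QQ) plot numbers via scipy; probplot returns the
# theoretical quantiles, the sorted sample, and a straight-line fit.
from scipy import stats

sample = [2.1, 2.8, 3.0, 3.3, 3.5, 3.6, 3.9, 4.1, 4.4, 5.0]  # invented scores

(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"r = {r:.4f}")  # r near 1: the points hug the line, i.e. roughly normal
```

Plotting `osm` against `osr` with any plotting package gives the picture; the r value is the same quantity the Excel regression recipe produces.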

There is a little-known trick for spotting violations of the assumptions of a t test/ANOVA, for the straightforward case anyway. It's done empirically. El_Marto's explanation is fine too. Here's the rationale:

When you do a t test or ANOVA you treat the data as if it is continuous: the distance between a value of 1 and 2 is the same as the distance between 2 and 3, and so on throughout the scale. A non-parametric test makes no such assumption, so you discard the information about scale and keep only the information about order. That's information loss, which is justifiable when the data don't really carry scale information, and not justifiable when they do.

So given that, and given that the Mann-Whitney and Kruskal-Wallis tests are essentially the equivalents for ordinal data of what the t test and ANOVA are for continuous data, we expect the non-parametric tests to have less power because of that information loss.

The upshot of this is that if your data meet the parametric assumptions, the p value for the t test will always be lower than the p value for the Mann-Whitney test. If they don't, the reverse is true, because you have violated the assumptions of the t test too much. The reasoning: when the data really are suited to the parametric test, the information loss from discarding scale reduces the statistical power available to you; when they aren't, that loss costs you nothing, while spuriously treating the data as continuous is what reduces the power.

posted by singingfish at 3:47 PM on May 4, 2010 [1 favorite]
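singingfish's heuristic amounts to running both tests on the same data and comparing p values. A hedged Python/scipy sketch with invented, roughly normal groups:

```python
# Run the t test and the Mann-Whitney test on the same two groups and
# compare p values, per the heuristic above. Groups are invented.
from scipy import stats

a = [3.1, 3.4, 2.9, 3.7, 3.3, 3.6, 3.0, 3.5]
b = [4.0, 4.3, 3.9, 4.6, 4.1, 4.4, 3.8, 4.2]

t_stat, p_t = stats.ttest_ind(a, b)
u_stat, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")
print(f"t test p = {p_t:.5f}, Mann-Whitney p = {p_u:.5f}")
# Heuristic: p_t < p_u suggests the parametric assumptions are tolerable;
# p_t > p_u suggests they are violated.
```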

I assume that you mean some composite score instead of a single item. What you want is called a QQ plot or normal quantile plot. Googling those terms and your stat package of choice will find an implementation. You want to look at such a plot of the residuals from your analysis, not the raw data. Imagine that one group was exposed to something with a big effect but was otherwise the same as the other group; the raw data would be bimodal and not normal-looking at all. Formal tests of normality are available, and most of them have a geometric interpretation in terms of the difference between the empirical and expected cumulative distribution functions.

I almost always ask people to use a linear regression presentation instead of an ANOVA presentation. They are almost the same model, and regression-type output is much easier for most people to understand. For a repeated measures analysis, you can put a fixed effect on person and on before/after (or on number in sequence). You can also use a random-effects analysis, which is similar but places some restrictions on the "person" effects. Doing these in Stata, SAS, and R isn't too hard, but I don't know about other packages.

posted by a robot made out of meat at 6:59 PM on May 4, 2010
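The regression presentation of a repeated-measures analysis can be sketched with Python's statsmodels (assumed here in place of Stata/SAS/R). The pre/post scores for five hypothetical people are invented so that each person gains exactly 0.5:

```python
# Regression presentation of a pre/post repeated-measures design:
# fixed effects on person and on time. Data are invented.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "person": list("ABCDE") * 2,
    "time":   ["pre"] * 5 + ["post"] * 5,
    "score":  [3.0, 2.5, 3.5, 3.0, 2.0,   # pre
               3.5, 3.0, 4.0, 3.5, 2.5],  # post (each person +0.5)
})

model = smf.ols("score ~ C(person) + C(time, Treatment('pre'))", data=df).fit()
print(model.params)  # the time coefficient is the average pre-to-post change
```

The coefficient on time reads directly as "average change from pre to post, holding person constant", which is the interpretability advantage over an ANOVA table.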

*the p value for the t test will always be lower than the p value for the mann-whitney test.*

Thanks for that, Singingfish, I would never have considered that!

posted by Sutekh at 5:18 PM on May 7, 2010

The t-test is extremely robust to violations of the implicit assumptions (there's a nice quasi-likelihood literature which explains this), but be careful about p-value shopping because some parametric methods do give over-optimistic results when their assumptions are violated.

posted by a robot made out of meat at 11:58 AM on May 12, 2010

Testing for changes in the score can be performed with a paired t-test for two time points. With scale data this too can lead to violated assumptions, but differences tend to be less skewed in general than absolute scores. For serial data, you're probably going to need a more complex approach (generalized estimating equations or mixed-effects models).

posted by drpynchon at 11:18 AM on May 4, 2010
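drpynchon's paired t test can be sketched in Python with scipy (an assumed package choice); the pre/post scores below are invented:

```python
# Paired t test on invented pre/post scores for the same 8 participants;
# it is equivalent to a one-sample t test on the per-person differences.
from scipy import stats

pre  = [3.0, 2.5, 3.5, 3.0, 2.0, 4.0, 3.5, 2.5]
post = [3.5, 3.0, 4.0, 3.2, 2.5, 4.5, 3.6, 3.0]

t_stat, p = stats.ttest_rel(pre, post)
print(f"t = {t_stat:.3f}, p = {p:.4f}")
# Skew in the differences (post - pre), not in the raw scores,
# is what threatens the assumptions here.
```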