Stats: is a set of values within a normal range?
February 12, 2013 3:58 AM

I'm trying to analyse a set of biological data for a research project and I'm having trouble finding the appropriate statistical tests to use.

My data:

3 time points with levels of an immunoglobulin (Ig) for n patients:
Day 1: n = 23
Day 2: n = 23
Day 3: n = 14

I've run a D'Agostino and Pearson normality test for each time point, and some are normal and some not.

I know what the range and median levels of the Ig are for healthy patients. I want to know 2 things:

a) at each time point, if the levels I have are significantly different from healthy levels
b) if the levels of the Ig are significantly different between time points

A previous paper with similar data used a Wilcoxon Signed Rank test using the median levels for healthy patients as the "theoretical median" to answer a).

My questions:
1) Is the Wilcoxon Signed Rank test indeed the right test to answer a)?
2) Should I use the Wilcoxon even for the sets of data which are normally distributed?
3) What does it mean when the p-values for the Wilcoxon are either exact or Gaussian estimates?
4) Which test do I use to answer b)? My software suggests the Friedman test for repeated measure non-parametric, but won't run it because I'm missing some values for Day 3.

Please help! This type of stats is beyond what the other people in my lab are familiar with...
posted by snoogles to Science & Nature (3 answers total)
1) A Wilcoxon test isn't a bad choice if you're worried about normality, and it can also be used on normally distributed data. The key thing to remember is that an ANOVA or t-test compares means, rather than medians.

2) That's fine.

3) Not sure, sorry.

4) A Durbin test might work for you, since it doesn't require complete measurements between groups. It works on the same principle as the Friedman test.
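
To make point 1) concrete, here is a minimal Python sketch of how the signed-rank statistic is built for one time point against a reference median for healthy patients. The function name and the Ig values are made up for illustration (they are not the poster's data), and a real analysis would normally rely on a statistics package rather than hand-rolled code.

```python
def signed_rank_statistic(values, reference_median):
    """Return (w_plus, w_minus, n_used) for a one-sample signed-rank test.

    Values equal to the reference median are dropped; tied absolute
    differences share the average of the ranks they occupy.
    """
    diffs = [v - reference_median for v in values if v != reference_median]
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


# Hypothetical Ig levels for one day, and a hypothetical healthy median of 10.
example_levels = [12, 15, 9, 20, 14, 10, 18]
w_plus, w_minus, n_used = signed_rank_statistic(example_levels, 10)
print(w_plus, w_minus, n_used)
```

If the levels sit mostly on one side of the healthy median, one of the two rank sums ends up much larger than the other, and that imbalance is what the test turns into a p-value.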
posted by FrereKhan at 4:23 AM on February 12, 2013

3) The p-value tells you the probability of observing a value of the test statistic at least as extreme as the observed, given the null hypothesis. For small n, it is feasible to compute the exact distribution of the test statistic by simply enumerating all possible outcomes of the test under the null. As n grows larger the exact approach becomes intractable, but (OTOH) the distribution of the test statistic becomes increasingly well-approximated by a Gaussian. So for small n the distribution is computed exactly, and for large n it is approximated.
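
A rough Python sketch of the two approaches, assuming no ties and no zero differences so that the ranks are simply 1..n. The function names are my own, the exact version counts the 2^n equally likely sign patterns without listing them one by one, and the Gaussian version here uses no continuity correction, so your software's numbers may differ slightly.

```python
import math


def exact_two_sided_p(w_plus, n):
    """Exact two-sided p-value for the positive rank sum w_plus (an integer)."""
    total = n * (n + 1) // 2
    # counts[s] = number of sign patterns whose positive ranks sum to s.
    counts = [0] * (total + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Going downwards ensures each rank is used at most once.
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    # The null distribution is symmetric, so double the smaller tail.
    w = min(w_plus, total - w_plus)
    lower_tail = sum(counts[: w + 1]) / 2 ** n
    return min(1.0, 2 * lower_tail)


def normal_two_sided_p(w_plus, n):
    """Gaussian approximation to the same p-value (no continuity correction)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up (n, w_plus) pairs: one small sample, one of the size in the question.
for n, w_plus in [(6, 20), (23, 200)]:
    print(n, w_plus, exact_two_sided_p(w_plus, n), normal_two_sided_p(w_plus, n))
```

For the small sample the two p-values can disagree noticeably, which is why software reports which one it used.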

PS StatisticalHobbyHorseFilter: Testing for normality is not as easy as we would like it to be. Typically you have some a priori reason to consider your data normally distributed, and you go with that unless dissuaded by overwhelming evidence. I appreciate the elegance of non-parametric tests myself, but I do so with the knowledge that I am often throwing away power when I could be exploiting parametricity.
posted by lambdaphage at 5:17 AM on February 12, 2013 [1 favorite]

I would use a non-parametric test simply because the sample sizes in each group are small.

For b), I would use the Kruskal-Wallis ANOVA as an alternative to test whether the 3 time points differ overall, and then use individual Wilcoxon tests to further investigate the differences between pairs of time points.
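
A minimal sketch of the Kruskal-Wallis step in plain Python, using invented numbers rather than real Ig levels. The function names are hypothetical, the p-value uses the usual chi-square approximation (which for exactly three groups reduces to exp(-H/2)), and note that Kruskal-Wallis treats the three days as independent groups, which is an assumption when the same patients are measured repeatedly.

```python
import math


def average_ranks(values):
    """Return (ranks, tie_sizes); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_three_groups(day1, day2, day3):
    """Return (H, p) for three groups; group sizes may differ."""
    groups = [day1, day2, day3]
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # Standard tie correction; equals 1 when there are no ties.
    h /= 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    # Chi-square survival function with 2 degrees of freedom.
    return h, math.exp(-h / 2)


# Invented example data with unequal group sizes, as on Day 3 in the question.
h, p = kruskal_wallis_three_groups([5, 7, 9, 11], [6, 8, 12, 14], [13, 15, 16])
print(h, p)
```

If the overall p-value is small, the pairwise Wilcoxon tests mentioned above would be the follow-up to see which days differ.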
posted by Young Kullervo at 6:08 AM on February 12, 2013
