Statistical power and the validity of experimental results
July 27, 2014 6:03 AM
What is the relationship between statistical power and the validity of an experiment in general? If I have low power, but a very low p value, am I still OK?
I work for a dietary supplement company that also makes skin care products, and some of those products are tested clinically. Now they are talking about repeating some of the clinical tests in another region of the world where the products will be sold, because the marketing department thinks that would be good. That raised the question: how do you decide how many subjects to include?
The answer, I learn, is statistical power. You include as many subjects as are needed to give you an appropriate power, say 80%, given your chosen significance level alpha and expected effect size, for the type of statistical test you are performing. Power is 1 - beta, beta being the chance of making a type II error. So if your study is underpowered, your beta is large, and the chance of missing an effect that is actually there is high. Meanwhile, alpha is the chance of making a type I error, or seeing an effect when there is really nothing there. You compare your calculated p value to alpha, and if p is lower, your results are statistically significant: you reject the null hypothesis, because a result this extreme would be unlikely if there were really no effect. But what if p is small, lower than alpha, but power is also low? Does that invalidate the results?
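To make the sample-size step concrete, here is a rough sketch of the usual normal-approximation formula for a two-tailed paired (or one-sample) test, n ≈ ((z for 1 - alpha/2 + z for power) / d)^2. The function name approx_n_paired is made up for illustration, and d is assumed to be the mean of the before/after differences divided by the standard deviation of those differences. An exact t-based calculation, as in a dedicated power calculator, will usually ask for a couple more subjects.

```python
import math
from statistics import NormalDist

def approx_n_paired(d, alpha=0.05, power=0.80):
    """Approximate number of pairs for a two-tailed paired t test.

    Normal approximation only (an assumption of this sketch):
    n ~ ((z_{1-alpha/2} + z_{power}) / d) ** 2, rounded up.
    d is the standardized mean difference of the paired differences.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return math.ceil(((z_alpha + z_power) / d) ** 2)

# Example: an expected effect size of 0.5 at alpha = 0.05 and 80% power
print(approx_n_paired(0.5))
```

Smaller expected effects drive n up quickly, since n scales with 1/d^2: halving d roughly quadruples the number of subjects needed.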
For example, one of the studies they want to repeat used a two-tailed paired t test to compare before and after treatment means for a measurement. Alpha was 0.05 and sample size was 30. After the fact, I calculated a Cohen's d of 0.49. Together these give a power of less than 50%, which means the study was underpowered. At the same time, p was 0.000004, much lower than alpha. I can tell the people at work that, when the study is repeated, we are going to need more subjects or we risk missing the effect that we saw in the first trial, but what can we conclude about that first trial? Power was low, but p was much lower than alpha. Are the results no good? Or can we still trust that p? Or, what is also possible, am I completely confused and all this doesn't work the way I think?
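One way to sanity-check a power figure like this is to simulate it. The sketch below is only an illustration under stated assumptions: the before/after differences are normally distributed, d = 0.49 is the mean difference divided by the standard deviation of the differences, and 2.045 is the two-tailed 5% critical t value for 29 degrees of freedom. The function name simulate_paired_power is hypothetical. Under those assumptions the simulated power comes out well above 50%, so the figure in the post may rest on a different definition of d or a different test setting in the calculator; that is a guess, not something the post confirms.

```python
import math
import random

def simulate_paired_power(d_z, n, t_crit, n_sims=20000, seed=0):
    """Monte Carlo estimate of power for a two-tailed paired t test.

    Assumes (for this sketch) normally distributed differences with
    mean d_z and standard deviation 1, so d_z is the standardized effect.
    """
    rng = random.Random(seed)  # fixed seed for repeatable runs
    rejections = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(d_z, 1.0) for _ in range(n)]
        mean = sum(diffs) / n
        var = sum((x - mean) ** 2 for x in diffs) / (n - 1)  # sample variance
        t_stat = mean / math.sqrt(var / n)
        if abs(t_stat) > t_crit:  # two-tailed rejection
            rejections += 1
    return rejections / n_sims

T_CRIT_DF29 = 2.045  # t(0.975, df = 29), hard-coded from standard tables

power_estimate = simulate_paired_power(0.49, 30, T_CRIT_DF29)
print(f"estimated power: {power_estimate:.2f}")
```

Calling the same function with d_z = 0 estimates the type I error rate, which should land near alpha = 0.05 and is a quick check that the simulation is wired up correctly.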
Thanks for any help you can provide!