Data Analyses That Don't Rely On P-Values?
December 8, 2013 6:42 PM
Long story short, I am a researcher who vehemently disagrees with the reliance on p-values to define significance.
I am looking for fields of data analysis that place less emphasis on p-values. Is there such a thing?
Model selection / multi-model inference is an alternative that produces interpretable, publishable results. There are lots of other resources I can recommend from an ecological point of view, but I assume that social scientists are using similar tools these days. Basically, you generate multiple hypotheses, each of which is then defined with one or more statistical models. The model that best fits the data you have available is the best-supported hypothesis. You can do this with a likelihood-based or Bayesian approach.
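By way of illustration only (AIC, the made-up data, and statsmodels are my own choices here, not the only way to do this), a likelihood-based comparison of two candidate models can look roughly like this:

    # Sketch: compare two candidate models of hypothetical data by AIC.
    # (AIC is one common likelihood-based criterion; the data are made up.)
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, 100)
    y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)   # hypothetical observations

    # Hypothesis 1: intercept-only model; Hypothesis 2: linear effect of x.
    m1 = sm.OLS(y, np.ones_like(x)).fit()
    m2 = sm.OLS(y, sm.add_constant(x)).fit()

    for name, m in [("intercept-only", m1), ("linear in x", m2)]:
        print(name, "AIC =", round(m.aic, 1))
    # The model with the lowest AIC is the best-supported hypothesis; the
    # difference in AIC (delta-AIC) quantifies the relative support.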
posted by one_bean at 6:51 PM on December 8, 2013 [2 favorites]
Check out the various implementations of ensemble learning. To me, "p-values" are a tautological construct whose utility depends on implementation, not on the concept itself. These learning algorithms take a more agnostic approach to the question of which underlying model is correct than more traditional analysis does.
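A rough sketch of what that can look like in practice (scikit-learn and the toy data are illustrative assumptions on my part):

    # Sketch: an ensemble of decision trees, judged by held-out accuracy
    # rather than a significance test. Entirely illustrative.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(X_tr, y_tr)
    print("held-out accuracy:", forest.score(X_te, y_te))
    # The ensemble averages over many trees instead of committing to a single
    # "correct" model, which is the agnosticism described above.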
posted by deadweightloss at 7:13 PM on December 8, 2013
I would read Andrew Gelman's blog. Gelman is a Bayesian statistician whose work is very much not in the "p-values" vein; he writes a lot about his own research and that of others, and you should be able to get some good ideas about the wide range of work being carried out by people who see things as you do.
posted by escabeche at 7:37 PM on December 8, 2013 [3 favorites]
Serious question: what statistical modeling approaches are used in your subdiscipline, and especially in the journals where you publish or plan to publish? Because a great, innovative approach can become a bit of an albatross if editors have trouble finding reviewers for your manuscript, or if your colleagues have trouble interpreting (and citing) your results.
posted by Nomyte at 8:19 PM on December 8, 2013 [5 favorites]
Well, here's a very informative if over-cute demonstration of the unreliability of p. (By Andrew Gelman, linked above.) Might be useful to you if you have to fight for alternative measures of confidence. It's not all hopeless, though. As the video demonstrates toward the end, confidence intervals (normally computed at the same time as p, and trivial to calculate in any case for the same sorts of analyses) actually tell one a fair bit. If you have two non-overlapping 95% confidence intervals for your control and experimental groups, you can say with reasonable certainty that your study would find a similar effect if replicated.
It's a higher standard of evidence – it's quite easy to have overlapping confidence intervals but still have p < 0.05 – but at least it's easy to include them in your publications without having to go all-Bayes-all-the-time (not that there's anything wrong with Bayesian analyses). A shocking number of publications (in my field, anyway – evolutionary ecology) don't publish confidence intervals, but that just means that the savvy researcher has to be a bit skeptical of such studies.
Things are getting better, as more and more scientists are waking up to the reality that p < 0.05 is a pretty low standard of evidence. I can't specifically point you to a field that habitually avoids them (although in phylogenetics, where most of my work is, Bayesian analyses are the tools of choice for much of the work) but it's getting more and more normal for scientists to be skeptical of a lone p-value. I suspect that as we see the next round of generational turnover in the sciences, it will become considerably less acceptable to base a study around nothing more than p < 0.05.
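For what it's worth, here is a bare-bones sketch of the CI comparison I mean (made-up numbers, and the normal-approximation interval is just one common choice):

    # Sketch: 95% confidence intervals for control and experimental group means,
    # using a normal approximation. Data are hypothetical.
    import numpy as np

    rng = np.random.default_rng(0)
    control = rng.normal(10.0, 2.0, 40)      # hypothetical control measurements
    treatment = rng.normal(12.0, 2.0, 40)    # hypothetical experimental measurements

    def ci95(sample):
        mean = sample.mean()
        sem = sample.std(ddof=1) / np.sqrt(len(sample))
        return mean - 1.96 * sem, mean + 1.96 * sem

    lo_c, hi_c = ci95(control)
    lo_t, hi_t = ci95(treatment)
    print("control   95% CI:", (round(lo_c, 2), round(hi_c, 2)))
    print("treatment 95% CI:", (round(lo_t, 2), round(hi_t, 2)))
    print("non-overlapping:", hi_c < lo_t or hi_t < lo_c)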
posted by Scientist at 8:47 PM on December 8, 2013 [3 favorites]
I would echo what Nomyte said above, as well. It's all well and good to have a shiny new analysis that gives a much better measure of confidence than p, but if your colleagues can't interpret it then you're going to have a hard time publishing a study based on that analysis. It's generally best not to stray too far from the accepted practice of your field, because if you get too far away from the norm you're going to spend a lot of time arguing with editors about the validity of your analyses. I realize that this is kind of crappy, but it also makes a sort of sense – science is only useful if others can understand it, and relying heavily on novel analyses can put a comprehension barrier between your work and your audience. That's why I mentioned confidence intervals; they allow one to continue using some of the more traditional analyses while still asserting a high degree of certainty that one's results are valid and replicable.
posted by Scientist at 8:52 PM on December 8, 2013 [1 favorite]
What exactly do you mean by "use p-values to define significance"? Because "significance" is a statistical convention, while p-values are the results of specific statistical tests, which are perfectly appropriate in lots of cases.
Your data and your hypothesis determine what test you need. If you are asking other people for alternatives to p-values, then you do not have a clear idea of what test you need. In that case you should brush up on probability theory and inference. Memail me if you want some resources for that.
posted by serif at 11:12 PM on December 8, 2013 [6 favorites]
Like it or not, confidence intervals still rely on the notion that 95% of the time, given the other data parameters and/or assumptions, the true value lies between the bounds. They differ from a p-value only in how the result is presented. You can calculate a CI for a difference of means, or you can test whether the difference differs significantly from zero, and they tell you the same thing and depend on the same model assumptions.
Bayesian stats is the only alternative I've encountered, and it's not easy to communicate to a lay audience.
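A toy illustration of that equivalence (hypothetical data; Welch's t-test plus a normal-approximation CI for the difference, which is one of several reasonable choices):

    # Sketch of the point above: a 95% CI for a difference of means and a
    # two-sided test of "difference = 0" carry essentially the same information.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    a = rng.normal(5.0, 1.0, 30)
    b = rng.normal(5.6, 1.0, 30)

    t, p = stats.ttest_ind(a, b, equal_var=False)
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    print("p =", round(p, 4), " 95% CI for difference:",
          tuple(round(v, 2) for v in ci))
    # p < 0.05 roughly when the CI excludes zero (up to the normal vs t approximation).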
posted by gingerest at 1:15 AM on December 9, 2013
There's a strong tradition focusing on cross-validation and extrapolation error as the objective, especially when the amount of data is so large that model misspecification overwhelms random error. There are always the issues of "how much is notable?" versus "how much could you get if nothing was really happening?" You see/saw this more in machine learning and CS contexts.
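A rough sketch of using cross-validated prediction error as the yardstick (the toy regression problem and scikit-learn are illustrative assumptions):

    # Sketch: judge a model by cross-validated prediction error rather than a p-value.
    # Toy data and library choice are illustrative only.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)
    scores = cross_val_score(Ridge(alpha=1.0), X, y,
                             cv=5, scoring="neg_mean_squared_error")
    print("mean CV error:", -scores.mean())
    # A competing model would be preferred if it achieved lower out-of-sample error,
    # with no significance test involved.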
posted by a robot made out of meat at 8:12 AM on December 9, 2013
Bayesian analysis. As it happens, it's what I do for a living, so if you want to mail me I can help you get started. I would not recommend confidence intervals, as they are not particularly interpretable as anything you would ever want to know. The same goes for p-values – even if you don't use them for significance testing, they aren't interpretable as anything you'd want to know.
I am the author of the BayesFactor software which has a number of tests that correspond to ones you may know; references are given at the end of that page to articles that give the details (some more technical than others).
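For a rough sense of what a Bayes factor is, here is a sketch using the common BIC approximation on made-up data. To be clear, this is not the BayesFactor package (which computes Bayes factors under specific priors); it is only meant to show the flavor of the quantity:

    # Rough sketch: approximate Bayes factor for "two group means differ"
    # vs "they don't", via the BIC approximation. Data are hypothetical.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    y = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(0.5, 1.0, 50)])
    group = np.concatenate([np.zeros(50), np.ones(50)])

    null = sm.OLS(y, np.ones_like(y)).fit()         # one common mean
    alt = sm.OLS(y, sm.add_constant(group)).fit()   # group effect

    bf10 = np.exp((null.bic - alt.bic) / 2.0)       # approximate evidence for a difference
    print("approximate Bayes factor (alt vs null):", round(bf10, 2))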
posted by Philosopher Dirtbike at 9:30 AM on December 9, 2013 [2 favorites]
They still rely on a kind of p-value, but (because this was the philosophy of the person who trained me) I really love using resampling statistics. They're extremely intuitive and make their assumptions obvious. The idea is that if you have multiple populations of data (i.e., values for experimental and control groups), you can write a program to "resample" the data 10,000 times by randomly reassigning the values to the two groups. Then you ask where in that distribution of 10,000 reshuffled results the actual data fall, and that location in the distribution becomes your probability.
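Roughly, with made-up numbers (10,000 reshuffles and a difference-in-means statistic are just one common choice):

    # Sketch of the resampling (permutation) procedure described above.
    # Hypothetical data; the test statistic is the difference in group means.
    import numpy as np

    rng = np.random.default_rng(4)
    experimental = rng.normal(1.0, 1.0, 25)
    control = rng.normal(0.3, 1.0, 25)

    observed = experimental.mean() - control.mean()
    pooled = np.concatenate([experimental, control])

    n_resamples = 10_000
    resampled = np.empty(n_resamples)
    for i in range(n_resamples):
        shuffled = rng.permutation(pooled)
        resampled[i] = shuffled[:25].mean() - shuffled[25:].mean()

    # Where the observed difference falls in the reshuffled distribution:
    p_value = np.mean(np.abs(resampled) >= abs(observed))
    print("permutation p-value:", p_value)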
posted by Buckt at 11:44 AM on December 10, 2013
posted by Buckt at 11:44 AM on December 10, 2013
Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy
Toward Evidence-Based Medical Statistics. 2: The Bayes Factor
Bayesian Estimation Supersedes the t-Test
posted by un petit cadeau at 6:49 PM on December 8, 2013 [3 favorites]
This thread is closed to new comments.