I can teach this stuff, I swear I can...
March 23, 2010 1:16 PM Subscribe
Help me understand what's going on on p.144 (footnote 5) of Tufte's Beautiful Evidence. Page 6 in this online excerpt.
One of my students will inevitably ask what this section is driving at. I don't want to look the fool by not being able to summarize and explain. I have a tentative understanding, but I need to be able to articulate the what's what.
Best answer: Leaving aside for a second the specifics of the t-tests in Tufte's example, think of it like this:
When you want to publish something unequivocal, you may be tempted to remove results that are equivocal.
There are three types of kittens: white, black, and gray. If you're studying, say, whether feeding pregnant cats milk produces more white or black kittens, you may have trouble accounting for gray kittens. They can muck up your ability to make definitive claims about whether kittens are more likely to be black or white when pregnant cats are fed milk. So, you might call most gray kittens white, or you might call most gray kittens black, in whichever way supports your position and makes you sound more definitive.
Tufte is saying that with just one instance of this kind of manipulation, it can be hard to see whether anything nefarious is going on. But if a reviewer can reasonably say what the expected distribution of kitten colors should be, the aggregate data might show a hole where the gray kittens actually observed in the studies ought to be. In other words, the big picture reveals manipulation in a way that a single study can't, because any single study's results could be due to something else.
(Note: This example is a bit different in kind from Tufte's, but I think it demonstrates the same principle.)
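If it helps to see it with numbers, here's a quick Python sketch (the color rates and the amount of relabeling are made up, just to illustrate): each study quietly relabels some gray kittens as white, and no single study looks odd, but pooling the studies leaves a hole where the gray kittens should have been.

```python
import random

random.seed(0)
TRUE_RATES = {"white": 0.35, "black": 0.35, "gray": 0.30}  # assumed true color distribution

def run_study(n_kittens=40, relabel_fraction=0.5):
    """Simulate one study that relabels about half of its gray kittens as white."""
    counts = {"white": 0, "black": 0, "gray": 0}
    for _ in range(n_kittens):
        color = random.choices(list(TRUE_RATES), weights=list(TRUE_RATES.values()))[0]
        if color == "gray" and random.random() < relabel_fraction:
            color = "white"  # the manipulation: a gray kitten reported as white
        counts[color] += 1
    return counts

# Pool 20 studies and compare observed gray kittens with what the true rate predicts.
pooled = {"white": 0, "black": 0, "gray": 0}
for _ in range(20):
    for color, n in run_study().items():
        pooled[color] += n

total = sum(pooled.values())
print("pooled counts:", pooled)
print(f"expected roughly {TRUE_RATES['gray'] * total:.0f} gray kittens, observed {pooled['gray']}")
```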
posted by OmieWise at 1:31 PM on March 23, 2010 [1 favorite]
Although, on reflection, my example is a bit different in that it involves data manipulation rather than cherry picking. A closer analog would be publishing data only on litters that had a clear grouping of white or black kittens, and leaving the litters that were evenly split, or had too many gray kittens, in the drawer.
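A rough sketch of that drawer effect, again with invented numbers: simulate a bunch of litters, but only "publish" the ones where white or black wins by a clear margin.

```python
import random

random.seed(1)

def simulate_litter(n_kittens=8, rates=(0.35, 0.35, 0.30)):
    """One litter's white/black/gray counts, drawn from assumed true rates."""
    colors = random.choices(["white", "black", "gray"], weights=rates, k=n_kittens)
    return {c: colors.count(c) for c in ("white", "black", "gray")}

def is_publishable(counts, margin=3):
    """Only litters where white or black leads by a clear margin get written up."""
    return abs(counts["white"] - counts["black"]) >= margin

litters = [simulate_litter() for _ in range(50)]
published = [c for c in litters if is_publishable(c)]
print(f"published {len(published)} of {len(litters)} litters; "
      f"{len(litters) - len(published)} stayed in the drawer")
```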
posted by OmieWise at 1:38 PM on March 23, 2010
Response by poster: thanks hivemind. i love you very very much.
posted by madred at 2:04 PM on March 23, 2010
Ben Goldacre talks about this phenomenon too in his book Bad Science, which you might want to check out. His recommendation is to have a central registry to record intended studies before they get started. Then we'd at least be able to get some idea of how often this happens, and there could be followup to see if the results were "not interesting enough" or what.
posted by teg at 2:33 PM on March 23, 2010
The graph shows the significance levels found by a whole lot of tests (248 of them) drawn from a bunch of studies (17 of them). You'd generally expect it to be roughly bell-shaped - some tests would show very strong results, some would show very weak results, but there'd be many more in the middle. Instead the graph is shaped like a "w": it has lots of results at the extremes, lots in the center, and very few mediocre results between the center and the extremes.
Tufte says that this is because mediocre results are boring. Researchers have an incentive to adjust mediocre results to make them more interesting - perhaps by omitting data that didn't support their hypothesis, or by exaggerating data that did - and that adjustment accounts for the peaks at the extremes. Looking at a single study wouldn't be enough to show this, but if you see a graph like this when you put a whole lot of studies together, you can be fairly sure that some of the studies are tainted.
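If you want to see how that shape can arise, here's a rough simulation in Python (the effect size, the nudge, and the cutoff are all invented - this isn't Tufte's actual data): draw a few hundred honest test statistics, then push the near-misses just over the conventional significance threshold. No single nudged result looks suspicious on its own, but the pooled histogram ends up with a gap just below the cutoff and a pile-up just above it.

```python
import random

random.seed(0)
THRESHOLD = 1.96      # conventional two-sided 5% cutoff for a large-sample t statistic
NUDGE_WINDOW = 0.5    # results this close below the cutoff get "helped" over it

def honest_t():
    """One honest test statistic: a modest real effect plus noise."""
    return random.gauss(1.0, 1.0)

def reported_t(t):
    """What gets reported: near-misses are pushed just past the threshold."""
    if THRESHOLD - NUDGE_WINDOW < t < THRESHOLD:
        return THRESHOLD + random.uniform(0.0, 0.3)
    return t

reported = [reported_t(honest_t()) for _ in range(248)]

def crude_histogram(values, lo=-2.0, hi=5.0, width=0.5):
    """Print a text histogram so the hole below the cutoff is easy to spot."""
    bins = int((hi - lo) / width)
    for i in range(bins):
        left = lo + i * width
        n = sum(left <= v < left + width for v in values)
        print(f"{left:5.1f} to {left + width:4.1f} | " + "#" * n)

print("reported test statistics (note the hole just below 1.96):")
crude_histogram(reported)
```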
posted by Joe in Australia at 6:33 PM on March 23, 2010
Not many people are writing studies where the conclusion is "meh," even if "meh" happens to be the truth. Therefore, there's cherry-picking of data going on.
posted by Cool Papa Bell at 1:25 PM on March 23, 2010
This thread is closed to new comments.