Al Gore memorabilia
October 12, 2011 10:33 PM
Is the famous global warming "hockey stick" graph, the one Al Gore used in his movie, actually fatally flawed, and why is the "spaghetti" graph an acceptable replacement?
Complicated question, but Reddit's scientist panel was unable to answer it, and none of my friends can talk me through the whole thing.
As most people involved in this debate probably know, Stephen McIntyre did a statistical analysis on the MBH98 "hockey stick" graph and claimed a number of flaws. Two of these which were verified as real errors by a U.S. congressional report are:
(1) The method used to calculate MBH98 generates hockey sticks even with random data. Quoting Wikipedia: "The report stated that the MBH method creates a hockey-stick shape even when supplied with random input data (Figure 4.4), and argues that the MBH method uses weather station data from 1902 to 1995 as a basis for calibrating other input data."
(2) The data used as the basis of this method is scientifically flawed and in any case irreproducible. Wikipedia again: "The report said that MBH method creates a PC1 statistic dominated by bristlecone and foxtail pine tree ring series (closely related species). However there is evidence in the literature, that the use of the bristlecone pine series as a temperature proxy may not be valid (suppressing "warm period" in the hockey stick handle); and that bristlecones do exhibit CO2-fertilized growth over the last 150 years (enhancing warming in the hockey stick blade)."
In response to (1), the blog RealClimate, which is run by the "M" in MBH, posted an "unmanipulated" chart which still included the bad data from (2).
In response to (2), Wahl and Ammann wrote a paper that recalculated the data, but still used the method of (1). On page 63 of their paper, you can see a statistic called r2, which measures how trustworthy the model's predictions are; at points on the graph, r2 reaches 0.000.
Graduate student Linah Ababneh failed to reproduce the bristlecone series. Nevertheless, RealClimate says for some reason that "including these data improves the statistical validation over the 19th Century period and they therefore should be included."
When you combine (1) and (2), the graph no longer resembles a hockey stick at all. To their credit, both RealClimate and Wahl and Ammann have shown this graph in their work, but both of them say that their model is better, for a reason I can't understand. So, question number 1: I've explained this to the best of my knowledge, but I can't find what I'm missing. Is a method with r2 = 0.000 really that awesome? I could probably draw random squiggly lines with better predictive ability than that.
My second question is a lot simpler: both RealClimate and Wikipedia have now phased out the hockey stick graph for a spaghetti graph. Wikipedia says, "It is unknown which, if any, of these reconstructions is an accurate representation of climate history". Well, that much should be obvious. Why is there such variation, and doesn't that produce an untrustworthy margin of error? Why are there graphs out there that look like this?
I don't mind if your answer is just a link to somewhere as long as it is discussing one of these two points.
Previous good general discussion here. (In case anyone is worried for my salvation, regardless of the answer to this question I do think anthropogenic climate change can have a huge effect on the Earth, but I believe that it could be negative or positive depending on how people go about it.)
Best answer: There are detailed responses to criticisms of the MBH98 study in the very same Wikipedia article you linked. Perhaps you could explain why you find these insufficient. One should be careful not to treat the criticisms by Wegman and others as somehow being beyond critique or response themselves.
A detailed discussion of why r2 is not the best statistic to use is in that very same paper, as has been noted. I study genomics and I hardly ever use r2 because it has various unhelpful biases. It's pretty accepted in many areas of science to use metrics other than r2 to measure model fit.
The method used to calculate MBH98 generates hockey sticks even with random data.
Mann's response to this:
Given a large enough “fishing expedition” analysis, it is of course possible to find “Hockey-Stick like” PC series out of red noise. But this is a meaningless exercise. Given a large enough number of analyses, one can of course produce a series that is arbitrarily close to just about any chosen reference series via application of PCA to random red noise. The more meaningful statistical question, however is this one: Given the “null hypothesis” of red noise with the same statistical attributes… how likely is one to produce the “Hockey Stick” pattern from chance alone.
He goes on to say that given random data, the production of a "Hockey Stick" is sufficiently rare that getting a "Hockey Stick" from the real data is statistically significant. This idea is basic frequentist statistics.
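Mann's framing, comparing the observed statistic against a null distribution generated from red noise, can be sketched as a small Monte Carlo test. This is only an illustration of the general frequentist procedure, not a reproduction of the MBH98 method: the AR(1) parameter, the "stick index" definition, and the observed score of 1.5 are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def ar1_series(n, phi=0.5):
    """One red-noise (AR(1)) series: x[t] = phi * x[t-1] + white noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def stick_index(x, blade=20):
    """Crude 'hockey stick' score: departure of the final segment's mean
    from the whole-series mean, in units of the series std."""
    return (x[-blade:].mean() - x.mean()) / x.std()

# Null distribution: stick scores produced by pure red noise
null = np.array([stick_index(ar1_series(200)) for _ in range(2000)])

# A (hypothetical) observed score is significant if chance alone
# rarely produces one at least as extreme
observed = 1.5
p_value = np.mean(null >= observed)
```

If the observed score falls far out in the tail of the red-noise null distribution, the pattern is unlikely to be an artifact of chance, which is exactly the argument Mann is making.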
RealClimate says for some reason that "including these data improves the statistical validation over the 19th Century period and they therefore should be included."
When you're making predictions with a model, ideally you want to be able to validate the predictions against ground truth (for example, temperature measurements). If your model can reproduce what you already know, it is more likely that it is getting things right where you don't have a ground truth.
Mann et al. do not agree with the criticisms of this particular tree-ring data, and they say that without them, the model does a poorer job of reproducing the ground truth. That is a sound reason for including it.
posted by grouse at 12:31 AM on October 13, 2011
Response by poster: Okay, so (1) is disputed on the grounds that the "random" data was not really random, and truly random data would show that the predictive model is good. This appears to be under debate as of November 2010.
Also that there are better metrics than r2, which I will readily accept, since I'm not a scientist and never had to deal with statistics in this way.
You also say that (2) is disputed, but I'm not sure on what grounds. (Sorry, am I not looking in the right spot on Wikipedia)?
Thank you for your replies.
posted by shii at 1:21 AM on October 13, 2011
Response by poster: While continuing to look at this myself, I found a quick scientific overview in WIREs Climate Change 1.4 (2010), which describes the hockey stick graph, without its disclaimers, as portraying "an overly optimistic assessment of the degree to which past temperatures were understood."
The authors agree with me on the spaghetti graph as well: "the high dispersion of the records away from the calibration period mean casts doubt on the absolute amplitude of past temperature change."
This doesn't solve the hard-science problem, but I'm glad to know generally that my concerns are not out in the loonysphere. Maybe that's all I was looking for anyway.
posted by shii at 1:52 AM on October 13, 2011
I have always wondered why the hockey stick ends in 2004. That's almost six years out of date. Is it still trending up? Are there updates to this graph?
posted by rebent at 5:45 AM on October 13, 2011
Response by poster: To answer rebent's sub-question, after 2001 the IPCC moved away from having a single multiproxy graph, and towards putting together several multiproxy studies on one graph that shows internal variations, the "spaghetti graph".
posted by shii at 5:23 PM on October 13, 2011
I suggest reading the rest of the paper, specifically the couple of pages before and after the table on page 63 that you referenced. They make a cogent argument about why r^2 is not the correct statistic to use in this situation: r^2 is good at measuring interannual variability, but what matters for a reconstruction is changes in mean state, which r^2 does not capture. The RE statistic does, and that is what they use in the body of the paper rather than the appendix.
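The r^2-versus-RE distinction can be seen in a toy example. A reconstruction that gets the verification period's mean state right, but whose year-to-year wiggles are pure noise, scores near zero on r^2 yet strongly positive on RE, because RE measures skill relative to simply predicting the calibration-period mean. The numbers below are invented for illustration and have nothing to do with the actual proxy data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Calibration-period mean is 0 by construction; the verification
# period is a colder interval with mean -0.5.
obs = -0.5 + rng.normal(0, 0.1, n)   # "true" verification temperatures
rec = -0.5 + rng.normal(0, 0.1, n)   # reconstruction: right mean, random wiggles

# Squared correlation: only rewards matching interannual variability
r2 = np.corrcoef(rec, obs)[0, 1] ** 2

# Reduction of Error: skill relative to predicting the calibration mean (0)
re = 1 - np.sum((obs - rec) ** 2) / np.sum((obs - 0.0) ** 2)
```

Here the reconstruction gets an r^2 near zero but an RE near 0.9, which is roughly the situation Wahl and Ammann argue is the relevant one when the quantity of interest is mean climate state rather than year-to-year variation.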
Further, since you've been to the Wikipedia page, I suggest you read to the bottom. There are two papers cited (Kaufman et al. 2009 and Tingley and Huybers 2010) there that use independent data and essentially confirm the main point of the hockey stick: the last decade was an anomalously warm decade in an anomalously warm century.
I don't quite know what your second question is asking, but basically climate science is not an experimental science, which is to say the way to reject hypotheses is not quite the same as described by, say, Popper. When you're using advanced statistical models, you're going to have to rely on a preponderance of evidence to weigh your options. The preponderance of evidence, and theory, is that climate change is real and it is not good.
posted by one_bean at 11:28 PM on October 12, 2011 [1 favorite]