Weather, damn cold weather, and statistics
January 24, 2013 5:56 AM   Subscribe

I'd like to estimate the number of days a year when the high temperature is likely to be below a particular threshold, e.g. below freezing. This turns out to be harder than expected.

I live in Pittsburgh, and found some NWS local climate data showing the normal high and low temperatures, normal average temperatures and record temperatures for each day. Even in the coldest parts of the year (right now), Pittsburgh's "normal high" is never below 35F, but there are always many days a year when it's colder than that (currently 14F). I imagine that you could plot a sort of bell curve for each day between the record extremes and the normal temperatures, and estimate the odds of hitting a particular temperature by seeing where it falls on that curve. But I never took a statistics class, so I'm not sure how to do this. Suggestions? Other approaches?
posted by jon1270 to Science & Nature (8 answers total) 2 users marked this as a favorite
 
Because of climate change, the past record lows are going to be more extreme than we would predict for the future. I'm not saying it's not possible; weather is highly variable from year to year. But climate change is taking effect and causing slightly warmer winters in the mid-Atlantic. So however you do this, you probably don't want to go more than 10 years back or your error will be too high.

Were you able to find a site that gives actual past temperatures? I seem to recall one, but don't remember what it was - I'll take a look when I'm on a computer.
posted by DoubleLune at 6:09 AM on January 24, 2013


Best answer: The averages tab on WeatherSpark aggregates a lot of historical data into easy to read charts.
posted by Uncle Jimmy at 6:26 AM on January 24, 2013 [2 favorites]


Because of climate change, the past record lows are going to be more extreme than we would predict for the future. I'm not saying it's not possible; weather is highly variable from year to year. But climate change is taking effect and causing slightly warmer winters in the mid-Atlantic. So however you do this, you probably don't want to go more than 10 years back or your error will be too high.

I suspect the error term in whatever sort of calculation the OP ends up doing will be larger than the impact of climate change, which in degree terms is really quite small even if its consequences are gigantic.

It wouldn't be as simple as assuming temps are normally distributed, because temps are going to be correlated day to day. When it's 14 degrees today, it's much more likely to be 14 degrees tomorrow than it would be if it were 35 degrees today.
posted by JPD at 7:30 AM on January 24, 2013


Best answer: There are a couple of ways you can do this. The way you are thinking of requires one more ingredient: the variability of a day's temperature from year to year. With the mean and variance you could then estimate the probability of exceeding certain temperature thresholds. Unfortunately, the variance is not usually easy to find.
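To make that first approach concrete, here's a minimal Python sketch of the threshold calculation, assuming a given day's highs are roughly normally distributed across years. The 35 F mean echoes the normal high from the question; the 10 F standard deviation is a made-up placeholder, since that's exactly the hard-to-find ingredient.

```python
import math

def p_below(threshold, mean, sd):
    """P(high < threshold) if the daily high is Normal(mean, sd)."""
    z = (threshold - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF

# Hypothetical mid-January day: normal high 35 F, year-to-year SD 10 F.
p = p_below(32.0, 35.0, 10.0)  # chance the high stays below freezing
print(p)  # roughly 0.38
```

Doing this for each of the 365 days (each with its own mean and SD) and summing the probabilities gives the expected number of sub-freezing days per year.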

The other approach is easy, but tedious: download the data and plot it out. You can find historical weather data at the National Climatic Data Center. If you go here, NCDC gives you a couple of ways to search for data. I chose the first one and entered Pittsburgh. For the data set choose "Daily GHCND" (Daily Global Historical Climate Network Data). That will bring up a list of stations with data in the Pittsburgh area. Pittsburgh International Airport looks like your best bet for a reliable, semi-long-term (since 1948) record. Add that to your cart and then go to the cart. There's a box on the right asking for the output date range and the data format. Might as well choose all the data as a CSV file. That will take you to another screen where you can choose what variables (temperature, precipitation, sunshine, etc.) you want. Continue on to enter your email address. Wait a few minutes and whatever you ordered will be in your inbox.

If you choose the whole record of daily high/low temperatures you'll get a ~2.7 MB CSV file. Now the tedious part begins. The format of the file is that each day is a row and the variables are columns. You want to get this into a nicer format so that the data is easier to use. How long that takes depends on your programming and spreadsheet manipulation skills. I'm kind of slow in these regards, but cutting, pasting and transposing in Excel will eventually get you years as rows, days as columns and the cells being the variable (Tmax or Tmin). From there it is very easy to plot a scatter diagram of the range of temperatures since 1948 by day throughout the year. It would also then be trivial to calculate all sorts of probabilities.
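For the counting step, here's a hedged Python sketch. The column names and sample rows are invented stand-ins for the real GHCND file (check the header of your actual download), but GHCND really does store temperatures in tenths of a degree Celsius.

```python
import csv
import io
from collections import defaultdict

# Invented sample in a GHCND-like daily layout; TMAX is in tenths of
# a degree Celsius, so 106 means 10.6 C (about 51 F).
SAMPLE = """DATE,TMAX
1948-01-01,106
1948-01-02,-15
1948-01-03,-30
1949-01-01,22
1949-01-02,-5
"""

below = defaultdict(int)  # year -> days whose high was below freezing
total = defaultdict(int)  # year -> days with a TMAX reading
for row in csv.DictReader(io.StringIO(SAMPLE)):
    year = row["DATE"][:4]
    tmax_f = int(row["TMAX"]) / 10 * 9 / 5 + 32  # tenths C -> F
    total[year] += 1
    if tmax_f < 32.0:
        below[year] += 1

for year in sorted(total):
    print(year, below[year], "of", total[year], "days below freezing")
```

Point `csv.DictReader` at the downloaded file instead of the inline string, and dividing each year's count by its total gives a ballpark frequency.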

Caveats: There are lots of caveats. If you do this, the first thing to notice is that the data is stored as tenths of a degree Celsius, so don't be shocked that January 1st 1948 warmed up to 106 degrees. The high was actually 10.6 C or 51 Fahrenheit.

This sort of analysis is pretty crude. It's fine for your own edification, but not valid for any serious science work, as the data isn't necessarily homogeneous. Over the years the observation location may have shifted, different types of instruments were used, there might have been a change in observation time, the urban heat island may have affected the long-term trend, and, as others have mentioned, with global warming we're dealing with a changing mean (and possibly a changing variance). All those changes may have a non-negligible effect on the temperature record and would need to be taken into account.
posted by plastic_animals at 8:07 AM on January 24, 2013


It wouldn't be as simple as assuming temps are normally distributed, because temps are going to be correlated day to day. When it's 14 degrees today, it's much more likely to be 14 degrees tomorrow than it would be if it were 35 degrees today.

Yes, but that actually won't be a problem, because the OP is only trying to calculate an expectation: by linearity of expectation, the expected number of below-threshold days is just the sum of each day's marginal probability, no matter how correlated the days are. Correlation changes the variance of the count, not its mean.
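A toy simulation (not real weather; the AR(1) correlation of 0.7 and the threshold are arbitrary illustrations) shows why: independent and day-to-day-correlated temperature series with the same marginal distribution give the same expected count of cold days, while the correlated series spreads much more widely around that expectation.

```python
import math
import random
import statistics

random.seed(0)
N_DAYS, N_SIMS, RHO, Z = 365, 2000, 0.7, -0.3  # threshold in SD units

def count_cold(rho):
    """Days below threshold in one simulated year of daily temps."""
    x, n = random.gauss(0, 1), 0
    for _ in range(N_DAYS):
        n += x < Z
        # AR(1) update that preserves the standard-normal marginal
        x = rho * x + math.sqrt(1 - rho * rho) * random.gauss(0, 1)
    return n

indep = [count_cold(0.0) for _ in range(N_SIMS)]
corr = [count_cold(RHO) for _ in range(N_SIMS)]
# Means agree (linearity of expectation); spreads do not.
print(statistics.mean(indep), statistics.stdev(indep))
print(statistics.mean(corr), statistics.stdev(corr))
```

So correlation matters if you care how lopsided an individual winter can be, but not for the long-run average count.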
posted by madcaptenor at 10:01 AM on January 24, 2013


Response by poster: Thanks to everyone, especially for plastic_animals' NCDC link. The data I found earlier on the NWS site was in PDF format, one month to a page, and was a pain to get into a spreadsheet.

The crudeness of these methods is just fine for my purposes; I'm just looking for ballpark numbers. If climate change were happening fast enough to spoil this little project, we'd all be in really deep doodoo. (I mean, we may be in deep doodoo, but it would be even worse.)
posted by jon1270 at 1:44 PM on January 24, 2013


Look at page 6 (it shows the winter temperature increase in the NE). I guess you're looking for a cruder estimate than I realized, but the data pretty clearly shows a change of a couple of degrees (between 2 and 4, depending on which trend you look at). If you compare this data to the average-temperature charts on WeatherSpark, it takes away roughly 14-30 days when the mean low could be below freezing, assuming their data goes back ~50 years.
posted by DoubleLune at 6:07 PM on January 24, 2013


Response by poster: ... it takes away roughly 14-30 days when the mean low could be below freezing, assuming their data goes back ~50 years.

Okay, that's worth adjusting for, if only by using a data set that doesn't go back quite so far. Thanks for pointing that out.
posted by jon1270 at 3:03 AM on January 25, 2013


This thread is closed to new comments.