

Weather, damn cold weather, and statistics
January 24, 2013 5:56 AM

I'd like to estimate the number of days a year when the high temperature is likely to be below a particular threshold, e.g. below freezing. This turns out to be harder than expected.

I live in Pittsburgh, and found some NWS local climate data showing the normal high and low temperatures, normal average temperatures and record temperatures for each day. Even in the coldest part of the year (right now), Pittsburgh's "normal high" is never below 35F, but there are always many days a year when it's colder than that (currently 14F). I imagine that you could plot a sort of bell curve for each day between the record extremes and the normal temperatures, and estimate the odds of hitting a particular temperature by seeing where it falls on that curve. But I never took a statistics class, so I'm not sure how to do this. Suggestions? Other approaches?
posted by jon1270 to Science & Nature (9 answers total) 2 users marked this as a favorite
 
Because of climate change, the past record lows are going to be more extreme than we would predict for the future. I'm not saying it's not possible, because weather year to year is highly variable. But climate change is taking effect and causing slightly warmer winters in the mid-Atlantic. So however you do this, you probably don't want to go more than 10 years back or your error will be too high.

Were you able to find a site that gives actual past temperatures? I seem to recall one, but don't remember what it was - I'll take a look when I'm on a computer.
posted by DoubleLune at 6:09 AM on January 24, 2013


The averages tab on WeatherSpark aggregates a lot of historical data into easy-to-read charts.
posted by Uncle Jimmy at 6:26 AM on January 24, 2013 [2 favorites]


Because of climate change, the past record lows are going to be more extreme than we would predict for the future. I'm not saying it's not possible, because weather year to year is highly variable. But climate change is taking effect and causing slightly warmer winters in the mid-Atlantic. So however you do this, you probably don't want to go more than 10 years back or your error will be too high.

I suspect the error term in whatever sort of calculation the OP is doing is going to be greater than the impact of climate change, which in degree terms is really quite small even if its consequences are gigantic.

It wouldn't be as simple as assuming temps are normally distributed, because temps are going to be correlated day to day. When it's 14 degrees today, it is much, much more likely to be 14 degrees tomorrow than it would be if it were 35 degrees today.
posted by JPD at 7:30 AM on January 24, 2013


There are a couple of ways you can do this. The way you are thinking of requires one more ingredient: the variability of a given day's temperature from year to year. With the mean and variance you could then estimate the probability of exceeding (or falling below) certain temperature thresholds. Unfortunately, the variance is not usually easy to find.
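For what it's worth, here is a minimal sketch of that first approach in Python, assuming each calendar day's high is roughly normally distributed from year to year. The function names are made up for illustration, and the 10 F standard deviation is a placeholder, not a Pittsburgh figure; you would need the real per-day variance for this to mean anything.

```python
from statistics import NormalDist


def prob_high_below(threshold_f, mean_high_f, sd_high_f):
    """P(high < threshold) if that calendar day's high is modeled as normal."""
    return NormalDist(mu=mean_high_f, sigma=sd_high_f).cdf(threshold_f)


def expected_days_below(threshold_f, daily_stats):
    """Sum the per-day probabilities; daily_stats is an iterable of (mean, sd) pairs."""
    return sum(prob_high_below(threshold_f, m, s) for m, s in daily_stats)


# Placeholder numbers, NOT Pittsburgh data: a 35 F normal high with an
# assumed 10 F year-to-year standard deviation.
p = prob_high_below(32, 35, 10)  # roughly a 38% chance of a sub-freezing high
```

Feeding `expected_days_below` one (mean, sd) pair per calendar day would give the ballpark number of days per year below the threshold, under the normality assumption.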

The other approach is easy, but tedious: download the data and plot it out. You can find historical weather data at the National Climatic Data Center. If you go here, NCDC gives you a couple of ways to search for data. I chose the first one and entered Pittsburgh. For the data set, choose "Daily GHCND" (Global Historical Climatology Network, Daily). That will bring up a list of stations with data in the Pittsburgh area. Pittsburgh International Airport looks like your best bet for a reliable, semi-long-term (since 1948) record. Add that to your cart and then go to the cart. There's a box on the right asking for the output date range and the data format. You might as well choose all the data as a CSV file. That will take you to another screen where you can choose which variables (temperature, precipitation, sunshine, etc.) you want. Continue on to enter your email address. Wait a few minutes and whatever you ordered will be in your inbox.

If you choose the whole record of daily high/low temperatures you'll get a ~2.7 MB CSV file. Now the tedious part begins. In the file, each day is a row and the variables are columns. You want to get this into a nicer format so that the data is easier to use. How long that takes depends on your programming and spreadsheet manipulation skills. I'm kind of slow in these regards, but cutting, pasting and transposing in Excel will eventually get you years as rows, days as columns, and each cell holding the variable (Tmax or Tmin). From there it is very easy to plot a scatter diagram of the range of temperatures since 1948 by day throughout the year. It would also then be trivial to calculate all sorts of probabilities.
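If you would rather skip the Excel transposing, something like the following Python sketch counts sub-freezing highs per year straight from the daily rows. Assumptions to check against your own download: the column names `DATE` (as YYYYMMDD) and `TMAX`, values in tenths of a degree Celsius, and `-9999` as the missing-value marker. The sample rows are invented for the demo, apart from the 106 reading for January 1st 1948 mentioned in this answer.

```python
import csv
import io
from collections import defaultdict

FREEZING_TENTHS_C = 0  # 0.0 C, expressed in tenths of a degree Celsius
MISSING = -9999        # assumed missing-value marker; verify in your file


def freezing_high_days_by_year(csv_file, date_col="DATE", tmax_col="TMAX"):
    """Return {year: number of days whose high was below freezing}."""
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        raw = row[tmax_col].strip()
        if not raw or int(raw) == MISSING:
            continue  # skip days with no usable reading
        year = int(row[date_col][:4])
        # Adding a bool adds 0 or 1, and also registers years with zero hits.
        counts[year] += int(raw) < FREEZING_TENTHS_C
    return dict(counts)


def mean_days_per_year(counts):
    """Average the yearly counts into one ballpark figure."""
    return sum(counts.values()) / len(counts)


# Tiny invented sample standing in for the real download.
SAMPLE = "DATE,TMAX\n19480101,106\n19480102,-22\n19480103,-9999\n19490101,-5\n19490102,-11\n"
counts = freezing_high_days_by_year(io.StringIO(SAMPLE))  # {1948: 1, 1949: 2}
```

With the real file you would pass `open("yourfile.csv", newline="")` instead of the sample. Partial years at the start or end of the record will drag the average down, so you may want to drop them first.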

Caveats: There are lots of them. If you do this, the first thing to notice is that the data is stored in tenths of a degree Celsius, so don't be shocked that January 1st 1948 warmed up to 106 degrees. The high was actually 10.6 C, or about 51 F.
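As a quick sanity check on the units, here is a small Python sketch of the conversion in both directions. The function names are just illustrative.

```python
def tenths_c_to_f(tenths_c):
    """Convert a stored value (tenths of a degree Celsius) to Fahrenheit."""
    return tenths_c / 10 * 9 / 5 + 32


def f_to_tenths_c(temp_f):
    """Convert a Fahrenheit threshold to the file's units, rounded to an integer."""
    return round((temp_f - 32) * 5 / 9 * 10)


jan_1_1948_high_f = tenths_c_to_f(106)  # about 51.08 F
```

Converting your threshold once with `f_to_tenths_c` (32 F becomes 0) and comparing against the raw values is usually less error-prone than converting every row.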

This sort of analysis is pretty crude. It's fine for your own edification, but not valid for any serious science work, as the data isn't necessarily homogeneous. Over the years the observation location may have shifted, different types of instruments were used, there might have been a change in observation time, the urban heat island may have affected the long-term trend, and, as others have mentioned, with global warming we're dealing with a changing mean (and possibly a changing variance). All those changes may have a non-negligible effect on the temperature record and would need to be taken into account.
posted by plastic_animals at 8:07 AM on January 24, 2013


It wouldn't be as simple as assuming temps are normally distributed, because temps are going to be correlated day to day. When it's 14 degrees today, it is much, much more likely to be 14 degrees tomorrow than it would be if it were 35 degrees today.

Yes, but that actually won't be a problem because the OP is only trying to calculate an expectation.
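To spell that out a bit, as a sketch with symbols I'm introducing just for illustration: let T_d be the high on day d, let τ be the threshold, and let I_d be 1 if T_d < τ and 0 otherwise. Linearity of expectation holds whether or not the days are independent, so:

```latex
\mathbb{E}\Big[\sum_{d=1}^{365} I_d\Big]
  = \sum_{d=1}^{365} \mathbb{E}[I_d]
  = \sum_{d=1}^{365} P(T_d < \tau)
```

The day-to-day correlation does matter for how much the count swings from one year to the next (its variance), just not for its average.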
posted by madcaptenor at 10:01 AM on January 24, 2013


If actual data for the last 5 years at Pittsburgh is good enough, here is a super easy way to get what you need:

Go to the NWS Pittsburgh office climate page.

Now, under "product" select Preliminary Monthly Climate Data. Select the location (Pittsburgh in this case). Then under "Time Frame" select "archived data" and pick the month and year you want the data for. Then press "Go". A report will pop up with everything you ever wanted to know about the weather at Pittsburgh for that month. If you scroll down the report just a bit you will see the "Number of days with" section. As an example, if I select January 2008, I see that Pittsburgh had 11 days that month with a maximum temperature of 32 or less. You could quickly do that for all of the cool-season months over the past 5 years to get an exact 5-year average by month.

Also, if you have a broader interest in this type of climate data, the Northeast Regional Climate Center has lots of tabulated data for all the major US stations (whole country, not just the northeast). They don't tabulate days with a max less than 32, but they do provide quick, easy-to-read tables with days per month for other temperature thresholds, like days above 90 or days with a minimum temperature less than 32.

Have fun!
posted by Seymour Zamboni at 11:55 AM on January 24, 2013


Thanks to everyone, especially for plastic_animals' NCDC link. The data I found earlier on the NWS site was in PDF format, one month to a page, and was a pain to get into a spreadsheet.

The crudeness of these methods is just fine for my purposes; I'm just looking for ballpark numbers. If climate change were happening fast enough to spoil this little project, we'd all be in really deep doodoo. (I mean, we may be in deep doodoo, but it would be even worse).
posted by jon1270 at 1:44 PM on January 24, 2013


Look at page 6 (it shows the winter temp increase in the NE). I guess you're looking for a cruder estimate than I realized, but if you look at the data, it's pretty well shown that there's been a change of a couple of degrees (between 2 and 4 degrees depending on which trend is being looked at). If you compare this data to the average temperature ones on WeatherSpark, it takes away ~14-30 days on which the mean low temperature could be below freezing, assuming that their data goes back ~50 years.
posted by DoubleLune at 6:07 PM on January 24, 2013


... it takes away ~14-30 days on which the mean low temperature could be below freezing, assuming that their data goes back ~50 years.

Okay, that's worth adjusting for, if only by using a data set that doesn't go back quite so far. Thanks for pointing that out.
posted by jon1270 at 3:03 AM on January 25, 2013

