# Looking for the best basketball statistics resources

April 29, 2014 1:35 PM

I'm a scientist who deals with statistics all day, and my partner is a die-hard basketball fan. He follows some of the basketball stats nerds, and sometimes he wants to talk about basketball stats with me, but I can never find any decent statistical summaries - just a bunch of averages with no context provided. Are there any free resources that provide sports statistics with more than simple averages?

My problem is that now that I know how to interpret statistics, I find that sports stats virtually never come packaged with the information I'd need to interpret them correctly. Sample size is elided, the underlying distribution of the data is never addressed, and worst of all, no metrics of variability are ever provided. The very simplest thing I'm looking for is standard deviations reported alongside all of those means. Ranges, medians, and/or quartiles would be neat too.

I'd really like to avoid calculating everything myself if possible, but I'd consider it if the data were easily available. The calculations are easy, but wrangling all that data is a hassle. I see basketballreference.com lets you download some .csv files, and I can do that if there are no other options, but I thought that surely someone would have had the good sense to include basic variability metrics with their averages.

Along the same lines, I'd love any recommendations you have for resources (forums, blogs, columnists, etc.) that do a good job talking about statistics in sports. There have to be some, because all of my stats professors have been huge sports nerds, so surely they're all huddled together talking about sports stats somewhere on the internet.

Annoyingly, I can't link directly to the search results, but plug "NBA" into the MIT Sloan Sports Conference's website and you'll find a lot of basketball academic nerdery.

posted by Ufez Jones at 1:46 PM on April 29, 2014

I'm not sure what you're asking for exactly. Basketball-reference has individual game logs for almost all players, so you can definitely calculate sample size and variability from their data on a game-by-game basis. If you're looking for shot-by-shot data or something more granular than a game, you're not going to get that without paying Stats Inc. money.

If you're asking whether someone has already done all the averages and that also includes sample sizes and standard deviations, then no, I don't think there's a site that does that for you.
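Since nothing seems to package these numbers for you, here is a rough sketch of computing them from a downloaded game log. The CSV contents and the column names `Date` and `PTS` are assumptions for illustration, not the actual export format of any site, so adjust them to whatever the file you download uses.

```python
import csv
import io
import statistics

# Hypothetical game log; the column names "Date" and "PTS" are assumptions,
# so rename them to match the file you actually download.
SAMPLE_CSV = """Date,PTS
2014-01-02,12
2014-01-04,30
2014-01-05,8
2014-01-07,22
2014-01-09,18
2014-01-11,25
2014-01-13,4
2014-01-15,21
"""


def summarize_points(csv_text, column="PTS"):
    """Return sample size, mean, SD, median, quartiles and range for one column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Skip blank cells (for example, games the player did not appear in).
    points = [float(row[column]) for row in rows if row[column] != ""]
    q1, _, q3 = statistics.quantiles(points, n=4)
    return {
        "n": len(points),
        "mean": statistics.mean(points),
        "sd": statistics.stdev(points),  # sample SD (n - 1 denominator)
        "median": statistics.median(points),
        "q1": q1,
        "q3": q3,
        "min": min(points),
        "max": max(points),
    }


summary = summarize_points(SAMPLE_CSV)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

To use a real file, read its text with `open(path, newline="").read()` and pass that in place of `SAMPLE_CSV`.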

posted by thewumpusisdead at 1:47 PM on April 29, 2014

Here's Basketball Reference.com

Lord help you, Husbunny is a stats freak for the WNBA, the difference is, he's the guy putting it all together! You should see the crap at our place!

posted by Ruthless Bunny at 1:50 PM on April 29, 2014

I always like Zach Lowe's writing at Grantland for a good statistical look at basketball, with some decent dissections of play thrown in for good measure.

posted by zempf at 1:53 PM on April 29, 2014

Also, there are pretty cool shot charts and heat maps for everyone at basketball reference that I have to plug because I work here (and if you have any questions about the site, tweet at @bball_ref & he'll definitely respond).

posted by zempf at 1:56 PM on April 29, 2014

Thanks everyone, this is all great and keep it coming. I really love the heat maps - I do a lot of work in GIS and with spatial statistics so that's especially cool to see.

*I'm not sure what you're asking for exactly.*

Just for example, people talk about a player averaging x points per game, but they never mention the extent to which that distribution is skewed. Some players are streaky and score a ton of points in relatively few games, while other players are steadier and score near their average most games. Some players are just all over the map - their score per game ranges from 0 all the way to the high end of the range. That's the sort of information I want that never seems to be reported with the regular statistics people use.
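To make that concrete, here is a small sketch with two made-up players who both average 16 points per game but have very different distributions. The point totals are invented for illustration, and the skewness used is the simple moment-based version (third central moment over the 1.5 power of the second), which is one of several common definitions.

```python
import statistics


def moment_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Invented per-game point totals; both lists average exactly 16.
steady = [14, 15, 16, 16, 16, 17, 18, 16]
streaky = [8, 9, 10, 10, 11, 12, 30, 38]

for label, games in (("steady", steady), ("streaky", streaky)):
    print(
        f"{label}: mean={statistics.fmean(games):.1f} "
        f"sd={statistics.stdev(games):.2f} "
        f"median={statistics.median(games):.1f} "
        f"skew={moment_skewness(games):.2f}"
    )
```

The two means are identical, so a points-per-game figure alone cannot tell these players apart; the standard deviation, median, and skewness can.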

More broadly, I just get frustrated with the misapplication of statistics in sports analysis, and I'm looking for resources that get it right more often. The home court advantage question is a good example: I see analysts apply some leaguewide home court advantage in terms of point spread or win probability (the home team wins 60% of the time, or gets 3.25 more points on average), but they never address the fact that the variability of that statistic among teams is huge, so that number is essentially meaningless. Some teams have gigantic home court advantages; other teams get basically no advantage from it. That wide variability makes it inappropriate to apply the population-level statistic back to predict or speak to any individual team's outcomes, much less an individual game's outcome, yet I still see people doing it all the time (e.g., "oh, they have the home court advantage, and the home team wins 60% of the time", with the implication that this applies to the particular game they're talking about, which it doesn't necessarily). That's just one example of why it's important to think about variability instead of just talking about means all the time.
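The kind of summary I'd want to see could be sketched like this. The results below are invented and far too few to say anything about real teams, and the per-team measure (half the gap between a team's mean home margin and mean road margin) is a crude split of my own choosing that ignores strength of schedule. The point is only that the between-team spread is reported next to the leaguewide mean.

```python
from collections import defaultdict
import statistics

# Invented results: (home_team, away_team, home_points, away_points).
GAMES = [
    ("A", "B", 112, 96),
    ("A", "C", 109, 95),
    ("B", "A", 99, 100),
    ("B", "C", 101, 100),
    ("C", "A", 97, 99),
    ("C", "B", 98, 99),
]


def home_advantage_by_team(games):
    """Half the gap between each team's mean home margin and mean road margin."""
    home = defaultdict(list)
    road = defaultdict(list)
    for home_team, away_team, home_pts, away_pts in games:
        margin = home_pts - away_pts
        home[home_team].append(margin)
        road[away_team].append(-margin)  # same game seen from the visitor's side
    return {
        team: (statistics.fmean(home[team]) - statistics.fmean(road[team])) / 2
        for team in sorted(home)
        if team in road
    }


by_team = home_advantage_by_team(GAMES)
league_mean = statistics.fmean(list(by_team.values()))
between_team_sd = statistics.stdev(list(by_team.values()))

for team, advantage in by_team.items():
    print(f"{team}: {advantage:+.2f} points")
print(f"league mean: {league_mean:+.2f}, between-team SD: {between_team_sd:.2f}")
```

With a balanced schedule like this one, the mean of the per-team values equals the leaguewide mean home margin, which is the single number analysts usually quote.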

posted by dialetheia at 2:28 PM on April 29, 2014

*people talk about a player averaging x points per game, but they never mention the extent to which that distribution is skewed. Some players are streaky and score a ton of points in relatively few games, while other players are steadier and score near their average most games. Some players are just all over the map - their score per game ranges from 0 all the way to the high end of the range.*

The skew in point distribution is usually best explained by dependent factors, or by digging deeper into the types of shots that are generating the points.

1) PPG will correlate highly with minutes per game (MPG). If a player is injured or in foul trouble, he'll play fewer minutes and score fewer points. If he is a backup and the starter is injured or in foul trouble, he'll play more minutes and score more points.

2) Defensive schemes impact opportunities. Player A will have defenses set up to stop him specifically when he is the best player on the floor. When Player A is the fourth best player on the floor on a different team or in a different game, defenses will allow him more opportunity and space to score in order to deny opportunity to more threatening players.

3) Offensive schemes impact opportunities. Some coaches will play offenses that are heavy on plays for a few players (isolation or pick and roll, let's say), leading to increased opportunities for those players. Some coaches will play a motion offense where the opportunities are distributed more equally. This would have more of a long-term effect because a player's PPG would evolve differently over whole seasons with different offensive schemes in place, holding other variables constant. But it would have a small effect from game to game also, since NBA coaches are always making small adjustments.

4) Teammates impact opportunities. If a player is playing with an excellent passer, he'll get more easy opportunities to score and score more points. If he is playing with a selfish player who shoots a lot, he'll get fewer opportunities.
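Point 1 is the easiest of these to check against a game log. A rough sketch, using invented (minutes, points) pairs for a hypothetical player: compute the correlation between minutes and points, then compare the game-to-game variability of raw points with points scaled to 36 minutes. The per-36 scaling is a common convention, and treating it as "controlling for minutes" is a simplification.

```python
import math
import statistics

# Invented (minutes, points) pairs for one hypothetical player.
GAME_LOG = [(12, 4), (18, 7), (24, 10), (30, 14), (36, 15), (40, 19)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from deviations."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def coefficient_of_variation(values):
    """Sample SD divided by the mean, so different scales can be compared."""
    return statistics.stdev(values) / statistics.fmean(values)


minutes = [m for m, _ in GAME_LOG]
points = [p for _, p in GAME_LOG]
per36 = [36 * p / m for m, p in GAME_LOG]

r = pearson_r(minutes, points)
print(f"correlation of minutes and points: {r:.3f}")
print(f"CV of raw points:    {coefficient_of_variation(points):.3f}")
print(f"CV of points per 36: {coefficient_of_variation(per36):.3f}")
```

For this made-up player, most of the game-to-game spread in raw points disappears once minutes are accounted for, which is the pattern described in point 1.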

As I mentioned, the types of shots that a player takes will also have an effect on PPG variability: types of shots that go in at a higher percentage will be less variable from game to game. So the steadier players will generate more of their points from closer shots and from free throws, while the more variable players will take longer shots, especially a lot of three pointers.
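A back-of-the-envelope version of that argument: if every attempt of a given type is treated as an independent make-or-miss with a fixed probability (a strong simplifying assumption), the points from `n` attempts worth `v` each have mean `n*v*p` and variance `n*v^2*p*(1-p)`. The attempt counts and percentages below are made up so that all three shot diets average 15 points.

```python
from fractions import Fraction


def points_mean_and_variance(attempts, value, make_prob):
    """Mean and variance of points from independent shots of a single type."""
    mean = attempts * value * make_prob
    variance = attempts * value ** 2 * make_prob * (1 - make_prob)
    return mean, variance


# Made-up shot diets, chosen so each one averages 15 points per game.
ft_mean, ft_var = points_mean_and_variance(20, 1, Fraction(3, 4))
two_mean, two_var = points_mean_and_variance(15, 2, Fraction(1, 2))
three_mean, three_var = points_mean_and_variance(15, 3, Fraction(1, 3))

for label, mean, var in (
    ("free throws", ft_mean, ft_var),
    ("two pointers", two_mean, two_var),
    ("three pointers", three_mean, three_var),
):
    print(f"{label}: mean={float(mean):.1f} variance={float(var):.2f}")
```

Same average, but under these assumptions the three-point diet has twice the variance of the two-point diet and eight times that of the free-throw diet.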

Once you control for all of this, I think that will explain a lot of the skew (if you do and it doesn't, you've probably created an interesting piece of research!). So I guess my answer is that PPG is kind of a crude instrument, and nobody bothers to list standard deviation and the like because no one is doing serious analysis with it; it gets nudged around by too many other factors. With sports stats you'll often find a big difference between the popular statistics and the ones that serious analysts use to evaluate and predict. Sporting analysis wasn't taken seriously until relatively recently, and the stats that are popular tend to be the ones that are easy to count or track, so they are somewhat haphazard from an analytical perspective.

I'm not sure about the home court advantage, at least in NBA basketball. I don't actually think that some teams have a large home court advantage relative to other teams. I'd be interested to see evidence otherwise; I haven't kept up with this research recently.

I was really into this stuff about a half decade ago but a lot of the people that I was reading at that time have been hired by NBA teams and are no longer publishing their work to the public. Zach Lowe is great, but he's more of a writer than a numbers person. I second the MIT Sloan Sports Conference stuff, but I haven't been too impressed with the sports coverage at the rebooted 538 so far.

posted by Kwine at 4:30 PM on April 29, 2014

The NBA unveiled stats.nba.com a few months ago. I couldn't get it to do anything neat when I toyed with it, but that was right after they put it out, so it may have been having issues. You might want to poke around there and see what options they have.

posted by cashman at 5:20 PM on April 29, 2014

Kirk Goldsberry is a professor with a PhD in geography who works with GIS, spatial analysis, and mapping. He is currently a staff writer at Grantland and does a fair amount of sports statistics, especially spatial sports statistics.

posted by ArgyleGargoyle at 7:07 PM on April 29, 2014

APBRMetrics is one of the original (only?) statistics-oriented basketball forums and is still fairly active. Before stats-driven analysis became more mainstream, this is where a lot of people who would become prominent in this area discussed things. I believe it was founded by Kevin Pelton, who now writes statistically driven pieces for ESPN. People like John Hollinger (former ESPN journalist who did a lot of statistical work and is now a Memphis Grizzlies VP), Neil Paine (former basketball-reference blogger who now covers sports for FiveThirtyEight), Justin Kubatko (creator of basketball-reference and former consultant to the Portland Trail Blazers), and a lot of other influential people in the basketball stats community have participated in discussions there.

posted by Percolate at 1:16 AM on April 30, 2014

posted by peacheater at 1:40 PM on April 29, 2014