Column graph vs. line graph: What's better for yearly data?
February 10, 2020 10:22 AM

At my workplace, I'm getting ready to publish an internal annual report for 2019. I'm looking for advice on the most effective way to present some data graphically.

The report will include about 30 graphs. They're all similar, in the sense that they show data grouped by full year. In the past, I've always used column graphs. But one of my co-workers keeps telling me that I should use line graphs instead. (He also tells me that my graphs always look plain and boring.)

Here are two graphs that show the same (fake) data in the two different formats. The top format is the one I've always favored.

My colleague claims that the line chart shows trends better. That might be true (maybe), but not all of the graphs show clear trends. Many of them depict numbers that bounce around with no obvious trend over time.

I'm looking for informed opinions on this matter, and I'm especially interested in references to articles, books, blog posts, etc. that discuss optimal ways to present this sort of data.
posted by akk2014 to Technology (11 answers total)
Personally I think line graphs work better for timeseries data and bar charts work better for categorical data, even if there's no clear trend. I don't think it's a hard and fast rule, though.

How are you making your figures? Even if you're using Excel, there are lots of ways to clean up the default options and remove "chart junk". Here's a site I just googled that has some Excel tips that follow modern ideas for good data visualization. If you're using a different tool, let us know and I'm sure someone will be able to suggest some resources.
posted by no regrets, coyote at 10:34 AM on February 10, 2020 [3 favorites]

Response by poster: > How are you making your figures?

I use R to do all the analyses and graphs.
posted by akk2014 at 10:38 AM on February 10, 2020 [1 favorite]

It depends. The main deciding factors are your audience and the story you're trying to tell.

If the "trend" is an important part of your story and your audience is keen on it, your visual should reflect it.

IMO, if management is expecting a bar plot, you should deliver a bar plot unless you can support why the new format is superior.
posted by alrightokay at 10:47 AM on February 10, 2020 [2 favorites]

Personally, I'd always go with line graphs if the x-axis is actually a continuous, quantitative thing (like time), and column graphs if it's not. Usually I skip the line entirely and just plot points. But, I'm also almost always talking to scientists who have spent years looking at similar plots.

Since you asked for books, someone's bound to recommend Tufte, obliquely referenced already above. His books and philosophy are sometimes extreme, but they're also entertaining and not a bad introduction to thinking about such things carefully.
posted by eotvos at 10:55 AM on February 10, 2020 [3 favorites]

> Here's a site I just googled that has some Excel tips that follow modern ideas for good data visualization.

Oh lord God, don't use that site's advice; they wind up suggesting removing "chart junk" like gridlines, which help people interpret values in the data, and ultimately actually suggest the "curved lines" option in Excel, which creates the appearance of artificial trends that are not actually present in the data.

In general, line charts make it easier to see trends than bar charts, so I tend to agree with your coworker on that. However, if there are a substantial number of bars and they are relatively wide, as in your example, then trends, to the degree they exist, are still easy to spot. If there isn't a trend, then line charts can imply more of a trend than actually exists; this is also true if the y-axis isn't well selected (e.g., if the total instances show a vague increase from 1000 to 1007 per year, but the y-axis goes from 1000 to 1010, then it looks like a substantial increase rather than really no change at all).
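
To put rough numbers on the y-axis point, here's a minimal Python sketch. The function name `fraction_of_axis` is made up for illustration; it just computes how much of the plot's height a change takes up for a given axis range.

```python
def fraction_of_axis(first, last, axis_min, axis_max):
    """Share of the plotted y-range taken up by the change from first to last."""
    return (last - first) / (axis_max - axis_min)


# Numbers from the example above: totals drift from 1000 to 1007.
truncated = fraction_of_axis(1000, 1007, axis_min=1000, axis_max=1010)
zero_based = fraction_of_axis(1000, 1007, axis_min=0, axis_max=1010)

print(f"axis 1000 to 1010: change spans {truncated:.0%} of the plot height")
print(f"axis 0 to 1010:    change spans {zero_based:.1%} of the plot height")
```

On the truncated axis the 7-unit change fills most of the plot; on a zero-based axis the same change is under 1% of the height.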
posted by Homeboy Trouble at 10:59 AM on February 10, 2020 [1 favorite]

Another way to look at it is to ask what story you're trying to tell. If your data shows no correlation from one year to the next, a time-series line will communicate that, but it blunts and obscures the individual yearly values. Columns imply distinct data, which could be what you need to get across.

Another thing to shake up your approach: what if you didn't present the years chronologically, but sorted by value in descending order, i.e. highlighting the biggest and smallest years?
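
A quick sketch of that reordering in Python, assuming the totals sit in a dict keyed by year. The numbers here are made up for illustration (only the 2002 figure echoes a value mentioned elsewhere in the thread), and `yearly_totals` is a hypothetical name.

```python
# Hypothetical yearly totals; values are invented for illustration only.
yearly_totals = {2000: 11850, 2001: 12410, 2002: 12973, 2003: 12105, 2004: 12640}

# Sort (year, total) pairs by total, largest first, instead of by year.
ranked = sorted(yearly_totals.items(), key=lambda item: item[1], reverse=True)

for year, total in ranked:
    print(year, total)
```

Plotting the bars in `ranked` order puts the biggest year on the left and the smallest on the right, at the cost of losing the chronological reading.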

I find this chart pretty helpful when I'm torn about charts.
posted by SoundInhabitant at 12:07 PM on February 10, 2020 [3 favorites]

The line graph, please.

There are too many x-axis points for the bar graph to be visually comfortable. The first thing I noticed in the top graph was "wow, so dense, so many data points". The trend or actual value of the data is a bit difficult to focus on.

The line graph is cleaner and immediately draws my attention towards the trend and data values.
posted by TheLittlePrince at 12:24 PM on February 10, 2020 [1 favorite]

Best answer: An argument in favor of bars: a line graph on a time series is useful when the measurements are at instants in time or over a time period which is functionally instantaneous over the time-scale of the graph (e.g. monthly measurements over a half-century of data, or some other case where the number of data points is large enough that the space between them is graphically irrelevant). In contrast, each datum in your graph corresponds to a significant extent of time --- a full year in a graph which only shows 20 years. That "12,973" figure, for instance, isn't a measurement taken on January 1, 2002 (which is what the time axis suggests), but rather is representative of the entirety of 2002, and a bar with "2002" under it, instead of a point above the "2002" tickmark, reflects that reality more concretely.
posted by jackbishop at 12:25 PM on February 10, 2020 [6 favorites]

jackbishop's suggestion is sensible; however, I'd suggest removing the spaces between bars. To my eye, bars separated by spaces suggest categorically distinct data, while abutting bars suggest a time histogram, which is effectively what you have.
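
As a rough sketch of the difference, assuming each bar is drawn over its year's interval on a numeric time axis: the helper `bar_edges` below is hypothetical and only computes where each bar would start and end, it doesn't draw anything.

```python
def bar_edges(years, width):
    """Return (left, right) x-positions of a bar centred on each year's midpoint."""
    edges = []
    for year in years:
        centre = year + 0.5  # middle of the calendar year
        edges.append((centre - width / 2, centre + width / 2))
    return edges


years = [2001, 2002, 2003]
abutting = bar_edges(years, width=1.0)  # each bar covers its whole year
gapped = bar_edges(years, width=0.8)    # leaves part of each year uncovered

# Distance between the right edge of one bar and the left edge of the next.
gap_abutting = abutting[1][0] - abutting[0][1]
gap_gapped = gapped[1][0] - gapped[0][1]
print(gap_abutting, gap_gapped)
```

With a width of one year the bars touch and read as a continuous span of time; anything narrower leaves visible gaps. In ggplot2, I believe the corresponding knob is the `width` argument to `geom_col()`, but check the docs for how it positions bars relative to the axis values.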
posted by biogeo at 6:56 AM on February 11, 2020 [1 favorite]

Response by poster: I've decided to continue doing the graphs with columns, as I've done in the past. jackbishop is a professional mathematician, and I'm encouraged by the fact that his reasoning is consistent with my own (decidedly non-professional) intuitions on the matter. I'm marking the question resolved. Thanks go out to everyone for your help.
posted by akk2014 at 7:08 PM on February 12, 2020

Can I recommend this site:
From Data to Viz
Which talks about different graph types and why they are or are not relevant to your data.

It also links to the very nice: The R Graph Gallery
Which can show you how to make every sort of graph in R.
posted by Just this guy, y'know at 5:53 AM on February 13, 2020

This thread is closed to new comments.