Please help me improve this graph!
June 24, 2011 3:54 PM

What are accepted ways for visualizing this example dataset for a journal or other "professional" endpoint?

The graph I have in mind looks like this, roughly speaking:

• I have two datasets that I am comparing with two separate y-axes. For the purposes of illustration, higher values of variables 1 and 2 are "bad", lower values are "good".

• The first dataset (green) shows results for variables 1 and 2 across all categories (x-axis, A to N).

• The second dataset (red) shows results for variables 1 and 2 across a subset of categories (A to K).

• The categories are continuous, equally spaced values, which is why lines are used instead of discrete points.

What I would like to do:

I want to show that the second dataset (red) not only performs worse than the first dataset (green) after the first category, but that it cuts out entirely past category K.

What I am doing now:

That red ball is supposed to be a visual indicator that would be explained in the figure legend. The ball could be replaced with a box or cross.

My question:

What are other ways in which this kind of data comparison can be visualized, which follow accepted conventions for data visualization in modern scientific journals?

I would like to avoid the "cheat" of using the red ball, since I would have to explain this in writing in the figure legend, and I don't want to confuse the inferences that the audience should derive from the figure.

If there is something I can put in which other authors use as a matter of course, I would like to use that for less confusion. For example, should I draw a vertical line at category K, to highlight that cutoff point?

I am thumbing through Tufte's books for inspiration, so there's no need to point me there unless there's a specific book and page number that you think meets the criteria for a demonstrative example.

Please note: I would greatly prefer specifics and pointers to figures in journal articles or textbooks, etc. as examples, as opposed to generalities. (I am not looking for philosophical approaches to visualization, so much as concrete examples of this problem being solved elsewhere.)

In particular, if you have examples which communicate the same dimensions of data results which are not line graphs, but are a different design that is clearer, more efficient and more elegant, I would be grateful for that kind of inspiration.
posted by Blazecock Pileon to Science & Nature (6 answers total)
 
The vertical line at K to highlight the cutoff point would be a good idea. That would be preferable to the ball, which makes it seem like there is information there with greater density than the lines. It's OK (even according to Tufte) to explain the reason for it in the legend; that's not cheating. For instance, when you're extrapolating beyond observed data, a cutoff line is often placed where the data end and the extrapolation begins. I don't have other specific recommendations or examples to provide.
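A minimal sketch of the vertical-line idea, assuming Python with matplotlib (the thread does not say what plotting tool is in use). The names `pad_to_categories` and `plot_with_cutoff` are made up for illustration, and any values passed in would be placeholders, not the data from the question.

```python
import math

CATEGORIES = list("ABCDEFGHIJKLMN")  # categories A to N, as described in the question


def pad_to_categories(values, categories=CATEGORIES):
    """Pad a short series with NaN so it lines up with every category."""
    return list(values) + [math.nan] * (len(categories) - len(values))


def plot_with_cutoff(green, red, cutoff="K", categories=CATEGORIES):
    """Draw both series plus a dashed vertical line at the cutoff category."""
    # Imported here so the padding helper above has no third-party dependency.
    import matplotlib.pyplot as plt

    x = list(range(len(categories)))
    fig, ax = plt.subplots()
    ax.plot(x, green, color="green", label="first dataset")
    # NaN entries are simply not drawn, so the red line stops at its last real value.
    ax.plot(x, pad_to_categories(red, categories), color="red", label="second dataset")
    ax.axvline(categories.index(cutoff), color="gray", linestyle="--",
               label=f"red series ends at {cutoff}")
    ax.set_xticks(x)
    ax.set_xticklabels(categories)
    ax.legend()
    return fig
```

Calling `plot_with_cutoff(green, red)` with a 14-value green series and an 11-value red series returns a figure that can be saved with `fig.savefig(...)`. The cutoff line gets its own legend entry, so the explanation sits in the legend rather than in extra caption text.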
posted by proj at 4:37 PM on June 24, 2011


I don't think there's any need to emphasize that the data beyond K are missing. If I saw a plot where the line just stopped, I would read it, without even thinking about it, as "the data beyond that point are gone". No need for the red ball or anything.
posted by kiltedtaco at 4:45 PM on June 24, 2011


This seems like the sort of thing you see when comparing empirical run times of different algorithms, where some are just totally intractable for certain categories. I believe what I've seen before are simply missing data points for those categories. I would draw the points and lines. The clearly missing points for one series will provide the sort of emphasis you want without being gimmicky.

If there is a real ceiling to this variable, of course you could just draw the ceiling value for L through N.
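Both options above could be sketched roughly as follows, again assuming matplotlib. `fill_missing` and `plot_points_and_lines` are hypothetical names, and the `ceiling` argument only makes sense if the variable really has a known maximum.

```python
import math

CATEGORIES = list("ABCDEFGHIJKLMN")  # categories A to N, as described in the question


def fill_missing(values, n, ceiling=None):
    """Extend values to length n with NaN (a visible gap) or with a known ceiling."""
    filler = math.nan if ceiling is None else ceiling
    return list(values) + [filler] * (n - len(values))


def plot_points_and_lines(green, red, ceiling=None, categories=CATEGORIES):
    """Plot both series with a marker at every observed category."""
    # Imported here so fill_missing can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    x = list(range(len(categories)))
    fig, ax = plt.subplots()
    ax.plot(x, green, marker="o", color="green", label="first dataset")
    # With ceiling=None the red markers just stop; otherwise they sit at the ceiling.
    ax.plot(x, fill_missing(red, len(x), ceiling), marker="o", color="red",
            label="second dataset")
    ax.set_xticks(x)
    ax.set_xticklabels(categories)
    ax.legend()
    return fig
```

With markers on every point, the absence of red markers at L through N is visible without any extra symbol.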
posted by grouse at 4:59 PM on June 24, 2011


Unless I am just missing something, comparing two lines that use different y axes just doesn't make much sense to me. If one is significantly lower than another, could you use a broken y axis so you don't have a lot of blank space?
posted by mikesrex at 9:22 PM on June 25, 2011


"Unless I am just missing something, comparing two lines that use different y axes just doesn't make much sense to me."

I'm not worried about blank space. This is just a mockup. The final figure will get tweaked in Illustrator.

The groups being compared have unique coloring. In other words, red is being compared to green. There are two measures (one for each y-axis) which both show that the green group "outperforms" the red group.

Basically, I'm trying to avoid gimmickry and "chart-junk" while still communicating:

1. The cutoff at K, and why that cutoff is important
2. That our stuff (green) not only outperforms the other guy's stuff (red), but does so on both critical performance metrics

One alternative to a line graph that I was examining was a population pyramid. The green group would go on the right half, the red group on the left half. The pyramid would be lopsided to the red group for both metrics.

The only problem is that I can't represent two variables with one population pyramid. I'd have to have two figures, whereas with a stacked line graph I can show two sets of measurements in one figure.

We're pressed for real estate in the journal article we're putting together. Also, having multidimensional graphs is a good idea (in Tufte's view, and my own) so long as the data are cleanly and fairly presented.

I think I'll just go with the line graph as is, with small annotations at the cutoff. It seems to be the easiest approach. I wish there were a way to do multivariate population pyramids, though. That would be a pretty dramatic way to show how our green stuff rocks.
posted by Blazecock Pileon at 1:27 PM on June 27, 2011


You could do back-to-back line plots with two data series in each plot. It'd be just like the population pyramid except with line plots instead of bar plots.
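One possible reading of this layout, sketched with matplotlib: the red group on the left panel, the green group on the right, categories on a shared vertical axis, and each panel carrying both variables as separate line styles. The function names are hypothetical, and the sketch assumes the two variables are on comparable scales (or have been normalized), which the question does not confirm.

```python
import math

CATEGORIES = list("ABCDEFGHIJKLMN")  # categories A to N, as described in the question


def pad(values, n):
    """Pad a short series with NaN so it spans all n categories."""
    return list(values) + [math.nan] * (n - len(values))


def plot_back_to_back(red, green, categories=CATEGORIES):
    """red and green each map a variable name to its list of values."""
    # Imported here so pad() can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    n = len(categories)
    y = list(range(n))
    styles = ("-", "--")  # one line style per variable (two variables assumed)
    fig, (left, right) = plt.subplots(1, 2, sharey=True)
    for (name, values), style in zip(red.items(), styles):
        left.plot(pad(values, n), y, color="red", linestyle=style, label=name)
    for (name, values), style in zip(green.items(), styles):
        right.plot(pad(values, n), y, color="green", linestyle=style, label=name)
    left.invert_xaxis()  # mirror the left panel so both sides grow outward
    left.set_yticks(y)
    left.set_yticklabels(categories)
    left.legend()
    right.legend()
    fig.subplots_adjust(wspace=0)  # butt the panels together, pyramid style
    return fig
```

Here `red` might look like `{"variable 1": [...], "variable 2": [...]}` with 11 values each, and `green` the same with 14 values each. The red lines then stop at K while the green lines run through N.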
posted by grouse at 1:51 PM on June 27, 2011

