July 21, 2011 9:19 AM

Is there an accepted way to graphically represent rates when small sample data is anonymized?

I have a data set where a large number of points are "less than ten but greater than one incidents."

I would like to graph this data and I am wondering if there is a standard way to show this sort of anonymized data on a graph.

If this were a population of 100, the rate would be within 1%-10%. In my format (Excel, horizontal line chart), I am perhaps picturing a wide line representing the possible minimum and maximum, similar to Minard's famous illustration of Napoleon's march (except that the thickness of the line would represent the range of possible values rather than the number of soldiers).
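The arithmetic behind that 1%-10% range can be sketched in a few lines of Python. This is only a minimal sketch, assuming the suppressed count is known to lie between a lower and an upper cutoff (1 and 10 here, the bounds the 1%-10% example uses); the function name `rate_bounds` is hypothetical.

```python
from __future__ import annotations


def rate_bounds(denominator: int, low_count: int = 1, high_count: int = 10) -> tuple[float, float]:
    """Hypothetical helper: (min, max) possible rate, in percent, for a suppressed count.

    Assumes the true count lies between low_count and high_count, so the rate
    lies between low_count/denominator and high_count/denominator.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    # Multiply before dividing so round cases like 10/100 come out exact.
    return 100 * low_count / denominator, 100 * high_count / denominator


# The question's example: a population of 100.
print(rate_bounds(100))  # prints (1.0, 10.0)
```

The two numbers it returns are exactly the top and bottom edges of the "wide line" described above.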

Or perhaps another way to phrase this - one that may betray my ignorance of statistics, but - what would it look like if a line graph and a box-and-whiskers graph had a baby?
posted by BleachBypass to Computers & Internet (9 answers total)

What do you want to learn / teach from this plot? Just the distribution of counts 1-10? Maybe a histogram with confidence intervals (or a confidence band)? You say "rates", is there a denominator which varies?
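If you go the confidence-interval route, one common choice for a count out of a denominator is the Wilson score interval (my choice here; the answer doesn't name a method). A minimal sketch, assuming counts are binomial and reading "less than ten but greater than one" as an integer count from 2 to 9. Both helper names are hypothetical. Because both Wilson limits increase with the count, taking the lower limit at the smallest possible count and the upper limit at the largest gives a conservative band.

```python
from __future__ import annotations

import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion k/n (z=1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def suppressed_interval(n: int, k_min: int, k_max: int, z: float = 1.96) -> tuple[float, float]:
    """Hypothetical helper: conservative band when k is only known to be in [k_min, k_max]."""
    low, _ = wilson_interval(k_min, n, z)
    _, high = wilson_interval(k_max, n, z)
    return low, high


# Assumed reading of the suppressed label: count of 2..9 out of 18.
print(suppressed_interval(18, 2, 9))
```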

posted by a robot made out of meat at 9:41 AM on July 21, 2011

How about something like Figure 2 on this page? It's a line graph showing minimum, maximum and mean. You could add in whatever other percentiles you wanted.
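The linked figure isn't reproduced here, so this is only a minimal sketch of the general idea (a mean line flanked by min and max lines), assuming raw values are available for each x position. The function names and sample data are hypothetical, and matplotlib is assumed for the plotting step.

```python
from __future__ import annotations

from statistics import mean


def summarize(series: dict[str, list[float]]) -> tuple[list[str], list[float], list[float], list[float]]:
    """Hypothetical helper: collapse each x position's raw values into min, mean and max."""
    xs = list(series)
    mins = [min(series[x]) for x in xs]
    means = [mean(series[x]) for x in xs]
    maxs = [max(series[x]) for x in xs]
    return xs, mins, means, maxs


def plot_min_mean_max(series: dict[str, list[float]]) -> None:
    # matplotlib is imported here so summarize() works without it installed.
    import matplotlib.pyplot as plt

    xs, mins, means, maxs = summarize(series)
    fig, ax = plt.subplots()
    ax.plot(xs, means, marker="o", label="mean")
    ax.plot(xs, mins, linestyle="--", label="min")
    ax.plot(xs, maxs, linestyle="--", label="max")
    ax.legend()
    plt.show()
```

For example, `plot_min_mean_max({"2009": [3, 5, 8], "2010": [2, 4, 9]})` draws three lines over two made-up years; any other percentiles could be added the same way.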

But as a robot made out of meat says, it would help to know what you're trying to convey, as different types of graph will draw attention to different aspects of the data.

posted by A Thousand Baited Hooks at 9:44 AM on July 21, 2011

@blazecock, yes, I think that's more accurate - box and whiskers is not what I'm looking for; a "whiskers" graph - the line graph with error bars looks simple and straightforward.

@robot - definitely rates; a good example would be displaying a rate of "less than 10 out of 130" and "less than 10 out of 18" on the same graph.
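Working those two examples through shows why the denominator matters. Assuming, as in the question's 1%-10% example, that a suppressed count $k$ satisfies $1 < k < 10$, the rate $r = k/n$ is bounded by:

$$
\begin{aligned}
1 < k < 10 \;&\Longrightarrow\; \frac{1}{n} < r < \frac{10}{n} \\
n = 130: \quad 0.77\% \approx \frac{1}{130} &< r < \frac{10}{130} \approx 7.69\% \\
n = 18: \quad 5.56\% \approx \frac{1}{18} &< r < \frac{10}{18} \approx 55.56\%
\end{aligned}
$$

The same "less than 10" label covers a band $130/18 \approx 7.2$ times wider for the smaller denominator, so the chart needs to show the width of each range, not just where it sits.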

Ancient example from the good ol' days - pre-data-obfuscation.

@ATOB, I do like that to some degree, but want to make sure it's obvious that the lines are related; as you can see, I am using lines and min-max bars already.

posted by BleachBypass at 11:02 AM on July 21, 2011

I suppose perhaps something like this is a possibility, but GAWD, what a kludge.

posted by BleachBypass at 12:21 PM on July 21, 2011

R's `ggplot2` library includes a ribbon plot along the lines of what you linked to.
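For anyone not using R, matplotlib's `Axes.fill_between` draws a similar shaded band. A minimal Python sketch, assuming each point is a suppressed count between 1 and 10 with a known denominator; the helper names, x values and denominators are hypothetical.

```python
from __future__ import annotations


def ribbon_bounds(denominators: list[int], low_count: int = 1, high_count: int = 10) -> tuple[list[float], list[float]]:
    """Hypothetical helper: per-point lower and upper rate (percent) for suppressed counts."""
    lows = [100 * low_count / n for n in denominators]
    highs = [100 * high_count / n for n in denominators]
    return lows, highs


def plot_ribbon(xs: list[float], denominators: list[int]) -> None:
    # Imported lazily so ribbon_bounds() can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    lows, highs = ribbon_bounds(denominators)
    fig, ax = plt.subplots()
    # The shaded band spans every rate consistent with the suppressed count.
    ax.fill_between(xs, lows, highs, alpha=0.3, label="possible range")
    ax.plot(xs, lows, color="gray", linewidth=0.8)
    ax.plot(xs, highs, color="gray", linewidth=0.8)
    ax.set_ylabel("rate (%)")
    ax.legend()
    plt.show()
```

Calling `plot_ribbon([2008, 2009, 2010], [130, 60, 18])` with those made-up denominators would show the band widening sharply as the denominator shrinks.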

It is never a kludge to match a good graph design to whatever communicates the data efficiently and correctly.

It all comes down to what story you want the data to tell.

posted by Blazecock Pileon at 3:31 PM on July 21, 2011

That is, if your data are continuous, a ribbon plot can make sense. It can also make sense if these are discrete points but it is reasonable to interpolate between measurements.

The issue isn't what is communicated at the data points, but what inferences your audience will derive, based on what the plot shows *between* data points.
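To make that concrete: a ribbon (or any connected line) effectively draws a straight-line interpolation between neighbouring measurements. A small stdlib sketch, with a hypothetical function name, of the band a reader would infer at an x position where nothing was actually measured:

```python
from __future__ import annotations

from bisect import bisect_right


def implied_band(x: float, xs: list[float], lows: list[float], highs: list[float]) -> tuple[float, float]:
    """Hypothetical helper: band a connected ribbon shows at x, by linear interpolation.

    xs must be strictly ascending, and x must lie within [xs[0], xs[-1]].
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the measured range")
    i = bisect_right(xs, x) - 1
    if i == len(xs) - 1:  # x is exactly the last measured point
        return lows[i], highs[i]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    low = lows[i] + t * (lows[i + 1] - lows[i])
    high = highs[i] + t * (highs[i + 1] - highs[i])
    return low, high
```

If the x values are ordered (years, ages), that in-between band is a plausible reading. If they are unordered categories (regions, clinics), it means nothing, and error bars without connecting lines avoid implying it.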

You have to ask if that makes sense for the story you want the graph to tell.

posted by Blazecock Pileon at 3:34 PM on July 21, 2011

I like it! Thanks.

posted by BleachBypass at 12:23 PM on July 22, 2011

I did not know about ggplot2, very interesting.

posted by BleachBypass at 12:24 PM on July 22, 2011

"what would it look like if a line graph and a box-and-whiskers graph had a baby?"

A line chart with error bars? Box-and-whiskers plots attempt to communicate a little bit more (like the distribution of measurements), but error bars would get you close.
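A line chart with asymmetric error bars can be sketched with matplotlib's `Axes.errorbar`, which accepts `yerr` as two rows: distances below and distances above each point. This is a minimal sketch under assumptions: the plotted centre is taken as the midpoint of the possible range (an arbitrary choice, since the true count is unknown), and the helper names are hypothetical.

```python
from __future__ import annotations


def asymmetric_errors(centres: list[float], lows: list[float], highs: list[float]) -> list[list[float]]:
    """Hypothetical helper: turn (low, high) bounds into errorbar's 2xN yerr layout."""
    below = [c - lo for c, lo in zip(centres, lows)]
    above = [hi - c for c, hi in zip(centres, highs)]
    return [below, above]


def plot_error_bars(xs: list[float], lows: list[float], highs: list[float]) -> None:
    # Imported lazily so asymmetric_errors() can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    # Assumed centre: midpoint of each possible range.
    centres = [(lo + hi) / 2 for lo, hi in zip(lows, highs)]
    fig, ax = plt.subplots()
    ax.errorbar(xs, centres, yerr=asymmetric_errors(centres, lows, highs), fmt="-o", capsize=4)
    ax.set_ylabel("rate (%)")
    plt.show()
```

Dropping the `-` from `fmt` (markers only, no connecting line) is the natural variant when the x values are categories rather than an ordered series.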

posted by Blazecock Pileon at 9:29 AM on July 21, 2011