Data Visualization to compare too many items
June 21, 2019 7:55 AM

I need to build a chart that clearly shows how 10 different values change over time, displaying them all simultaneously, and I'm stumped for a way to do this that isn't immediately a hot mess. All values will be in roughly the same range. What's Tufte say we should be doing here?
posted by radiosilents to Science & Nature (22 answers total) 7 users marked this as a favorite
 
I can’t really ventriloquize (?!) Tufte, but we need to know more first. Who needs this chart? What do they need to learn from it? What’s the chart intended to demonstrate?

Surely they are not expected to absorb the interplay between ten different variables at once (unless it’s to say HOLY SHIT THAT’S A LOT OF VARIABLES, in which case great job), but maybe between groups of those variables, or between subsets of those variables? You could use colors or trend lines, or just chart the means of subgroups, maybe, is where I’m going. But it really depends on the answers to the first question.
posted by chesty_a_arthur at 8:02 AM on June 21, 2019


Response by poster: The chart is intended to demonstrate consumption trends and help anticipate restocking needs. Let's call it a restaurant with 10 sodas on tap and they want a single dashboard showing syrup levels for all of them at once.

We've explored a line chart with different colors, but this isn't quite as readable as they would like. They have the ability to only show a handful at a time, but they have a desire to see them all at once. We're aware this isn't necessarily a good fit and if that's the long and the short of it then so be it. I'm just trying to see if we can swing something interesting & informative that's beyond my fairly meager skills.
posted by radiosilents at 8:09 AM on June 21, 2019


This is why dotted and dashed lines are used. Is it interactive, paper, electronic, poster board? Can you use color? How far away will the viewers be? Is the x-axis time scale the same for all 10 variables? Can you show intermediate charts that show only a few variables, building to the final chart? You do know the trick of showing a legend next to each line, right?

Judicious choice of y-axis scale is key in differentiating.
posted by at at 8:12 AM on June 21, 2019 [1 favorite]


Response by poster: This is being done in software so yeah, line weights, types, colors, the whole nine. Viewers will be in front of a PC when they're looking at it. The time scale is the same for all items. We have the ability to toggle individual values on and off at will, but they're definitely interested in seeing them all at once. The legend is present and clear, but the data itself is, obviously, cluttered.
posted by radiosilents at 8:15 AM on June 21, 2019


Just off the top of my head: haven't mocked this up so I don't know how well it would actually work, but what about a bar chart with a colour for each syrup. Use desaturated colours to fill in the bars, fully saturated colours for the line marking the end of each bar, and show some convenient period's worth of historical data as lines the same width as the bars whose saturation decreases with age. A quick glance will give you all the current levels, and you get a kind of motion-blur impression of movement from the colour gradient formed by the historical data that gives you an idea of how fast the values are shifting.
posted by flabdablet at 8:19 AM on June 21, 2019


Not sure what tools you have, but most of the mainstream tools would allow you to do some simple variant of Hans Rosling's visualization, which tracks a bunch of different data points over time. And if nothing else, the video might give you some ideas.
posted by Gorgik at 8:57 AM on June 21, 2019


If I'm understanding the data, I can see something like this:
- Horizontal bar chart, x axis representing time.
- Width of each bar varies to show the value, sort of like the manpower of Napoleon's army in Russia in the famous viz.

So the bars extend the width of the chart, don't overlap, and get skinnier or fatter to represent the varying amount of syrup in stock per flavor over time.
posted by bendybendy at 9:16 AM on June 21, 2019


The chart is intended to demonstrate consumption trends and help anticipate restocking needs. Let's call it a restaurant with 10 sodas on tap and they want a single dashboard showing syrup levels for all of them at once.

So do a line or area chart where each line represents one soda, the Y-axis shows the level, and the X-axis represents your time frame (hour, month, year, etc.). Code each line in a color that is easy to distinguish from the others, and use a legend to label each line.

The key is making sure the colors for each line are as distinct as possible and don't overlap.
posted by Young Kullervo at 9:21 AM on June 21, 2019 [1 favorite]


Response by poster: We're currently using a line chart very similar to the one in Young Kullervo's link above. The problem with this specific display is that the values in some cases do overlap heavily. All values start at roughly the same place and go down to a similar level before being restocked, so there's an unclear pile of spaghetti that bunches together from time to time.
posted by radiosilents at 9:27 AM on June 21, 2019 [1 favorite]


Would definitely recommend an area chart over line chart for these purposes. It makes it much easier to visualize amounts with that many variables, compared to a line chart. I am not a data viz whiz but Stephanie Evergreen is; her blog is a great resource that might give you more options!
posted by DTMFA at 9:28 AM on June 21, 2019


Yeah, this is too many variables at once for a useful chart. In the absence of seasonal variance, each of the ten items is more or less going to track the mean consumption level, adjusted by individual share. If you always sell five times as much Diet Coke as Sprite, you don't need a chart to track that over time. The only thing a chart is going to show you is your seasonal variance, so the question is: do you even have any meaningful seasonal variance? Are you going to learn that lemonade peaks in July and pumpkin spice peaks in November? Is that something you don't already know? Do you need to track year over year lemonade sales? Is that a meaningful, actionable bit of information? Does the ability to tease out other factors (e.g. an early heat wave or later first snowfall) affect your resource planning long term?

Also, what's the overall volume? Will your low volume items sell at a discernible threshold above the noise floor, or do they really just sell at a rate that you can stock only one item and reorder it after it has sold? What's your overhead? What's your opportunity cost for being out of stock on a low volume item for however long its replacement takes to arrive? Can you drop ship or fulfill directly to the end user from your warehouse? Does your inventory spoil?

I take a jaundiced view of this sort of "we have to have everything in a chart, all at once" requirement, because literally every time I've been required to build such a thing it has been shown off once, applauded as an impressive interface to the data, and then literally never used again, by anyone, for any practical purpose. (Sales demos to potential customers count as showing off, not practical use. New customers never use those charts either.) But anyway, if you can't get out of the requirement, I'd say the most useful version of this is to show mean consumption and how each item's share of overall sales varies from the mean, probably on a log scale. That'll expose the weird peaks and troughs that happen when lemonade trends on Instagram, or a hot October eats into sales of pumpkin spice, or whatever. If your trends aren't seasonal, though, this chart will be useless. If you already know what your seasonal trends are, it's only useful to the degree that it exposes changes in the trends, year over year.
posted by fedward at 9:29 AM on June 21, 2019
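
One possible reading of fedward's "share of overall sales versus the mean, on a log scale" idea, sketched in Python. The function name, the input layout (one list of per-period quantities per item), and the choice of a base-2 log ratio are all assumptions made for illustration; nothing in the thread specifies them. It also assumes every item sells a nonzero amount in every period.

```python
import math

def share_deviation(sales):
    """Hypothetical helper: sales maps item -> list of per-period quantities.

    For each item, returns log2(share / mean share) per period, so 0.0 means
    "selling at its usual share", +1.0 means double its usual share, and
    -1.0 means half. Assumes equal-length lists and nonzero quantities.
    """
    items = list(sales)
    n_periods = len(sales[items[0]])
    # Total sold across all items in each period.
    totals = [sum(sales[item][t] for item in items) for t in range(n_periods)]
    result = {}
    for item in items:
        shares = [sales[item][t] / totals[t] for t in range(n_periods)]
        mean_share = sum(shares) / n_periods
        result[item] = [math.log2(s / mean_share) for s in shares]
    return result

# Made-up numbers: lemonade spikes in the middle period.
example = share_deviation({"cola": [50, 50, 50], "lemonade": [50, 150, 50]})
```

Plotting these values instead of raw levels would keep steady sellers flat near zero, so only the unusual periods stand out.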


Is it possible the client who's saying the coloured lines are not distinct enough is colourblind? If so you can try to work with various types of dashes. There are quite a lot of dashed lines you can make, but you'll only really see them in old diagrams in books not printed in colour. Use diamonds, circles, slashes, etc.

Or, rather than having them all on one big graph, you could try 10 concurrently-running graphs — so the timescale is the same but each has a separate scale for amounts. Then maybe give them the option to toggle between one big overlapped-lines graph and 10 parallel graphs?
posted by 100kb at 9:30 AM on June 21, 2019 [1 favorite]


How about a stacked bar chart or a multiseries bar chart then? Just change the strategy bins on the y-axis to your time frame.

I also wondered if the client is color blind. I've run into that situation before. 10 items shouldn't be TOO overwhelming on a line chart if the colors are distinct, but the maximum # of items I'd use for one line chart would be 5.
posted by Young Kullervo at 9:33 AM on June 21, 2019


Also I forget what it's called but there's a technique where you have a stack or grid of individual line charts, each item in its own chart with its own Y-axis scale, but with linked time values on the X-axis. As you mouse over an individual chart (say, lemonade) the same time would be highlighted in all the other charts, with the Y-axis label tracking each item's value at that time. You could maybe see that iced tea cuts into Diet Coke specifically in the summer, while lemonade reduces everything by an equal share. But again, you actually need to be looking for seasonal trends (and find them actionable) for this sort of chart combo to be useful. I'd look at making the Y-axis value a toggle between quantity and share, and maybe have an option to adjust the magnification of Y-axis scale so minor differences are more obvious, but that will depend on your overall volume.
posted by fedward at 9:57 AM on June 21, 2019


I would do this with separate panels for each syrup, 1 column by 10 rows. The x-axis is time, the y-axis is the same for each panel (if possible; at least the scale should be the same), and the panels are ordered from top to bottom in some useful way.
posted by momus_window at 10:06 AM on June 21, 2019


Based on what you said about restaurant sodas, I think I’d just display the graphs and focus on calling out significant events or relationships. Off the top of my head: a green graph means “fully stocked”. A red graph means “almost out”. The graph with least movement is highlighted in purple. The graph with the most movement is highlighted in green. Coke is the all-time best seller. Diet Fanta Orange is the least favorite. And so on. What you can call out will depend on the specific data you’re displaying.
posted by doctor tough love at 10:15 AM on June 21, 2019


Best answer: To key off what doctor tough love is saying, one potentially useful way of displaying the data is days of inventory in stock. If you're trying to figure out how to manage Just In Time inventory (so you don't order something before you need to, but so you never run out), then you can graph the number of days of sales (at current or predicted trends) that could be supplied with current inventory. Then you compare that value with how long the supply chain takes. If it takes seven days from PO to item on hand, you want to make sure you're ordering before you get to only seven days of inventory remaining. If you know your supply chain is unreliable, you might need to move your restocking threshold up a bit and just have that bit of additional overhead. If you can afford to be out of stock, move the threshold closer to the actual rate of consumption. But now I'm really just guessing at how you might be able to use this sort of data at all.
posted by fedward at 10:33 AM on June 21, 2019 [1 favorite]
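
The days-of-inventory comparison described above can be sketched in a few lines of Python. This is a minimal sketch, not anyone's production logic: the function names and the `buffer_days` parameter are hypothetical, and it assumes a constant daily usage rate, which the poster later notes is not quite true for their data.

```python
def days_of_supply(on_hand, daily_usage):
    """Days the current stock would last at a constant daily usage rate."""
    if daily_usage <= 0:
        return float("inf")  # nothing is being consumed, so stock never runs out
    return on_hand / daily_usage

def should_reorder(on_hand, daily_usage, lead_time_days, buffer_days=0.0):
    """True once remaining days of supply no longer exceed lead time plus buffer.

    buffer_days is an assumed knob for unreliable supply chains: raise it to
    order earlier, leave it at 0 to order exactly at the lead-time threshold.
    """
    return days_of_supply(on_hand, daily_usage) <= lead_time_days + buffer_days

# Made-up example: 20 units on hand, 2 used per day, 7-day lead time.
remaining = days_of_supply(20, 2)
order_now = should_reorder(20, 2, lead_time_days=7)
```

A dashboard built on this would show one number per syrup (days remaining) next to its threshold, rather than ten overlapping level curves.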


Response by poster: Fedward, that's an interesting approach and you're very, very close to understanding the need. We have the added wrinkle that consumption increases as inventory depletes, but once we nail that algo I think this is potentially a great way to step away from charting the actual day-to-day values at all and just deliver the most valuable data point for each hypothetical syrup: "When will I be out?" The goal is to never, ever be out.
posted by radiosilents at 10:37 AM on June 21, 2019


Best answer: I came here to say pretty much what doctor tough love said. You've got to think about what is interesting here. Is it very low levels? Very high? Departures from the running average? I can imagine a chart with a sort of "double logarithmic" Y axis, where values close to the norm are clustered together, but deviations are visually amplified so they're easier to spot. You might want to graph something more indirect, like "time that this value has been 1+ standard deviation from the mean."
posted by adamrice at 10:43 AM on June 21, 2019 [1 favorite]
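
A rough Python sketch of the indirect metric adamrice mentions, "time that this value has been 1+ standard deviation from the mean". The function name and the decision to count trailing samples (rather than elapsed clock time) are assumptions, as is using the mean and population standard deviation of the whole series as the baseline; a real version might use a running average over a separate window.

```python
from statistics import mean, pstdev

def samples_outside_band(values, k=1.0):
    """Count how many of the most recent consecutive samples sit at least
    k standard deviations away from the mean of the whole series."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0  # a perfectly flat series never leaves the band
    run = 0
    # Walk backwards from the newest sample until one falls inside the band.
    for v in reversed(values):
        if abs(v - mu) >= k * sigma:
            run += 1
        else:
            break
    return run

# Made-up readings: steady at 10, then two unusual samples at 20.
streak = samples_outside_band([10, 10, 10, 10, 10, 10, 10, 10, 20, 20])
```

Multiplying the returned count by the sampling interval gives a duration that could be shown per item in place of the raw curve.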


OK, if you can't afford to be out, then you're really just comparing days of supply on hand with lead time. Divide supply by rate, subtract lead time, and reorder based on the result. If you're ordering everything premade, you should be tracking quoted lead time and on time performance for each vendor. More reliable vendors can have lower ordering thresholds than the ones who sometimes leave you high and dry. If you can't afford to run out, but you also can't trust your vendors to deliver on time as promised, then you'll have higher overhead costs because you have to maintain higher supply levels yourself.

If these are syrups you're manufacturing on site (to extend the syrup example, say it's a bar and you make your own tonic water from quinine and selected aromatics), then you'll have a second-order inventory problem: you need to manage the components that go into your syrups so you're not out of something you need when it's time to make another batch. You can still track consumption and map it to the raw materials, though.

Also if your product is a fresh product that expires, you will also want to manage your ordering so you don't have too much loss to spoilage, although low volume/high spoilage items will give you the classic headache of determining whether it's worth carrying them at all. That's left as an exercise for people working in supermarkets and restaurants. Durable goods don't expire in quite the same way.
posted by fedward at 11:14 AM on June 21, 2019


For the Tufte-ian graph I think it would be small multiples of sparklines. The same sort of thing that could be used to show the price over time of dozens of stocks at a glance.

Cola A: █▆▅▄▂▁▁▁
Cola B: █▇▆▅▅▃▂▁▁
Cola C: █▇▆▆▅▅▅▄▁
...

It looks much better with lines. To me, graphing with small multiples is a bit like looking at a dashboard of graphs for dozens of network interfaces all at once.
You could even draw a horizontal line across each graph at the level where "when it drops below this line it's time to order the next box of syrup".

Like on this page in the Ganglia and MRTG examples. Except the graphs you draw would always jump up when you add supply and slowly go down as supply is used up.
posted by zengargoyle at 7:03 PM on June 21, 2019
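
The block-character rows above can be generated with a short Python sketch. The names `sparkline` and `dashboard_row`, the 0 to 100 default scale, and the text "reorder" flag (standing in for the horizontal threshold line) are all assumptions made for illustration. Every row uses the same `lo` and `hi` so the rows stay comparable.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight Unicode block heights, lowest to highest

def sparkline(values, lo=None, hi=None):
    """Map each value to one block character on a scale from lo to hi."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = hi - lo
    if span <= 0:
        return BLOCKS[0] * len(values)  # flat or degenerate range
    chars = []
    for v in values:
        frac = min(1.0, max(0.0, (v - lo) / span))  # clamp into [0, 1]
        idx = min(len(BLOCKS) - 1, int(frac * len(BLOCKS)))
        chars.append(BLOCKS[idx])
    return "".join(chars)

def dashboard_row(name, values, reorder_level, lo=0, hi=100):
    """One labeled row; flags the row when the latest value is at or below
    the (assumed) reorder level."""
    flag = "  <- reorder" if values[-1] <= reorder_level else ""
    return f"{name:<8}{sparkline(values, lo, hi)}{flag}"

# Made-up syrup levels (percent full) for two taps.
for name, levels in {"Cola A": [100, 80, 60, 40, 15],
                     "Cola B": [100, 95, 90, 85, 80]}.items():
    print(dashboard_row(name, levels, reorder_level=20))
```

Ten such rows stacked vertically give the small-multiples view without any overlapping lines.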


Classic inventory theory involves such quantities as the Economic Order Quantity, and the Reorder Point. Though the EOQ seems wonky, it often comes down to a case, or a pallet, or a truckload, or the quantity at which you get a discount. You place a new order when inventory falls to the Reorder Point.

To get back to your specific question, I'd try a horizontal bar chart with each bar showing the days sales in inventory. Possibly, you can add the total amount in inventory as a label.
posted by SemiSalt at 1:28 PM on June 22, 2019
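
For reference, the two textbook quantities SemiSalt names can be written as small Python functions. This sketch uses the standard EOQ formula, sqrt(2 × demand × order cost / holding cost), and the usual reorder point of demand during lead time plus optional safety stock. The parameter names and example numbers are hypothetical, and as noted above, in practice the order quantity often just rounds to a case or pallet.

```python
import math

def economic_order_quantity(annual_demand, cost_per_order, holding_cost_per_unit_year):
    """Classic EOQ: the order size that balances ordering and holding costs."""
    return math.sqrt(2 * annual_demand * cost_per_order / holding_cost_per_unit_year)

def reorder_point(daily_demand, lead_time_days, safety_stock=0.0):
    """Inventory level at which to place the next order."""
    return daily_demand * lead_time_days + safety_stock

# Made-up numbers: 1000 units a year, 10 per order placed, 2 per unit-year to hold.
eoq = economic_order_quantity(1000, 10, 2)
rop = reorder_point(daily_demand=3, lead_time_days=7, safety_stock=4)
```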

