Need advice about statistics...
November 18, 2008 4:21 PM   Subscribe

Am I distorting my data, and not showing the true picture?

I have 6 months of call centre data.
I'm trying to understand when we're busy, when we're slack, by comparing the number of calls in each hour.

Should I count calls, and group by hour, and sum of the counts?
Or should I take the average hourly count, per day and sum that?
posted by matholio to Science & Nature (9 answers total) 1 user marked this as a favorite
 
What do you mean when you say "average hourly count, per day"?
posted by found missing at 4:40 PM on November 18, 2008


Don't use averages when you have the raw numbers. I'd sum them up by hour and graph the results - see if there is a pattern.
posted by desjardins at 4:59 PM on November 18, 2008


What you should do depends on what question you'd like answered. "When are we busy?" is a good starting point and we can get more specific from there.

I would guess that what you want to find is any pattern of busy-ness across two dimensions: hour of the day and day of the week (longer-term seasonal differences might also be relevant). I would further speculate that you don't want statistics, per se, but rather a general picture of what's going on. So a graph would be in order.

This is a perfect job for Excel or one of the free spreadsheet programs (OpenOffice, gnumeric, and so on).

First, for each individual day I would count how many calls are made in each 15 minute period. This will give you 96 data points per day, which should be enough to give a good resolution to the graph but not so many that the graph gets "spiky" from a high variation from period to period.

If you're experienced with spreadsheet programs and have set everything up well, it should be relatively straightforward to average these numbers for each day of the week: Take all the Mondays in your six-month range and average the data points together to get 96 new data points, where each data point shows the average Monday call volume for a particular 15-minute window. Make sense? Do that for each day of the week, and then graph it. You'll get a graph with 7 lines (different colors ideally), each with (I would guess) a gentle peak around 3pm or so and a low around 3am. This graph would let you answer questions like: What is the busiest day? What's the busiest hour? Are weekends busy at the same time of day as weekdays? And so on.
posted by kprincehouse at 5:12 PM on November 18, 2008


@found_missing
Sorry, poorly articulated.
I meant count the calls in each hour, each day, and average the results.

@desjardins
summing by hour, produces a larger range between the peaks and troughs.
That what I menat by distortion.
Actually this is how I originally did my chart, but I was worried I was producing misleading comparisons.
posted by matholio at 5:40 PM on November 18, 2008


matholio, the key is to minimize the binning you are doing while at the same time making it easy to understand.

If you are interested in daily data, I would aggregate all the calls by day of the week but keep the times distinct and see if you can make a useful heat map out of that. Excel is really useless when it comes to serious visualization but you might get away with something like this.

You should also do something similar for just a single day by aggregating all 6 months of activity into a single "day".

Finally, you're probably interested in seeing how the calendar itself affects call volume, and that would just a simple bar chart with number of calls that day over the range of 6 months.
posted by onalark at 6:51 PM on November 18, 2008


@kprincehouse
Yes, this is p[retty much what I thought.
Per hours is fine, I need to show how effective our shifts are
I also need to slice it by country and langauge.
posted by matholio at 8:09 PM on November 18, 2008


Count minutes too. You may only have 20 calls in one hour but if the calls all last 30 minutes then its a different scenario from having 60 calls that only last 4 minutes. That should give you a better view of how many agents are busy at a given time.
posted by damn dirty ape at 7:01 AM on November 19, 2008



@damn_dirty_ape
Thanks, I hadn't thought of that. Talktime per hour is actually essential to understand busy-ness.
posted by matholio at 1:56 PM on November 20, 2008


In six months, at a hundred calls an hour, you might have half a million calls? Maybe a few millions? You could probably fit the whole dataset on a poster-sized sheet of paper as a table of sparklines. Make stairstep line of "number of active calls," which moves up by a unit when a call comes in and down by a unit when a call ends. Maybe fill the area under the curve so that busier times look darker from across the room.

Having all the data in your field of view at once will give you a better sense about whether your summaries are accurate, whether you show the fuller view to your bosses or not.
posted by fantabulous timewaster at 10:55 AM on November 21, 2008


« Older Things to do in Denver   |   Need a low-cost alternative airline for flying to... Newer »
This thread is closed to new comments.