Statistics 101
January 7, 2015 10:14 AM   Subscribe

I have 754 tickets in our development ticket system. All tickets are considered "minor," so they are not very disparate (you would not get one ticket at 4 hours and another at 4 days). Given that I have ~200 of these items estimated by hand, how can I generate estimates for the rest of the items?

I have Excel, but I'm open to using anything. Also, I'm intentionally not describing what I've tried, so as not to bungle my intent with my hazy memory of statistics class.
posted by geoff. to Science & Nature (13 answers total) 1 user marked this as a favorite
 
I'm sorry, can you explain this a little more clearly? What are you trying to estimate, the time they will take to close? Are there any distinguishing characteristics of these things, or are they all functionally the same? It sounds like the easiest and most accurate thing to do is just straight up average the estimates of the 200 you've already done and give that estimate to the remaining 550 or so, unless there's something that separates the ones you've done from the ones you haven't.
posted by brainmouse at 10:23 AM on January 7, 2015 [2 favorites]


Most of the modern ticketing systems I've used (HEAT, Remedy, Jira) have enough reporting built in to break down mean/median time to completion by severity and category, not sure what system you're using though.
posted by Oktober at 10:24 AM on January 7, 2015


Response by poster: Sorry, using JIRA and wasn't clear. I have tickets that either have an estimate or no estimate. I'm trying to populate the tickets without an estimate based on the data from similarly labeled tickets. So in my mind, if I have 10 tickets estimated and 1 out of those 10 is 8 hours, then given 100 similar tickets, I'd expect 9 more 8-hour tickets. This way I only have to estimate a subset of the data. Or is this bad math?
posted by geoff. at 10:35 AM on January 7, 2015


How are you planning to identify "similar" tickets?
posted by chesty_a_arthur at 10:39 AM on January 7, 2015


Is the distribution of it important?

Because my first thought wasn't whether 1 out of 10 is 8 hours, but, just for example, that the mean average is 5.3 hours.

But knowing the mean average might be useless if what's really important are the odds that a ticket will take over 6 hours. Is that what you mean in your follow-up?
posted by RobotHero at 10:40 AM on January 7, 2015


Response by poster: RobotHero, yes that is what I mean!

chesty_a_arthur, this is an assumption I'm making, as someone already identified them as similar based on criteria.
posted by geoff. at 10:47 AM on January 7, 2015


Oh, I see. So what you're asking is how many you have to estimate before you have enough information to confidently "meta-estimate" the others? How confident do you want to be?
posted by chesty_a_arthur at 10:53 AM on January 7, 2015


Response by poster: Without getting too technical about it, I would like to be like 80% right? If that's not too high. With knowledge that if I miss one, I'm missing at most 6 hours (min being 2, max being 8). That's a big assumption I'm making about my data but one I'm willing to live with.
posted by geoff. at 11:06 AM on January 7, 2015


I can see if you're scheduling people to address these tickets it's nice to know the odds that a ticket will exceed a certain amount of time. Of course, we can't predict which ones will exceed it, only the likelihood.

If you do have confidence that those criteria are picked correctly, then find all the matching tickets from your sample of 200, and count the percentage that match what you're trying to predict. (In this case, whether it took more than X hours.) Then those are the predicted odds.
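A minimal sketch of that counting approach in Python (the hours below are invented for illustration, not real ticket data):

```python
# Within the hand-estimated sample of "similar" tickets, count how often a
# ticket exceeded the threshold; that fraction is the predicted odds.
# These hours are made up for illustration.
sample_hours = [2, 3, 8, 4, 2, 6, 8, 3, 5, 7]

threshold = 6  # the "X hours" you care about
over = sum(1 for h in sample_hours if h > threshold)
odds = over / len(sample_hours)
print(f"Predicted odds of exceeding {threshold}h: {odds:.0%}")  # → 30%
```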

If you don't have confidence in those criteria, there's some more advanced stuff with cross-validation where you divide your 200 into sample data and test data.

On preview: Okay, if you want the time that 80% of your sample is less than, what you're looking for is the 80th percentile.

But that's not the same as 80% between the min and max if that's what you're thinking. For example, if you had 8 tickets that took 1 hour and 2 that took 11 hours, you would say with 80% confidence it will take 1 hour, even though 80% between the min and max is 9 hours.
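If it helps, here's a rough nearest-rank percentile in plain Python, using the example numbers above. (This is one of several common percentile definitions; Excel's PERCENTILE interpolates, so it can give a slightly different answer.)

```python
import math

def percentile(data, p):
    """Nearest-rank percentile: the smallest value with at least
    p% of the data at or below it."""
    xs = sorted(data)
    k = math.ceil(p / 100 * len(xs))
    return xs[max(k - 1, 0)]

hours = [1] * 8 + [11] * 2  # the 8-tickets-at-1h, 2-at-11h example
print(percentile(hours, 80))  # → 1
```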
posted by RobotHero at 11:17 AM on January 7, 2015


I think 80% between min and max is the same as the 80th percentile only if you have a completely flat distribution, which is almost certainly not the case here, and there's no useful purpose in assuming it.


And since you mentioned Excel, there is a function for getting the Nth percentile of a sample. Selecting the sample based on the other criteria will probably be the more complex part of this.
posted by RobotHero at 11:27 AM on January 7, 2015


- 754 tickets.
- BY HAND, you RANDOMLY[1] graded 200 of them.

Then the best estimator for the total time of the others is

(754-200)/200 * sum(the 200) --OR-- (754-200) * mean(the 200).
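As a sketch in Python (with a 10-ticket toy sample standing in for the real 200 hand-estimated tickets; the hours are invented):

```python
import statistics

# Scale the hand-estimated sample up to the full pool.
total_tickets = 754
sample = [2, 3, 8, 4, 2, 6, 8, 3, 5, 7]  # stand-in for the real 200

mean_hours = sum(sample) / len(sample)
median_hours = statistics.median(sample)  # more stable if long tickets run *forever*
remaining = total_tickets - len(sample)
estimate_for_rest = remaining * mean_hours
print(f"{remaining} unestimated tickets, roughly {estimate_for_rest:.0f} hours total")
```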

How confident should you be? Well, it depends on how black-swan-ish your black swans are! Do the really long tickets take *forever*? If so, then mean time won't be very stable! But the median will be :)

As others describe, the ORDER STATISTICS (median, 80th percentile) will help you say things like: 80% of the tickets should take less than X minutes.

Terms: order statistics, quantiles, empirical distribution

[1] Note, RANDOM selection is important here. If you picked all the 'fast ones,' your estimate for the rest would be biased low.
posted by gregglind at 11:45 AM on January 7, 2015


One of the interesting things about statistics is that if your goal is generalizing from a sample to a population (e.g. from your 200 tickets to your 754 tickets), the accuracy doesn't depend on the size of the population. All it depends on is the size of your sample and how varied the things are that you're trying to estimate; the Central Limit Theorem is the result behind this.

If you wanted to do this all fancy-like, you could compute estimators for various properties of your distribution and then you'd not only have a guess for e.g. the 90th percentile of estimated ticket resolution time, you'd know how accurate that guess is likely to be. But---assuming that you have randomly selected the tickets to estimate!--you should just be able to take the percentiles and assume that they're pretty accurate. 200 is a lot of estimates.
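One common way to do the fancy version (not named in the thread) is the bootstrap: resample your estimates with replacement many times and see how much the percentile jumps around. A rough sketch with invented data:

```python
import random

random.seed(0)  # reproducible demo

# Invented stand-in for the 200 hand-estimated tickets (2-8 hours each).
sample = [random.choice([2, 3, 4, 5, 6, 7, 8]) for _ in range(200)]

def pct80(xs):
    """80th percentile by nearest rank."""
    xs = sorted(xs)
    return xs[int(0.8 * len(xs)) - 1]

# Bootstrap: resample with replacement, recompute the percentile each time.
boot = sorted(pct80(random.choices(sample, k=len(sample))) for _ in range(1000))
lo, hi = boot[25], boot[974]  # middle 95% of the bootstrap values
print(f"80th percentile ≈ {pct80(sample)} hours, bootstrap spread {lo}-{hi}")
```

If the spread is tight, your sample of 200 is plenty; if it's wide, estimate more tickets by hand.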
posted by goingonit at 1:54 PM on January 7, 2015


If you're data mining JIRA, it's a good idea to consider where the data comes from, and how that might bias your results. Perfect data would be a truly random sample of the population; data collected from the field is often invisibly tainted. For example, if you're trying to infer how long the remaining open tickets will take based on how long the currently closed ones took, you should assume that your staff has been cherry-picking the easy ones first. This effectively divides your population into two categories (open tickets and closed ones) with substantially different characteristics.

The second thing to think about is why you believe that minor tickets can be fixed in hours, not days. JIRA commonly uses severity to mean how important a problem is, not how hard it is to fix. I've seen minor defects in software that take days to debug and fix, and conversely, plenty of terrible, horrible, career-ending-if-put-in-production bugs that are simple one-line fixes.

The last thing to think about is why you think the size of the minor defect pool is 754. That's simply a measure of what testers have found. What you need to consider is testers' ability to find new issues in conjunction with programmers' ability to resolve them. The fact that your pool is now upwards of 700 suggests your programming team is not able to keep pace, and I figure bad software is fractally bad: there's an infinite number of rough surfaces in need of polishing.

Finally, a bit about rating your estimates. When you say you want to be 80 percent right, statistics uses confidence as the metric: the goal is to be 80 percent confident you have the right answer. Confidence is a tricky thing, but consider providing upper and lower bounds for the following:

1. How long is the Nile River?
2. How many people live in Mexico?
3. What year was President James A. Garfield born?
4. What was the percentage difference between market open and market close for the S&P500 on October 19th, 1987?
5. What's the box office sales for the newest Hobbit movie?

If I asked you to estimate such that you expected to get 4 out of 5 correct, those estimates could plausibly be called 80 percent confidence intervals. Confidence can tell you how often you'll be right, and it's fairly easy to see how taking an interval and expanding it would improve your confidence that the true value is in the interval.
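That expanding-the-interval effect is easy to simulate; here's a toy sketch (the normal distribution and the interval endpoints are arbitrary choices for illustration):

```python
import random

random.seed(1)  # reproducible demo

# Draw from an arbitrary distribution and measure how often each
# interval actually contains a draw.
draws = [random.gauss(5, 2) for _ in range(10_000)]

def coverage(lo, hi):
    return sum(lo <= x <= hi for x in draws) / len(draws)

print(f"narrow interval (4, 6): {coverage(4, 6):.0%} coverage")
print(f"wider interval  (2, 8): {coverage(2, 8):.0%} coverage")  # expanding raises coverage
```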

What confidence doesn't do is bound your estimation error. For example, if your 80 percent confidence estimate for the S&P was between -5 percent and +5 percent, the real value is not constrained to -6 and +6. Indeed, it was down 20 percent that day.
posted by pwnguin at 1:51 AM on January 9, 2015 [1 favorite]

