How can I make my team's forecasting more scientific?
February 15, 2023 7:46 AM
At the end of each year, my team makes a financial forecast for the next year. The forecast is updated quarterly to reflect actual achievements, market conditions, etc. Historically, we've developed a reputation for "sandbagging" by consistently providing forecast results that we significantly exceed. I'd like to try and change that.
The historic approach has been to develop an estimate for how much year-end revenue we'll produce, then multiply that by a "confidence factor" (say, 80%) to reflect uncertainty. The "confidence factor" increases as we go through the year, but never gets above 100%. In practice, this means we give ourselves a "haircut" to reflect the possibility of negative surprises, but our actual year-end results have consistently exceeded even the unadjusted forecast. This is bad for the team's credibility and over time creates an expectation that whatever number we provide is a lowball.
What I'd like to do is get away from these "confidence factor" estimates and move towards a more statistically meaningful confidence interval that would let us say "we don't know exactly what the final number will be, but we have 90% confidence it's between X and Y."
I've got several years worth of initial projections, quarterly updates, and actual results, but I'm not sure how to use that to set up the kind of projection I've got in mind. If our base forecast is X, how do I use historical information to turn that into "X, plus or minus Y, with Z% confidence"? And how do I correct for the fact that our historical projections have a consistent negative bias?
The historic approach has been to develop an estimate for how much year-end revenue we'll produce, then multiply that by a "confidence factor" (say, 80%) to reflect uncertainty
I think in a general sense you're trying to compute something like an expectation value by multiplying an outcome $X by a probability Y, but what you're actually doing makes no sense. In a very literal sense you are arbitrarily reducing your best estimate by a factor. "Confidence factor" is not a thing. This is sandbagging. The reputation is deserved!
If $X is your best estimate, then that's what you should communicate. If you have a statistically sophisticated audience you should also communicate that there is a Y chance it will be smaller than that, and a 1-Y chance it will be bigger than that. Y can be, but doesn't have to be, 0.5. Context matters. But XY is not a useful number to communicate.
Getting X and Y right is hard, but it seems like you're actually pretty good at that bit! The trick here will be avoiding subconsciously sandbagging X. But at the very least, don't make it a routine part of adjusting X!
See this blog post for more about how to communicate the uncertainty given X and Y. (It's about estimating time to build something, where the tail risk is it takes longer. Your tail risk is that revenue is smaller, i.e. the other way, so you'll have to translate it a little bit, but the conceptual ideas are good.)
posted by caek at 8:37 AM on February 15, 2023 [2 favorites]
One way to approach this problem is to treat your initial predictions as a "feature," and then use either regression or machine learning to fit a function that transforms your historic predictions into the actual outcomes in a way that would minimize the error. This would give you something like, "take your initial prediction and increase it by 18%." If your current prediction process is pretty mechanical/deterministic this might be a useful exercise.
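A minimal sketch of that calibration idea, assuming a few years of (forecast, actual) pairs; the figures and the choice of a plain linear regression are placeholders, not a prescription:

```python
# Fit a function that maps "what we forecast" to "what actually happened"
# (all figures invented).
import numpy as np
from sklearn.linear_model import LinearRegression

past_forecasts = np.array([10.0, 11.5, 12.0, 13.2, 14.0]).reshape(-1, 1)  # $M
past_actuals = np.array([11.8, 13.4, 14.1, 15.3, 16.5])                   # $M

calibration = LinearRegression().fit(past_forecasts, past_actuals)

# Apply the learned correction to this year's raw forecast.
this_year_raw = 15.0
corrected = calibration.predict(np.array([[this_year_raw]]))[0]
print(f"Raw forecast {this_year_raw:.1f}, calibrated forecast {corrected:.1f}")
print(f"Slope {calibration.coef_[0]:.2f}, intercept {calibration.intercept_:.2f}")
```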
However, if your current estimates are more subjective, it's going to be hard to really keep making estimates in the same way once you know that, historically, on average, they under-predict by 15% (or something.) So scaling future predictions by that value that worked in the past may give you a bad forecast.
So my actual suggestion is to treat this as a "time-series" problem, and train a model on just the historic outcomes. I'm not sure how technical you are, but if you know even a small amount of Python or R, Prophet is a good starting point for making time series forecasts; it includes uncertainty intervals in its predictions and can also account for things like seasonality.
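For what it's worth, a minimal Prophet sketch might look like the following. It assumes you can reshape your history into quarterly revenue observations; the 'ds' and 'y' column names are what Prophet expects, and the figures are invented:

```python
# Sketch: quarterly revenue history -> forecast with a 90% uncertainty interval.
import pandas as pd
from prophet import Prophet

# Hypothetical quarterly history; Prophet wants columns named 'ds' and 'y'.
history = pd.DataFrame({
    "ds": pd.date_range("2018-03-31", periods=20, freq="Q"),
    "y":  [2.4, 2.6, 2.9, 3.8, 2.7, 2.9, 3.1, 4.1, 2.9, 3.2,
           3.4, 4.5, 3.1, 3.4, 3.7, 4.8, 3.3, 3.6, 3.9, 5.1],  # $M, invented
})

m = Prophet(interval_width=0.90)  # ask for a 90% uncertainty interval
m.fit(history)

future = m.make_future_dataframe(periods=4, freq="Q")  # the next four quarters
forecast = m.predict(future)

# yhat is the point forecast; yhat_lower/yhat_upper bound the 90% interval.
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(4))
```

Summing the four quarterly yhat values gives an annual point figure; summing the interval bounds only gives a rough annual range, since the quarterly errors aren't independent.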
posted by dyslexictraveler at 8:58 AM on February 15, 2023
The historic approach has been to develop an estimate for how much year-end revenue we'll produce, then multiply that by a "confidence factor" (say, 80%) to reflect uncertainty
Well, I can understand what you're trying to do but you're going about it in a very weird way. Weighting your outcome should be part of the original revenue prediction. And there is a good chance it already is (see below) and then you are taking another 20% off the top. I can see why people don't trust your forecasts.
Organisations have different ways of predicting revenues. Sometimes it is based on an identified pipeline with each opportunity weighted to arrive at the prediction. Or your prediction is primarily based on existing business and probability of contract renewal and some kind of new contract assumptions. If your revenue prediction is primarily driven by new business development and is not driven by specific opportunities to contract, your prediction probably includes weighting of various demand or penetration factors or experience values for return on specific marketing or sales activity.
Most people would already err on the side of caution in making these predictions. So why are you taking such a large extra reduction? And why do you insist on maintaining that large factor in your re-forecasts? As you go through the year your uncertainty for the remaining quarters goes down.
So I'd start by challenging how you come up with your predictions in the first place and how much 'prudence' you are already factoring in.
posted by koahiatamadl at 10:05 AM on February 15, 2023 [1 favorite]
A simple thing you could do is:
1) go through your normal forecasting process and get your forecast for the year, call that X.
2) Now take the past 3 years’ actual results and forecasts. Divide Actual/Forecast per year. (Your answer will probably be like 1.2 or something). Average your 3 answers. Call that Y.
3) multiply X from step 1 by Y from step 2, and submit that as your forecast.
This will likely work out similarly to the commenters recommending you just not do the confidence factor, but perhaps it’ll feel like it has more of a basis. “If we know we always underestimate, we can factor that in and attempt to correct for it.”
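A minimal sketch of those three steps, with invented figures, plus a crude range read off the same historical ratios:

```python
# Sketch of the three steps above (all figures invented).
import numpy as np

# Step 2: past years' forecasts and actual results.
past_forecasts = np.array([10.0, 11.0, 12.5])   # $M
past_actuals = np.array([12.1, 13.0, 15.2])     # $M

ratios = past_actuals / past_forecasts          # roughly 1.2 each year
Y = ratios.mean()                               # average correction factor

# Steps 1 and 3: this year's raw forecast, scaled by the correction.
X = 14.0                                        # this year's unadjusted forecast
print(f"Submit roughly {X * Y:.1f}")

# Crude range: reuse the spread of the historical ratios.
# (With only a few years of data this is a rough guide, not a true 90% interval.)
low, high = X * ratios.min(), X * ratios.max()
print(f"Between about {low:.1f} and {high:.1f}, if history is a guide")
```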
posted by estlin at 8:44 PM on February 15, 2023
Essentially the problem is you are accounting for bad unknowns but not for good unknowns. As long as it's reasonable to think the two will be roughly equal in value, your adjustments are pointless.
Where I see this behaviour coming from: organisations where it is normal to give business units shit for failing to meet forecasts. That creates a powerful incentive to understate expected results. If so, there needs to be an honest conversation about whether it is acceptable to give a result that is as likely to miss on the downside as the upside.
It strikes me that you need a lot of data from a big business to make reliable statistical models possible. A better approach might be simply asking: if this year were like last year, what would the number look like? Then do a different approach based on business you know you will get, business that is at risk, and an allowance for business that you have no idea of at this stage in the year but that usually would come from somewhere. Then look at both those results and ask yourselves what a plausible number looks like. (Ignore this suggestion if you DO have a large volume and tons of data.)
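One way to picture that bottom-up pass (business you know you'll get, business at risk, and an allowance for business that usually turns up from somewhere) is a back-of-the-envelope sum; every figure and category below is invented:

```python
# Back-of-the-envelope build-up forecast (all figures invented).
committed = 8.0       # $M of business already signed for next year
at_risk = 3.0         # $M of renewals that might not happen
renew_rate = 0.7      # how often at-risk business has historically renewed
usual_upside = 2.5    # $M of "came from somewhere" business in a typical year

build_up = committed + at_risk * renew_rate + usual_upside

last_year = 12.8      # the "if this year were like last year" anchor
print(f"Build-up view: {build_up:.1f}, last-year view: {last_year:.1f}")
# Compare the two and ask what a plausible number looks like.
```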
posted by i_am_joe's_spleen at 11:42 PM on February 15, 2023 [1 favorite]
What's your goal though? Why do you want to change it and whose lives is it going to improve? Your team members who are probably paid less than you? To improve revenue (make already rich people richer)?
Is it actually acceptable for teams to miss their targets at all, or within the confidence interval, or will it become a way of pushing people to do more work, get more stressed, have a worse home life, feel their lives become smaller, because they missed a target they were always going to miss a certain amount of the time? If the underestimation is a protective strategy, will the new model also help protect your team's time and lives from being encroached on? Will it make people more stressed or improve their wellbeing? If my manager (or, more realistically, my manager's manager's manager) was talking like this, my heart would be sinking, honestly.
It's not a science problem, it's a social problem, imo. Are your bosses and business owners interested in science and accurate models? No, they want to make money and extract as much work as possible from the workforce. Be explicit in your mind about what it's actually for and what your values are, and work backwards from that.
posted by mosswinter at 1:46 AM on February 16, 2023 [1 favorite]
A few thoughts:
1. the initial predictions may not be very informative, particularly if they include "sandbagging". Can you throw them away and find other objective data sets to use as features that can plausibly be leading indicators of the thing you're trying to forecast? e.g. if you're in some specific industry, is there an industry group that collects the trailing sales data from companies in that industry and produces monthly or quarterly reports summarising the market conditions, total sales volume, and so on?
If you're the first person in your team / business to attempt to do this in a data-driven statistical way, it wouldn't be surprising that the answer might turn out to be "the business hasn't been collecting enough data that is useful to generate accurate predictions, but if we start collecting X Y Z now, in 5 years, we might have enough to build a good statistical model".
2. It may be difficult to make accurate predictions from a small amount of data if the real situation is complicated and you don't have a strong theory of how things you can measure in advance are linked to revenue in the future. If you have a strong theory, and can express that theory as a statistical model with a small number of parameters, and then fit those parameters using historical data, you may be able to make good forecasts even with very small amounts of data (cf. Bayesian statistics / probabilistic models). Let's pretend we're trying to solve one of Gauss' problems - predicting the orbits of planets using regression - we're going to get a much better prediction if we have a theory that planets orbit the sun in elliptical orbits, and we can encode that into our statistical model (cleverly, Gauss did this kind of thing!). If instead you believe the orbits are controlled by epicycles, maybe you can forecast that, but you might need a lot more data to estimate a lot more of these artificial parameters that don't really correspond to reality. (A toy sketch of this idea follows the list.)
3. i_am_joe's_spleen makes a great point about incentives. Suppose you can't directly change the forecasting method, but could change the incentives. What incentives could be set so that the problem would "solve itself" as your colleagues try to exploit the new rules and incentives in the work "game"? Is it possible to set up incentives so that more accurate forecasts are produced as a side effect? Is this the outcome the business actually wants, or are there other objectives that are more important? Silly thought experiment: what if everyone's annual bonus was heavily tied to how close their forecasts were to their actual annual revenue results, and nothing else? If it's easier to exactly hit a very low target than to stretch for a genuinely challenging one, the winning move is to pick an easy target and then, when you're at risk of exceeding it, work very hard to turn away customers and stop any more revenue flowing in until next financial year.
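On point 2, a toy sketch of the "strong theory, few parameters" idea: if you believe revenue grows at a roughly constant rate, you can fit just two parameters to a short history and read a rough uncertainty off the fit. Everything below is invented, and a real revenue model would need more care:

```python
# Toy example: a two-parameter growth model fit to a short revenue history.
import numpy as np
from scipy.optimize import curve_fit

def growth(t, start, rate):
    """Theory: revenue grows by a roughly constant rate each year."""
    return start * (1.0 + rate) ** t

years = np.arange(6)                                       # 6 years of history
revenue = np.array([10.2, 11.0, 12.1, 12.9, 14.3, 15.4])   # $M, invented

params, cov = curve_fit(growth, years, revenue, p0=[10.0, 0.05])
start, rate = params
rate_err = np.sqrt(cov[1, 1])   # rough 1-sigma uncertainty on the growth rate

print(f"Estimated growth rate: {rate:.1%} +/- {rate_err:.1%}")
print(f"Next year's point forecast: {growth(len(years), *params):.1f}")
```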
posted by are-coral-made at 2:03 AM on February 16, 2023 [1 favorite]
If you have only "several" years of data, there's no statistically valid way to come up with a confidence interval. It might be more useful to do some scenarios: "We're expecting revenue growth of 7.8 percent, based on current trends, but that's obviously uncertain. For example, if the Omaha contract falls through, we'd be down to 5.6 percent. And if sales of green widgets go up by 10 percent instead of 7 percent, total revenue would grow by 8.2 percent."
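A hedged sketch of that scenario framing, with a made-up revenue breakdown chosen only so the totals roughly reproduce the illustrative percentages above:

```python
# Sketch: a small scenario table instead of a single point forecast.
# The component values are invented placeholders, not real figures.
last_year_revenue = 100.0   # $M

scenarios = {
    "base case (current trends)": 91.3 + 2.2 + 14.3,      # other + Omaha + widgets
    "Omaha contract falls through": 91.3 + 0.0 + 14.3,
    "green widgets up 10% instead of 7%": 91.3 + 2.2 + 14.7,
}

for name, total in scenarios.items():
    growth = (total / last_year_revenue - 1.0) * 100
    print(f"{name:36s} -> {total:.1f} ({growth:+.1f}% growth)")
```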
It's a natural impulse to underpromise and overdeliver. But if you change and give your best estimate, make it clear to your audience that you expect the actuals to fall short of the projections half the time. That should be seen as a success, not a failure.
posted by Mr.Know-it-some at 8:35 AM on February 15, 2023