Being critical of one’s own work is even more important for the financial doing the forecast...
As we know, the simple matter of spotting bias – systematic under- or over-forecasting – can get surprisingly tricky in practice if our actions are to be guided by scientific standards of evidence, which they need to be if we are actually going to improve matters.
Reliably identifying systematic forecast error requires that we consider both the pattern and the magnitude of bias, using approaches that explicitly take account of probabilities.
How to find the needle in the haystack
Let’s assume that you have a method for reliably detecting bias in a single forecast. How can this be deployed at scale in a large company where forecasts are mass produced? In these types of businesses, a single demand manager will typically be responsible for upwards of a thousand forecasts, every one of which might be reforecast on a weekly basis, and any one of which might unexpectedly fail at any time if the pattern of demand suddenly changes.
This kind of forecaster is not a master craftsman, carefully selecting the right forecasting method and polishing the result until it is ‘perfect’. Instead, they are managers of a forecast factory churning out thousands of ‘items’ at a fast rate, none of which will be as perfect as those produced by a master craftsman, but all of which need to be fit for purpose: ‘good enough’.
Clearly it is important that the demand manager continuously reviews the performance of every forecast every period so that defective ‘products’ do not enter the supply chain. But when they have such a large portfolio, is it realistic?
Probably not.
The ‘obvious’ solution to this complexity, and the one most companies adopt, is to calculate bias at a high level in the hierarchy and investigate further only when there is evidence of a problem.
The flaw in this approach is that it is extremely unlikely that every forecast in a portfolio or a category is biased in the same way. And when they are not, the errors for those items that are over forecast will be offset by the under-forecast errors to a greater or lesser degree, with the result that chronic bias at the low level is hidden. And it is the bias at this low level that is important, because the replenishment process is driven by these granular forecasts, not high-level aggregates.
The bottom line is that even if high-level bias measures are calculated in a statistically intelligent way (as described in part 1 of this series) they are a completely unreliable guide to the level of bias at the level where it counts – the lowest level.
And the degree of the problem can be considerable; in practice, it is very common to find average errors calculated at a high level understating low-level bias by an order of magnitude. For example, it is quite common to find a product category with an average level of bias of, say, 2%, which most people would consider to be acceptable, being the result of some SKUs being over forecast by an average of 20% and the rest being under forecast by 18%.
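The arithmetic behind this example can be checked with a short sketch. The portfolio below is invented purely for illustration: 19 hypothetical SKUs with equal demand, ten of them over forecast by 20% and nine under forecast by 18%. The helper name `pct_bias` is also an assumption, not something taken from this series.

```python
def pct_bias(actual, forecast):
    """Signed bias as a fraction of actual demand (positive = over forecast)."""
    return (forecast - actual) / actual

# Hypothetical (actual, forecast) pairs; all figures are illustrative.
over = [(100, 120)] * 10   # ten SKUs over forecast by 20%
under = [(100, 82)] * 9    # nine SKUs under forecast by 18%
portfolio = over + under

# Bias measured at the category level: errors of opposite sign cancel out.
total_actual = sum(a for a, _ in portfolio)
total_forecast = sum(f for _, f in portfolio)
category_bias = pct_bias(total_actual, total_forecast)

# Bias measured at the SKU level: take the size of each error before averaging.
sku_biases = [pct_bias(a, f) for a, f in portfolio]
mean_abs_sku_bias = sum(abs(b) for b in sku_biases) / len(sku_biases)

print(f"Category-level bias:          {category_bias:.1%}")
print(f"Mean absolute SKU-level bias: {mean_abs_sku_bias:.1%}")
```

With these made-up numbers the category shows a bias of 2.0% while the typical SKU is out by about 19%, which is the kind of gap that a high-level measure cannot reveal.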
This is one important reason why companies may experience customer service failure despite having high total inventory levels and apparently good forecast performance metrics.
The solution
So how do we reconcile the need to track forecast performance on a very frequent, highly granular level with the apparent impossibility of doing so?
The answer is to measure low-level under- and over-forecasting separately and to test these measures for evidence of statistically significant levels of bias, in the manner described in the last post. These alerts, along with measures of the scale of the problem, will then direct the attention of forecasters to the relatively small number of failing forecasts that matter.
Bias is the most treatable symptom of a failing forecast process. Even if we cannot track the subtle changes in the pattern of the demand signal, it should be possible to estimate the level reasonably easily. If we do start consistently under or over forecasting, it should be straightforward to detect, and correction usually requires no more than a simple recalibration of our models or our judgement.
But like many things that are simple in theory, dealing with bias can become a more intractable problem given the scale and pace at which forecasting is conducted in practice.
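A minimal sketch of this kind of exception-based review might look like the following. It does not reproduce the test described in the last post. As a stand-in it uses a plain one-sided sign test, applied separately to over and under forecasting. The function names, the `alpha=0.05` threshold and the error histories are all hypothetical.

```python
from math import comb

def sign_test_p(k, n):
    """One-sided p-value: chance of k or more same-sign errors in n periods
    if over and under forecasting were equally likely."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def flag_bias(errors, alpha=0.05):
    """Test one item for over and under forecasting separately.

    errors: forecast minus actual, one value per period.
    Returns (direction, mean_error); direction is "over", "under" or None.
    """
    if not errors:
        return None, 0.0
    mean_error = sum(errors) / len(errors)   # measure of the scale of the problem
    nonzero = [e for e in errors if e != 0]  # exact hits carry no sign information
    n = len(nonzero)
    if n == 0:
        return None, mean_error
    n_over = sum(1 for e in nonzero if e > 0)
    n_under = n - n_over
    if sign_test_p(n_over, n) < alpha:
        return "over", mean_error
    if sign_test_p(n_under, n) < alpha:
        return "under", mean_error
    return None, mean_error

# Hypothetical error histories (forecast minus actual) for three SKUs.
history = {
    "SKU-1": [12, 8, 15, 9, 11, 7, 14, 10],         # persistently over
    "SKU-2": [-9, -12, -7, -10, -8, -11, -6, -13],  # persistently under
    "SKU-3": [5, -4, 3, -6, 2, -3, 4, -5],          # mixed signs, no alert
}

alerts = {}
for sku, errs in history.items():
    direction, scale = flag_bias(errs)
    if direction is not None:
        alerts[sku] = (direction, scale)

# Review the largest problems first.
for sku, (direction, scale) in sorted(alerts.items(), key=lambda kv: -abs(kv[1][1])):
    print(sku, direction, round(scale, 2))
```

Because over and under forecasting are tested item by item, SKU-1 and SKU-2 are both flagged even though their errors would largely cancel if they were added together first.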
So when it comes to driving out bias, size matters.