If measurement – or the lack of it – is the biggest weakness in most forecasting...
Planners and managers in supply chain organizations are accustomed to using the Mean Absolute Percentage Error (MAPE) as their best (and sometimes only) answer to measuring forecast accuracy. It is so ubiquitous that it is hardly questioned. Yet among practitioners in supply chain organizations around the world who participate in forecasting workshops, I do not even find a consensus on the definition of forecast error: for most, Actual (A) minus Forecast (F) is the forecast error; for others, it is just the opposite.
Among practitioners, it is a jungle out there trying to understand the role of the absolute percentage errors (APEs) in the measurement of forecast accuracy. Forecast accuracy is commonly measured and reported by just the MAPE, which comes out the same no matter which definition of forecast error one uses, because taking absolute values removes the sign.
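As a quick illustration (a minimal sketch with invented numbers, not data from any source), the APE is built from the absolute error, so either sign convention produces exactly the same APEs and hence the same MAPE:

    # Invented actuals and forecasts for six periods
    actuals   = [100, 120, 90, 110, 105, 95]
    forecasts = [ 98, 125, 92, 100, 110, 100]

    # APE = |error| / actual * 100, so A - F and F - A give identical APEs
    ape_af = [abs(a - f) / a * 100 for a, f in zip(actuals, forecasts)]
    ape_fa = [abs(f - a) / a * 100 for a, f in zip(actuals, forecasts)]
    assert ape_af == ape_fa

    mape = sum(ape_af) / len(ape_af)   # the MAPE is likewise unaffected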
Bias is the other component of accuracy, but it is not consistently defined either, precisely because of those two competing sign conventions. If bias is measured by that difference, what should the sign of a reported underforecast or overforecast be? Who is right, and why?
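A small sketch, again with invented numbers, shows why the question matters: under the Actual minus Forecast convention a positive mean error signals underforecasting, while the opposite convention reports the very same situation with a negative sign.

    # Mean Error (ME) under the two sign conventions (invented numbers)
    actuals   = [100, 120, 90, 110]
    forecasts = [ 95, 115, 85, 105]   # every forecast is low

    me_af = sum(a - f for a, f in zip(actuals, forecasts)) / len(actuals)   # +5.0
    me_fa = sum(f - a for a, f in zip(actuals, forecasts)) / len(actuals)   # -5.0
    # The same underforecast, reported with opposite signs depending on the convention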
Outliers in forecast errors and other sources of unusual data values should never be ignored in the accuracy measurement process. For a measurement of bias, for example, the calculation of the mean forecast error ME (the arithmetic mean of Actual (A) minus Forecast (F)) will drive the estimate towards the outlier. An otherwise unbiased pattern of performance can be distorted by just a single unusual value.
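For instance (invented numbers again), one extreme error can drag the ME well away from zero even though the remaining errors are balanced, while a resistant summary such as the median error barely moves:

    from statistics import mean, median

    # Six small, roughly balanced errors plus one extreme value (e.g., a stockout period)
    errors = [2, -3, 1, -2, 3, -1, 150]

    print(mean(errors))     # about 21.4 -- the ME is dragged toward the outlier
    print(median(errors))   # 1 -- a resistant measure still indicates little bias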
When an outlier-resistant measure is close to the conventional measure, report the conventional measure. If not, check the APEs for anything that appears unusual, and then work with domain experts to find a credible rationale (stockouts, weather, strikes, etc.).
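A simple screen along these lines might look as follows; the 50 percent disagreement threshold is purely an assumption for illustration, not a rule:

    from statistics import mean, median

    apes = [4.0, 6.5, 5.2, 3.8, 7.1, 95.0]   # one suspicious APE

    mape, mdape = mean(apes), median(apes)
    if abs(mape - mdape) > 0.5 * mdape:       # "not close" -- threshold is arbitrary
        print("Measures disagree: inspect the APEs and consult domain experts")
    else:
        print(f"Report the conventional MAPE: {mape:.1f}%")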
Are There More Reliable Measures Than the MAPE?
The M-estimation method, introduced in Chapter 2 of my new book, can be used to automatically reduce the effect of outliers by appropriately down-weighting values 'far away' from the typical APE. The method is based on an estimator that makes repeated use of the underlying data in an iterative procedure. In the case of the MAPE, a family of robust estimators, called M-estimators, is obtained by minimizing a specified function of the absolute percentage errors (APEs). Alternate forms of the function produce the various M-estimators. Generally, the estimates are computed by iteratively reweighted least squares.
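In outline, and only as a sketch, such an estimator of a typical APE can be computed by repeatedly re-weighting the APEs around the current estimate and taking a weighted mean (the weighted least-squares solution for a single location parameter); the function and variable names here are illustrative only:

    # Generic iterated, weighted estimate of a typical APE (a location M-estimator)
    def m_estimate(apes, weight_fn, start, scale, iterations=2):
        estimate = start
        for _ in range(iterations):
            # weight each APE by how far it sits from the current estimate
            weights = [weight_fn((ape - estimate) / scale) for ape in apes]
            estimate = sum(w * ape for w, ape in zip(weights, apes)) / sum(weights)
        return estimate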
It is worth noting that the bisquare weighting scheme is more severe than the Huber weighting scheme. In the bisquare scheme, even data for which |ei| ≤ Ks will have a weight less than 1, whereas under the Huber scheme such data retain full weight. Data having weights greater than 0.9 are not considered extreme, data with weights less than 0.5 are regarded as extreme, and data with zero weight are, of course, ignored. To counteract the impact of outliers, the bisquare estimator gives zero weight to data whose forecast errors are quite far from zero.
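As a sketch, the two weighting schemes can be written as functions of the scaled error u = e/s; the cutoff constants 1.345 and 4.685 are conventional tuning values, used here purely for illustration:

    def huber_weight(u, k=1.345):
        # full weight inside the cutoff, then weights that shrink as 1/|u|
        return 1.0 if abs(u) <= k else k / abs(u)

    def bisquare_weight(u, c=4.685):
        # weights below 1 for every nonzero error, and zero beyond the cutoff
        return (1.0 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0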
What we need, for best practice, are procedures that are resistant to outlying values and robust against non-normal characteristics in the data distribution, so that they give rise to estimates that are more reliable and credible than those based on normality assumptions.
Taking a data-driven approach with the APE data to measure precision, we can create more useful TAPE (Typical APE) measures. We recommend that you start with the Median APE (MdAPE) for the first iteration, then use the Huber scheme for the next iteration, and finish with one or two more iterations of the bisquare scheme. The Huber-Bisquare-Bisquare Typical APE (HBB TAPE) measure has worked quite well for me in practice and can be readily automated, even in a spreadsheet. It is worth testing with your own data to convince yourself whether the Mean APE should remain King of the accuracy jungle!
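A minimal sketch of how the HBB TAPE recipe might be automated is shown below; the MAD-type scale estimate and the cutoff constants are conventional choices, assumed for illustration rather than prescribed:

    from statistics import median

    # Huber and bisquare weights, as sketched earlier
    def huber_weight(u, k=1.345):
        return 1.0 if abs(u) <= k else k / abs(u)

    def bisquare_weight(u, c=4.685):
        return (1.0 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0

    def hbb_tape(apes):
        estimate = median(apes)                                    # iteration 0: the MdAPE
        scale = median(abs(a - estimate) for a in apes) or 1.0     # robust (MAD-type) scale
        for weight_fn in (huber_weight, bisquare_weight, bisquare_weight):
            weights = [weight_fn((a - estimate) / scale) for a in apes]
            estimate = sum(w * a for w, a in zip(weights, apes)) / sum(weights)
        return estimate

    apes = [4.0, 6.5, 5.2, 3.8, 7.1, 95.0]                         # invented APEs
    print(f"MAPE {sum(apes)/len(apes):.1f}%, MdAPE {median(apes):.1f}%, HBB TAPE {hbb_tape(apes):.1f}%")

With these invented APEs, the single extreme value lifts the MAPE to roughly 20 percent, while the HBB TAPE stays near 5 percent, close to the level of the bulk of the errors.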
Details may be found in Chapter 4 of Change & Chance Embraced: Achieving Agility with Demand Forecasting in the Supply Chain.