In an uncertain and fast-changing world, line managers need to be made aware of the uncertainties...
When a multidisciplinary research group at Princeton University set out to study the paired uses of electricity and gas in townhouses, it contacted the residents of Twin Rivers, a nearby planned community in New Jersey. Over a five-year study period, the group learned how to eliminate three-quarters of the energy used by the furnace in quite ordinary, reasonably well-built townhouses, as chronicled in Saving Energy in the Home: Princeton's Experiments at Twin Rivers, edited by Robert H. Socolow (Cambridge, MA: Ballinger, 1977).
The purpose of the Princeton study, during a winter in the mid-1970s, was to examine differences in energy use and compare them with structural aspects of the 152 individual townhouses and the behavioural aspects of their inhabitants. I was a resident at Twin Rivers at the time and took great delight in being a participant. As a data scientist, I was later intrigued by the results and the data from the study. I did not realize then that some new analysis techniques used on the data would be published in 1977 in the ground-breaking book Exploratory Data Analysis by data science pioneer John W. Tukey (1915–2000).
The data were gathered automatically through a special device that was hooked up to the landline telephones and the energy sources in the home. There were also questions to be answered periodically about our lifestyle, the details of which have long escaped my memory. Nevertheless, some novel uses of graphing techniques with schematic data plots (data visualization) can be found throughout my new book. These techniques, new at the time, have now become a familiar part of many business statistics books.
Exploring Data Patterns
Studying the patterns in the data improves the forecaster’s chances of successfully modeling data for forecasting applications. Through exploratory data analysis (EDA), a demand forecaster can start the important task of finding factors (drivers of demand) that are generally quantitative in nature.
Tukey likens EDA to detective work: “A detective investigating a crime needs both tools and understanding. If he/she has no fingerprint powder, the detective will fail to find fingerprints on most surfaces. If detectives do not understand where criminals are likely to have put their fingers, they will not look in the right places.” A planned forecasting and modelling effort that does not include provisions for exploratory data analysis often misses the most interesting and important results. Even so, EDA is only a first step, not the whole story.
Exploratory data analysis means looking at data, absorbing what the data are suggesting, and using various summaries and display methods to gain insight into the process generating the data.
Many business forecasting books describe a variety of classical ways to summarize data. For the practitioner, an entertaining yet informative cartoon guide covering these is Gonick and Smith’s The Cartoon Guide to Statistics, published in 1993. For example, the familiar histogram is widely used in practice. In addition, there are a number of lesser-known techniques that are specifically useful in analyzing the large quantities of data that have become accessible as a result of the increased flexibility in data management, computer processing, and predictive analytics. Because of their potential value to demand forecasting, I describe them in some detail in my new book, Change & Chance Embraced: Achieving Agility with Demand Forecasting in the Supply Chain.
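As a small illustration of the kind of summaries and displays discussed above, the following Python sketch computes a quartile-based five-number summary and a plain-text histogram. It is a minimal sketch rather than a method taken from the book: the function names and the demand figures are made up for illustration, and the quartiles come from Python's standard statistics module, which may differ slightly from the hinges Tukey uses.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


def histogram_counts(values, bin_width):
    """Count the values falling in each bin, keyed by the bin's lower edge."""
    counts = {}
    for v in values:
        lower_edge = (v // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


def text_histogram(counts, bin_width):
    """Render the bin counts as rows of asterisks, one row per bin."""
    rows = []
    for lower_edge, n in counts.items():
        upper_edge = lower_edge + bin_width - 1
        rows.append(f"{lower_edge:>3}-{upper_edge:<3} {'*' * n}")
    return "\n".join(rows)


# Hypothetical monthly demand (units); invented for illustration only.
demand = [42, 47, 51, 38, 55, 61, 49, 44, 58, 95, 46, 52]

summary = five_number_summary(demand)
counts = histogram_counts(demand, bin_width=10)

print("Five-number summary:", summary)
print(text_histogram(counts, bin_width=10))
```

In this made-up series, the gap between the bulk of the values and the single value in the 90s stands out immediately in the histogram, which is the sort of pattern a forecaster would want to investigate before fitting any model.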
Learning by Looking at Data Patterns
Because most forecasting methods require data, a forecaster analyzes the availability of data from both external (outside the company) and internal (within the company or its industry) sources. For example, one potential source of internal data is a corporate data warehouse or Enterprise Resource Planning (ERP) system, which normally contains a rich history of product sales, shipments, prices, revenues, expenses, capital expenditures, and marketing programs.
The availability of external data is improving rapidly. Most of the required demographic factors (age, race, sex, households, and so forth), forecasts of economic indicators, and related variables can be readily obtained from computerized data sources and from industry and government publications on the Internet.
With the explosion of Internet websites, potential sources of valuable data are becoming limitless. Because much of this data is unstructured, data mining tools have become a necessity for exploring potential sources of data for consumer analyses and predictive modelling purposes.