Exponential Smoothing - Indices of Lack of Fit (Error)

The most straightforward way of evaluating the accuracy of the forecasts based on a particular α value is to plot the observed values and the one-step-ahead forecasts. In the Time Series module, this plot also includes the residuals (scaled against the right y-axis), so that regions of better or worse fit can easily be identified. This visual check of forecast accuracy is often the most powerful method for determining whether or not the current exponential smoothing model fits the data. Besides the ex post MSE criterion [see Choosing the Best Value for Parameter α (Alpha)], there are other statistical measures of error that can be used to determine the optimum α parameter (see Makridakis, Wheelwright, and McGee, 1983; all measures are computed automatically by the Time Series module):

Mean error. The mean error (ME) is the average error value, that is, the average of the observed values minus the one-step-ahead forecasts. A drawback of this measure is that positive and negative error values can cancel each other out, so it is not a very good indicator of overall fit.
Mean absolute error. The mean absolute error (MAE) is the average absolute error value. If this value is 0 (zero), the fit (forecast) is perfect. Compared to the mean squared error, this measure of fit "de-emphasizes" outliers; that is, unique or rare large error values affect the MAE less than the MSE.
Sum of squared error (SSE), mean squared error (MSE). These values are computed as the sum (or average) of the squared error values. This is the most commonly used lack-of-fit indicator in statistical fitting procedures.
Percentage error (PE). All the above measures rely on the actual error value. It may seem more reasonable to express the lack of fit in terms of the relative deviation of the one-step-ahead forecasts from the observed values, that is, relative to the magnitude of the observed values. For example, when trying to predict monthly sales that may fluctuate widely (e.g., seasonally) from month to month, we may be satisfied if our prediction "hits the target" with about ±10% accuracy. In other words, the absolute errors may be of less interest than the relative errors in the forecasts. To assess the relative error, various indices have been proposed (see Makridakis, Wheelwright, and McGee, 1983). The first one, the percentage error value, is computed as:

PE_t = 100 * (X_t - F_t) / X_t

where X_t is the observed value at time t, and F_t is the forecast (smoothed value) for time t. Note that PE_t is undefined when X_t is 0 (zero).

Mean percentage error (MPE). This value is computed as the average of the PE values.
Mean absolute percentage error (MAPE). As is the case with the mean error value (ME, see above), a mean percentage error near 0 (zero) can be produced by large positive and negative percentage errors that cancel each other out. Thus, a better measure of relative overall fit is the mean absolute percentage error. This measure is also usually easier to interpret than the mean squared error. For example, knowing that the average forecast is "off" by ±5% is a useful result in and of itself, whereas a mean squared error of 30.8 is not immediately interpretable.
Automatic search for best parameter. The Time Series module takes the guesswork out of the parameter search process. A quasi-Newton function minimization procedure (the same as in ARIMA) is used to minimize either the mean squared error, mean absolute error, or mean absolute percentage error. In most cases, this procedure is more efficient than the grid search (particularly when more than one parameter must be determined), and the optimum α parameter can quickly be identified.
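
The measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not the Time Series module's implementation. It assumes the simple exponential smoothing recursion S_t = α*X_t + (1 - α)*S_(t-1), with the previous smoothed value serving as the one-step-ahead forecast, and it assumes the first observation as the default starting value. The function names (ses_forecasts, fit_measures, grid_search_alpha) and the sample data are hypothetical. For simplicity, the parameter search shown is a plain grid search, not the quasi-Newton procedure described above.

```python
def ses_forecasts(x, alpha, s0=None):
    """One-step-ahead forecasts from simple exponential smoothing.

    Assumed recursion: S_t = alpha * X_t + (1 - alpha) * S_(t-1),
    where the forecast for time t is the previous smoothed value.
    """
    # Assumption: start from the first observation when no S_0 is supplied.
    s = x[0] if s0 is None else s0
    forecasts = []
    for obs in x:
        forecasts.append(s)                  # forecast made before seeing obs
        s = alpha * obs + (1 - alpha) * s    # update the smoothed value
    return forecasts


def fit_measures(x, f):
    """Lack-of-fit indices for observed values x and forecasts f."""
    errors = [xt - ft for xt, ft in zip(x, f)]
    n = len(errors)
    # Percentage errors; every observed value must be nonzero.
    pe = [100.0 * e / xt for e, xt in zip(errors, x)]
    sse = sum(e * e for e in errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "SSE": sse,
        "MSE": sse / n,
        "MPE": sum(pe) / n,
        "MAPE": sum(abs(p) for p in pe) / n,
    }


def grid_search_alpha(x, criterion="MSE", step=0.01):
    """Return the alpha on a grid in (0, 1) that minimizes the criterion."""
    candidates = [i * step for i in range(1, int(round(1 / step)))]
    return min(
        candidates,
        key=lambda a: fit_measures(x, ses_forecasts(x, a))[criterion],
    )


# Made-up monthly sales figures, for illustration only.
sales = [120.0, 135.0, 128.0, 150.0, 142.0, 160.0, 155.0, 170.0]
best_alpha = grid_search_alpha(sales, criterion="MAPE")
print(best_alpha, fit_measures(sales, ses_forecasts(sales, best_alpha)))
```

As a check on the cancellation problem: for observed values 100 and 200 with forecasts 110 and 180, the errors are -10 and 20, so ME is 5 and MAE is 15, while the percentage errors -10 and 10 give an MPE of 0 but a MAPE of 10.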

The first smoothed value S_0. A final issue that we have neglected up to this point is the problem of the initial value, or how to start the smoothing process. If you look back at the formula above, it is evident that one needs an S_0 value in order to compute the smoothed value (forecast) for the first observation in the series. Depending on the choice of the α parameter (in particular, when α is close to zero), the initial value for the smoothing process can affect the quality of the forecasts for many observations. As with most other aspects of exponential smoothing, it is recommended to choose the initial value that produces the best forecasts. On the other hand, in practice, when there are many leading observations prior to a crucial actual forecast, the initial value will not affect that forecast by much, since its effect will have long "faded" from the smoothed series (because of the exponentially decreasing weights, the older an observation, the less it influences the forecast). The Time Series module allows for user-defined initial values, but will also automatically compute initial values.
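
The "fading" of the initial value can be illustrated with a short sketch. It again assumes the recursion S_t = α*X_t + (1 - α)*S_(t-1); the function names and the series are hypothetical. Under that assumption, two runs that differ only in their starting value differ after t observations by exactly the initial gap times (1 - α)^t, so a small α keeps the initial value influential for much longer.

```python
def smooth(x, alpha, s0):
    """Smoothed values S_1..S_n, assuming S_t = alpha*X_t + (1-alpha)*S_(t-1)."""
    s, out = s0, []
    for obs in x:
        s = alpha * obs + (1 - alpha) * s
        out.append(s)
    return out


def initial_value_gap(x, alpha, s0_a, s0_b):
    """Absolute difference between two smoothed series that differ only in S_0."""
    a = smooth(x, alpha, s0_a)
    b = smooth(x, alpha, s0_b)
    return [abs(u - v) for u, v in zip(a, b)]


# Made-up series, for illustration only.
series = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 14.0, 13.0]
for alpha in (0.1, 0.5):
    gap = initial_value_gap(series, alpha, s0_a=0.0, s0_b=20.0)
    # Gap after the first and after the last observation.
    print(alpha, gap[0], gap[-1])
```

With an initial gap of 20 and ten observations, the remaining gap is 20*(0.9)^10 for α = 0.1 but only 20*(0.5)^10 for α = 0.5.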