The linear regression equation estimates values by assuming that the dependent variable (the new calculated values) and the independent variable (the sort field values) are related by a function that represents a straight line:
y = mx + b
where:
y
Is the dependent variable.
x
Is the independent variable.
m
Is the slope of the line.
b
Is the y-intercept.
FORECAST_LINEAR uses a technique called Ordinary Least Squares to calculate values for m and b that minimize the sum of the squared differences between the data and the resulting line.
The following formulas show how m and b are calculated.
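For reference, the standard closed-form Ordinary Least Squares solution for a straight line can be written as follows, with each sum taken over all n data points. This is the textbook form, given here as an assumption about the intended formulas, not a transcription of them.

```latex
m = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^{2} - \dfrac{\left(\sum x\right)^{2}}{n}}
\qquad
b = \frac{\sum y}{n} - m\,\frac{\sum x}{n}
```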
where:
n
Is the number of data points.
y
Are the data values (the dependent variable).
x
Are the sort field values (the independent variable).
Trend values, as well as predicted values, are calculated using the regression line equation.
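As a rough illustration of the calculation described above, the following Python sketch fits m and b by ordinary least squares and then evaluates y = mx + b for the existing sort values (trend values) and for a few values past the last data point (predicted values). It is not the WebFOCUS implementation: the function names `fit_line` and `forecast_linear` are hypothetical, the sample data is made up, and the sort values are assumed to be plain numbers in ascending order.

```python
def fit_line(xs, ys):
    """Return (m, b) minimizing the sum of squared differences (ordinary least squares)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = sum_xx - sum_x * sum_x / n
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (sum_xy - sum_x * sum_y / n) / denom
    b = sum_y / n - m * sum_x / n
    return m, b


def forecast_linear(xs, ys, interval, npredict):
    """Return (x, value) pairs: trend values for existing x, then npredict predictions."""
    m, b = fit_line(xs, ys)
    # Each predicted row extends the sort values by `interval` past the last data point.
    future_xs = [xs[-1] + interval * k for k in range(1, npredict + 1)]
    return [(x, m * x + b) for x in list(xs) + future_xs]


# Made-up sample data (not from VIDEOTRK): x is the sort value, y the data value.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
for x, value in forecast_linear(xs, ys, interval=1, npredict=3):
    print(x, round(value, 2))
```

The first five rows printed are trend values on the fitted line and the last three are predictions, all produced by the same regression line equation.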
FORECAST_LINEAR(display, infield, interval, npredict)
where:
display
Keyword
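Specifies which values to display for rows of output that represent existing data. Valid values are:
INPUT_FIELD. This displays the original field values for rows that represent existing data.
MODEL_DATA. This displays the calculated values for rows that represent existing data.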
Note: You can show both types of output for any field by creating two independent COMPUTE commands in the same request, each with a different display option.
infield
Is any numeric field. It can be the same field as the result field, or a different field. It cannot be a date-time field or a numeric field with date display options.
interval
Is the increment to add to each sort field value (after the last data point) to create the next value. This must be a positive integer. To sort in descending order, use the BY HIGHEST phrase. The result of adding this number to the sort field values is converted to the same format as the sort field.
For date fields, the minimal component in the format determines how the number is interpreted. For example, if the format is YMD, MDY, or DMY, an interval value of 2 is interpreted as meaning two days. If the format is YM, the 2 is interpreted as meaning two months.
npredict
Is the number of predictions for FORECAST to calculate. It must be an integer greater than or equal to zero. Zero indicates that you do not want predictions, and is only supported with a non-recursive FORECAST.
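The way the interval and the number of predictions combine to produce new sort field values can be sketched in Python as follows. This only illustrates the rule described above and is not how WebFOCUS implements it: `next_sort_values`, `add_months`, and the `smallest_component` labels are hypothetical names, the dates are made up, and only ascending sort order is covered.

```python
from datetime import date, timedelta


def add_months(d, months):
    """Shift a first-of-month date by a number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def next_sort_values(last, interval, npredict, smallest_component="none"):
    """Return the npredict sort values that follow `last`.

    smallest_component is a hypothetical label for the minimal component of
    the sort field's format: "day" (for example YMD), "month" (for example YM),
    or "none" for a plain numeric sort field.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    if npredict < 0:
        raise ValueError("npredict must be zero or greater")
    values = []
    for k in range(1, npredict + 1):
        step = interval * k
        if smallest_component == "day":
            values.append(last + timedelta(days=step))
        elif smallest_component == "month":
            values.append(add_months(last, step))
        else:
            values.append(last + step)
    return values


# An interval of 2 means two days for a day-level format and two months for YM.
print(next_sort_values(date(1991, 6, 29), 2, 3, "day"))
print(next_sort_values(date(1991, 6, 1), 2, 3, "month"))
print(next_sort_values(10, 2, 3))
```

With npredict set to zero the function returns an empty list, which corresponds to requesting no predictions.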
The following request uses the VIDEOTRK data source to calculate a regression line of QUANTITY by TRANSDATE. The interval is one day, and three predicted values are calculated.
TABLE FILE VIDEOTRK
SUM QUANTITY
COMPUTE FORTOT=FORECAST_LINEAR(MODEL_DATA,QUANTITY,1,3);
BY TRANSDATE
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END
The output is shown in the following image:
Note:
TRANSDATE is the independent variable (x) and QUANTITY is the dependent variable (y). The regression equation is used to calculate the trend and predicted values for QUANTITY, which the request returns in the FORTOT field.
The following version of the request charts the data values and the regression line.
GRAPH FILE VIDEOTRK
SUM QUANTITY
COMPUTE FORTOT=FORECAST_LINEAR(MODEL_DATA,QUANTITY,1,3);
BY TRANSDATE
ON GRAPH PCHOLD FORMAT JSCHART
ON GRAPH SET LOOKGRAPH VLINE
END
The output is shown in the following image.