The REGRESS method derives a linear equation that best fits a set of numeric data points, and uses this equation to create a new column in the report output. The equation can be based on one to three independent variables.
This method estimates values by assuming that the dependent variable (y, the new calculated values) and the independent variables (x1, x2, x3) are related by the following linear equation:
y = a1*x1 [+ a2*x2 [+ a3*x3]] + b
When there is one independent variable, the equation represents a straight line. In this case, REGRESS produces the same values as FORECAST using the REGRESS method. When there are two independent variables, the equation represents a plane, and with three independent variables, it represents a hyperplane. Use this technique when you have reason to believe that the dependent variable can be approximated by a linear combination of the independent variables.
REGRESS uses a technique called Ordinary Least Squares to calculate values for the coefficients (a1, a2, a3, and b) that minimize the sum of the squared differences between the data and the resulting line, plane, or hyperplane.
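The least squares fit described above can be sketched in a few lines of Python. This is only an illustration of Ordinary Least Squares for one to three independent variables, not the product's internal implementation. The function names ols_fit and predict, and the singularity threshold, are assumptions made for this example.

```python
def ols_fit(xs, y):
    """Fit y ~= a1*x1 + ... + ak*xk + b by Ordinary Least Squares.

    xs: list of k columns (each a list of numbers); k is 1 to 3 for REGRESS.
    y:  list of observed values of the dependent variable.
    Returns ([a1, ..., ak], b).
    """
    n = len(y)
    # Each design row is [x1, ..., xk, 1]; the trailing 1 carries the intercept b.
    rows = [[col[i] for col in xs] + [1.0] for i in range(n)]
    m = len(rows[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:  # illustrative threshold, not from the source
            raise ValueError("independent variables are not linearly independent")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    beta = [aug[i][m] / aug[i][i] for i in range(m)]
    return beta[:-1], beta[-1]


def predict(coeffs, b, point):
    """Evaluate y = a1*x1 [+ a2*x2 [+ a3*x3]] + b at one data point."""
    return sum(a * x for a, x in zip(coeffs, point)) + b


# Made-up data that lies exactly on y = 2*x1 + 3*x2 + 1,
# so the fit recovers approximately a1 = 2, a2 = 3, b = 1.
coeffs, b = ols_fit([[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 2.0, 1.0]],
                    [6.0, 5.0, 13.0, 12.0])
print(coeffs, b, predict(coeffs, b, [5.0, 2.0]))
```

The error raised for a singular system reflects the requirement that the independent variables be independent of each other: if one is a linear combination of the others, the coefficients are not uniquely determined.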
ON {sortfield} RECAP y[/fmt] = REGRESS(n, x1, [x2, [x3,]] z);
where:
sortfield
Is a field in the data source. It cannot be the same field as any of the parameters to REGRESS. A new linear regression equation is derived each time the sort field value changes.
y
Is the new numeric column calculated by applying the regression equation. You cannot DEFINE or COMPUTE a field with this name.
fmt
Is the display format for y. If it is omitted, the default format is D12.2.
n
Is a whole number from 1 to 3 indicating the number of independent variables.
x1, x2, x3
Are the field names to be used as the independent variables. All of these variables must be numeric and be independent of each other.
z
Is an existing numeric field that is assumed to be approximately linearly dependent on the independent variables and is used to derive the regression equation.
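The sort field behavior, in which a separate regression equation is derived for each sort field value, can be illustrated with a small Python sketch. It uses a single independent variable and the closed-form slope and intercept for brevity. The names fit_line and regress_by_group and the sample records are hypothetical, and the sketch does not claim to reproduce how the product processes a request.

```python
from collections import defaultdict


def fit_line(x, z):
    """Least squares line z ~= a*x + b for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant within the group; slope is undefined")
    a = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z)) / sxx
    return a, mean_z - a * mean_x


def regress_by_group(records):
    """records: list of (sort_value, x, z) tuples.

    Derives one equation per distinct sort_value and returns
    (sort_value, x, z, estimate) for every input record, in input order.
    """
    groups = defaultdict(list)
    for key, x, z in records:
        groups[key].append((x, z))
    # One (a, b) pair per sort value: a new equation whenever the value changes.
    fits = {key: fit_line([p[0] for p in pairs], [p[1] for p in pairs])
            for key, pairs in groups.items()}
    return [(key, x, z, fits[key][0] * x + fits[key][1])
            for key, x, z in records]


# Made-up records: group 1 follows z = 2x + 1, group 2 follows z = -2x + 12.
sample = [(1, 1.0, 3.0), (1, 2.0, 5.0), (1, 3.0, 7.0),
          (2, 1.0, 10.0), (2, 2.0, 8.0), (2, 3.0, 6.0)]
for row in regress_by_group(sample):
    print(row)
```

Because each group is fitted separately, the two groups in the sample data get different slopes and intercepts even though they share the same x values.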
The following request uses the GGSALES data source to calculate an estimated DOLLARS column. The BUDUNITS, UNITS, and BUDDOLLARS fields are the independent variables. The DOLLARS field provides the actual values to be estimated:
DEFINE FILE GGSALES
YEAR/Y = DATE;
MONTH/M = DATE;
PERIOD/I2 = MONTH;
END
TABLE FILE GGSALES
PRINT BUDUNITS UNITS BUDDOLLARS DOLLARS
BY PERIOD
ON PERIOD RECAP EST_DOLLARS/F8 = REGRESS(3, BUDUNITS, UNITS, BUDDOLLARS, DOLLARS);
WHERE CATEGORY EQ 'Coffee'
WHERE REGION EQ 'West'
WHERE UNITS GT 1600 AND UNITS LT 1700
END
The output is: