REGRESS_POLY: Polynomial Regression

Polynomial regression fits the target column to a polynomial expression of the predictor columns. The degree of the polynomial is specified as an input argument to the function.

Reference: Calculate a Polynomial Regression Column

REGRESS_POLY('options', 
        predictor_field1, predictor_field2, [...,] target_field)

where:

'options'

Is a dictionary of advanced parameters that control the model attributes, enclosed in single quotation marks. Most of these parameters have a default value, so you can omit any parameter whose default you want to use. The single quotation marks are required even when you specify no advanced parameters. The format of the advanced parameter dictionary is:

'{"parm_name1": "parm_value1", ... ,"parm_namei": "parm_valuei"}'
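
If the request text is generated by a script, the dictionary string can be assembled programmatically. The following Python sketch is one way to do that; the helper name build_options is hypothetical and is not part of the product. It assumes that every value is passed as a double-quoted string, as in the format shown above.

```python
import json

def build_options(**params):
    """Return the 'options' argument text, single quotation marks included.

    Hypothetical helper: it only formats text for pasting into a request.
    """
    # Emit every value as a double-quoted string, following the documented format.
    as_strings = {name: str(value) for name, value in params.items()}
    return "'" + json.dumps(as_strings) + "'"

options_text = build_options(degree=2, interaction_only="yes")
print(options_text)
```

The printed text can then be used as the first argument of REGRESS_POLY.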

The following advanced parameters are supported:

"degree"

Optional. Is the degree of the polynomial. Low-degree polynomials are recommended. The default value is "1".

"interaction_only"

Optional. Controls the terms that are generated in the polynomial equation. The default value is "no". Allowed values are:

  • "no", which generates the most general polynomial of the specified degree based on the predictor fields.
  • "yes", which uses only the terms that are linear in each predictor (X0, X1, ...) and cross-terms of the form X0*X1*X2 that combine at most degree predictors, where degree is the value of the "degree" parameter.
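
The difference between the two settings can be seen by listing the terms each one produces. The following Python sketch enumerates the non-constant terms for three predictors at degree 2. It illustrates the description above and is not the product's implementation; the function name polynomial_terms is hypothetical.

```python
from itertools import combinations, combinations_with_replacement

def polynomial_terms(predictors, degree, interaction_only=False):
    """List the non-constant terms as tuples of predictor names."""
    # With interaction_only, a predictor may appear at most once per term,
    # so powers such as X0*X0 are excluded.
    pick = combinations if interaction_only else combinations_with_replacement
    terms = []
    for d in range(1, degree + 1):
        terms.extend(pick(predictors, d))
    return terms

general = polynomial_terms(["X0", "X1", "X2"], 2)
cross_only = polynomial_terms(["X0", "X1", "X2"], 2, interaction_only=True)

print(["*".join(t) for t in general])     # includes X0*X0, X1*X1, X2*X2
print(["*".join(t) for t in cross_only])  # linear terms and cross-terms only
```

In this sketch, the general polynomial has nine terms, and the interaction-only version has six because the squared terms are dropped.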

"train_ratio"

Optional. Is a value between 0 and 1 that specifies the fraction of data used for training the model. The default value is "0.8".

"test_ratio"

Optional. Is a value between 0 and 1 that specifies the fraction of data used for testing the model. The default value is "0.2".
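
As an illustration of what the two ratios mean, the following Python sketch splits 100 rows using the default values. How the function actually selects rows is not described here; the random shuffle, the seed argument, and the use of the remaining rows as the test set (which assumes the two ratios sum to 1) are assumptions made for the sketch.

```python
import random

def split_rows(rows, train_ratio=0.8, seed=0):
    """Shuffle the rows and split them into training and test portions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # assumption: rows are picked at random
    cut = round(len(shuffled) * train_ratio)
    # Assumption: every row not used for training is used for testing.
    return shuffled[:cut], shuffled[cut:]

train, test = split_rows(range(100), train_ratio=0.8)
print(len(train), len(test))
```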

"l2_grid"

Optional. Is a grid consisting of comma-separated positive numbers to be used as L2-regularization strengths. The default value is "0,1,1,10". The optimal value is chosen by cross-validation.

"kfold"

Optional. Is the number of folds used for cross-validation. Suggested values are integers between "2" and "10". The default value is "4".
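
The way "l2_grid" and "kfold" work together can be sketched in Python as follows: each candidate strength is scored by k-fold cross-validation, and the strength with the lowest average validation error is kept. This is a simplified illustration with one predictor and no intercept, not the product's algorithm. The function names, the grid values, and the way rows are assigned to folds are all hypothetical.

```python
def ridge_slope(xs, ys, l2):
    """Closed-form L2-regularized fit of y ~ w*x (one predictor, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2)

def cv_error(xs, ys, l2, kfold):
    """Mean squared validation error, averaged over kfold folds."""
    n = len(xs)
    total = 0.0
    for k in range(kfold):
        val = set(range(k, n, kfold))  # assumption: every kfold-th row forms a fold
        train_x = [xs[i] for i in range(n) if i not in val]
        train_y = [ys[i] for i in range(n) if i not in val]
        w = ridge_slope(train_x, train_y, l2)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in val) / len(val)
    return total / kfold

def choose_l2(xs, ys, l2_grid, kfold=4):
    """Return the grid value with the lowest cross-validated error."""
    return min(l2_grid, key=lambda l2: cv_error(xs, ys, l2, kfold))

xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x for x in xs]  # noise-free data, so the weakest penalty fits best
best = choose_l2(xs, ys, [0.5, 5.0, 50.0], kfold=4)  # hypothetical grid values
print(best)
```

More folds give each candidate more validation passes at the cost of more fitting work, which is consistent with the suggested range of 2 to 10.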

predictor_field1, predictor_field2, [...,]

Numeric

Are at least two predictor field names.

target_field

Numeric

Is the target field.

Example: Using REGRESS_POLY to Predict Price

The following request uses REGRESS_POLY to compute the predicted price using a polynomial regression of degree 4 and predictors height, horsepower, peak RPM, city MPG, and highway MPG.

TABLE FILE imports85
PRINT price
COMPUTE predictedPrice/I5 = REGRESS_POLY('{"degree":"4"}',
                 height, horsepower, peakRpm,
                 cityMpg, highwayMpg, price);
WHERE price LT 30000
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The partial output is shown in the following image.