REGRESS_KNN: K-Nearest Neighbors Regression

K-nearest neighbors regression predicts a target value for a data point in the space spanned by the predictors. The prediction is the average of the target values of the point's K nearest neighbors. The method requires a distance to be defined in this space.
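As a rough illustration of the idea described above, the following Python sketch averages the targets of the K closest training points under an L^p distance. It is not the REGRESS_KNN implementation; the function names (lp_distance, knn_predict) and the toy data are assumptions made for this example only.

```python
def lp_distance(a, b, p=2):
    """L^p (Minkowski) distance between two equal-length points."""
    # p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_X, train_y, query, k=5, p=2):
    """Average the targets of the k training points closest to query."""
    # Order the training row indices by their distance to the query point.
    order = sorted(range(len(train_X)),
                   key=lambda i: lp_distance(train_X[i], query, p))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Hypothetical toy data: one predictor, target equal to twice the predictor.
X = [[1.0], [2.0], [3.0], [10.0]]
y = [2.0, 4.0, 6.0, 20.0]

# With k=2 the two closest points to 2.5 are 2.0 and 3.0,
# so the prediction is the mean of their targets.
prediction = knn_predict(X, y, [2.5], k=2)
print(prediction)
```

Sorting every training point for each query is the simplest approach and is adequate for small data; a production implementation would typically use a spatial index instead.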

Reference: Calculate a K-Nearest Neighbors Regression

REGRESS_KNN('options', 
        predictor_field1[, predictor_field2, ...], target_field)

where:

'options'

Is a dictionary of advanced parameters that control the model attributes, enclosed in single quotation marks. Most of these parameters have default values, so you can omit them from the request to use the defaults. Even with no advanced parameters, the single quotation marks are required. The format of the advanced parameter dictionary is:

'{"parm_name1": "parm_value1", ... ,"parm_namei": "parm_valuei"}'

The following advanced parameters are supported:

"K"

Is the number of nearest neighbors to participate in the prediction. Allowed values are integers greater than 1. The default value is "5".

"p"

Is the power (p) of the L^p-distance. Allowed values are integers. Suggested values are "1" and "2". The default value is "2".

  • p=1 calculates the distance as the sum of the absolute values of the differences between the coordinates (Manhattan distance).
  • p=2 calculates the distance as the square root of the sum of the squares of the differences between the coordinates (Euclidean distance).

"prediction_ratio"

Is the fraction of the data that will be used for prediction. Allowed values are numbers between 0 and 1. The default value is "0.8".

"test_ratio"

Is a value between 0 and 1 that specifies the fraction of data used for testing the model. The default value is "0.2".

"kfold"

Is the number of folds used in the grid search with cross-validation. The grid search evaluates the neighbor counts K/2, K, and 2K. Suggested values are integers between "2" and "10". The default value is "4".
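To make the interaction between "K" and "kfold" more concrete, the following Python sketch runs a cross-validated grid search over the candidates K/2, K, and 2K and keeps the one with the lowest mean squared error. It is an illustration only, not the product's algorithm: the helper names, the integer rounding of K/2, the fold assignment by row index, and the error measure are all assumptions.

```python
def knn_predict(train_X, train_y, query, k):
    """Mean target of the k nearest training points (Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_error(X, y, k, folds):
    """Mean squared error over all rows, each predicted while held out."""
    n = len(X)
    total = 0.0
    for f in range(folds):
        # Assumption: row i belongs to fold i % folds.
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        fold_X = [X[i] for i in train_idx]
        fold_y = [y[i] for i in train_idx]
        for i in test_idx:
            total += (knn_predict(fold_X, fold_y, X[i], k) - y[i]) ** 2
    return total / n

def grid_search_k(X, y, K=5, kfold=4):
    """Return the candidate among K/2, K, 2K with the lowest CV error."""
    # Assumption: K/2 is rounded down and never allowed below 1.
    candidates = sorted({max(1, K // 2), K, 2 * K})
    return min(candidates, key=lambda k: cv_error(X, y, k, kfold))

# Hypothetical toy data: a single predictor with a linear target.
X = [[float(i)] for i in range(40)]
y = [2.0 * i for i in range(40)]
best = grid_search_k(X, y, K=4, kfold=4)
print(best)
```

Each row is held out exactly once, so the returned candidate is the one that predicts unseen rows best on this data under the stated assumptions.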

predictor_field1[, predictor_field2, ...]

Numeric

Are one or more predictor field names.

target_field

Numeric

Is the target field.

Example: Predicting Price Using REGRESS_KNN

The following request uses REGRESS_KNN to predict price from the predictors height, horsepower, peak RPM, city MPG, and highway MPG, using 10 nearest neighbors and Euclidean distance. The remaining advanced parameters are set explicitly to their default values.

TABLE FILE imports85
PRINT price 
COMPUTE predictedPrice/I5 = REGRESS_KNN('{"K":"10","p":"2","kfold":"4","prediction_ratio":"0.8","test_ratio":"0.2"}',
                 height, horsepower, peakRpm, 
                 cityMpg, highwayMpg, price);
WHERE price LT 30000
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

Partial output is shown in the following image.