K-nearest neighbors classification is a method for assigning a class membership to a data point in the space spanned by the predictors. The classification is done by assigning the class most common among its k nearest neighbors. This method requires having a distance definition in this space.
CLASSIFY_KNN('options', predictor_field1[, predictor_field2, ...] target_field)
where:
Is a dictionary of advanced parameters that control the model attributes, enclosed in single quotation marks. Most of these parameters have a default value, so you can omit them from the request, if you want to use the default values. Even with no advanced parameters, the single quotation marks are required. The format of the advanced parameter dictionary is:
'{"parm_name1": "parm_value1", ... ,"parm_namei": "parm_valuei"}'
The following advanced parameters are supported:
Is the number of nearest neighbors. Allowed values are integers greater than 1. The default value is "5".
Is the power (p) of the L^p-distance. Allowed values are positive integers. Suggested values are "1" and "2". The default value is "2".
Is the fraction of the data that will be used for prediction. Allowed values are numbers between 0 and 1. The default value is "0.8".
Is a value between 0 and 1 that specifies the fraction of data used for testing the model. The default value is "0.2".
Is the number of folds used in the grid-search with cross-validation. A grid-search of the nearest neighbors grid K/2, K, 2K is done. Suggested values are integers between "2" and "10". The default value is "4".
Numeric
Are one or more predictor field names.
Numeric
Is the target field.
The following request creates classes 0 (zero) and 1 based on the number of doors in a car, then uses CLASSIFY_KNN to predict the class assignment using the 10 nearest neighbors and Euclidean distance, with predictors price, height, horsepower, peak RPM, city MPG, and highway MPG. The data set includes rows with missing target values.
DEFINE FILE imports85 OUR_CLASSES/I2 MISSING ON = IF numOfDoors EQ MISSING THEN MISSING ELSE IF numOfDoors EQ 'two' THEN 0 ELSE 1; END TABLE FILE imports85 PRINT numOfDoors BODYSTYLE OUR_CLASSES COMPUTE predictedCLASSES/I2 = CLASSIFY_KNN('{"K":"10","p":"2","kfold":"4","test_ratio":"0.2"}', price, height, horsepower, peakRpm, cityMpg, highwayMpg, OUR_CLASSES); ON TABLE SET PAGE NOPAGE ON TABLE SET STYLE * GRID=OFF,$ ENDSTYLE END
Partial output is shown in the following image.