CLASSIFY_RF: Random Forest Classification

CLASSIFY_RF creates a random forest, which is an ensemble of decision trees. Each decision tree produces an independent classification prediction, and the prediction of the forest is the majority vote of the individual predictions.
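
The majority-vote step described above can be illustrated with a minimal Python sketch. This is not the CLASSIFY_RF implementation. The function name majority_vote and the sample labels are hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption made only for this illustration.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees.

    Assumption for this sketch: on a tie, the label that appears
    first in tree_predictions wins.
    """
    counts = Counter(tree_predictions)
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical predictions from five trees for a single row.
votes = ["two", "four", "four", "two", "four"]
forest_prediction = majority_vote(votes)
```

Here three of the five trees vote for "four", so that label becomes the prediction of the forest for this row.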

Syntax: How to Calculate a Random Forest Classification

CLASSIFY_RF('options',
         predictor_field1[, predictor_field2, ...], target_field)

where:

'options'

Is a dictionary of advanced parameters that control the model attributes, enclosed in single quotation marks. Most of these parameters have a default value, so you can omit any parameter for which you want to use the default. The single quotation marks are required even if you specify no advanced parameters. The format of the advanced parameter dictionary is:

'{"parm_name1": "parm_value1", ... ,"parm_namei": "parm_valuei"}'

The following advanced parameters are supported:

"trees"

Is the number of decision trees in the forest. Allowed values are integers greater than 10. The default value is "100".

"feature_importances"

Specifies whether to compute feature importances. Valid values are "yes" and "no". The default value is "yes".

"train_ratio"

Is a value between 0 and 1 that specifies the fraction of data used for training the model. The default value is "0.8".

"test_ratio"

Is a value between 0 and 1 that specifies the fraction of data used for testing the model. The default value is "0.2".

"scoring"

Specifies the metric that training optimizes: either F1 score (the harmonic mean of precision and recall) or accuracy (the ratio of correctly predicted observations to total observations). Allowed values are "f1_score" and "accuracy". The default value is "f1_score".

"min_values_leaf_grid"

Is a grid of candidate values, or a single value, for the minimum number of samples required in a node in order to split the node. When a grid is supplied, the optimal value is chosen by cross-validation. The default value is "1,3,5".
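
To make the two "scoring" choices concrete, the following Python sketch computes accuracy and F1 score for a small set of hypothetical labels. It is an illustration only, not the code that CLASSIFY_RF runs. The function names and sample data are assumptions, and F1 is computed here for a single class treated as positive, which may differ from how the product aggregates the score across classes.

```python
def accuracy(actual, predicted):
    """Ratio of correctly predicted observations to total observations."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def f1_score(actual, predicted, positive):
    """Harmonic mean of precision and recall for one class.

    Assumption for this sketch: 'positive' names the class of interest,
    and every other label counts as negative.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical actual and predicted classes for five rows.
actual = ["two", "four", "four", "two", "four"]
predicted = ["two", "four", "two", "two", "four"]

acc = accuracy(actual, predicted)
f1 = f1_score(actual, predicted, positive="four")
```

The two metrics can diverge when classes are imbalanced, which is one reason a request might prefer "f1_score" over "accuracy".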

predictor_field1[, predictor_field2, ...]

Numeric or alphanumeric

Are one or more predictor field names.

target_field

Numeric or alphanumeric

Is the target field.
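
If requests are generated by a script, the 'options' argument can be assembled programmatically. The following Python sketch builds a string in the dictionary format shown above, with every value quoted and the whole dictionary enclosed in single quotation marks. The helper name build_options is hypothetical, and the parameter values are examples rather than recommendations.

```python
import json


def build_options(params):
    """Format advanced parameters as a single-quoted dictionary string.

    Every value is converted to a string so that it is emitted in
    double quotation marks, matching the documented format.
    """
    as_strings = {name: str(value) for name, value in params.items()}
    return "'" + json.dumps(as_strings) + "'"


options = build_options({"trees": 100, "scoring": "accuracy", "test_ratio": 0.2})
print(options)
```

The resulting string can be pasted as the first argument of CLASSIFY_RF, followed by the predictor fields and the target field.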

Example: Predicting Class Assignment Using CLASSIFY_RF

The following procedure uses CLASSIFY_RF to predict a class assignment for the number of doors in a car, using a random forest with 100 decision trees. The predictors are price, body style, height, horsepower, peak RPM, city MPG, and highway MPG.

TABLE FILE imports85
PRINT numOfDoors 
COMPUTE predictedNumOfDoors/A6 = CLASSIFY_RF('{"trees":"100","kfold":"4","test_ratio":"0.2"}',
                        price, bodyStyle, height, horsepower, 
                        peakRpm, cityMpg, highwayMpg, numOfDoors);
WHERE price LT 30000
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

Partial output is shown in the following image.