CLASSIFY_BLR: Binary Logistic Regression

Binary logistic regression finds the best linear separation between two classes in the space spanned by the predictors. The target variable has two values (0 and 1), corresponding to the two classes. The predicted value is either a class assignment (0 or 1) or the probability of belonging to class 1.
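
As a rough illustration of the idea described above (not the internal implementation of CLASSIFY_BLR), a fitted logistic model combines the predictors linearly, passes the result through the sigmoid function to obtain a probability for class 1, and assigns class 1 when that probability reaches a threshold. The weights, the intercept, the 0.5 threshold, and the function names in this sketch are hypothetical values chosen for the example.

```python
import math

def predict_proba(x, weights, intercept):
    """Return the modeled probability of class 1 for one row of predictors."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, weights, intercept, threshold=0.5):
    """Return 1 when the class-1 probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, weights, intercept) >= threshold else 0

# Hypothetical fitted coefficients for two predictors.
weights = [0.8, -0.5]
intercept = -0.2
row = [1.5, 0.4]

p = predict_proba(row, weights, intercept)
label = predict_class(row, weights, intercept)
print(round(p, 4), label)
```

The set of points where the linear combination equals zero is the separating boundary between the two classes.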

Syntax: How to Calculate a Binary Logistic Regression

CLASSIFY_BLR('options',
         predictor_field1[, predictor_field2, ...], target_field)

where:

'options'

Is a dictionary of advanced parameters that control the model attributes, enclosed in single quotation marks. Most of these parameters have a default value, so you can omit any parameter whose default you want to use. The single quotation marks are required even when you specify no advanced parameters. The format of the advanced parameter dictionary is:

'{"parm_name1": "parm_value1", ... ,"parm_namei": "parm_valuei"}'
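
If the request text is generated by a program, the dictionary format above can be produced mechanically. The following sketch assumes a hypothetical helper named build_options, which is not part of the product. It converts every value to a string, because the format shows all values in double quotation marks, and wraps the result in the required single quotation marks.

```python
import json

def build_options(**params):
    """Serialize advanced parameters as a single-quoted dictionary string.

    Hypothetical helper: every value is converted to a string because the
    documented format shows each value in double quotation marks.
    """
    body = json.dumps(
        {name: str(value) for name, value in params.items()},
        separators=(",", ":"),
    )
    return "'" + body + "'"

options = build_options(proba="no", kfold=4, test_ratio=0.2)
print(options)
```

The printed string can be pasted as the first argument of CLASSIFY_BLR.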

The following advanced parameters are supported:

"proba"

Indicates whether to display a probability of being in the target class or a class prediction. Valid values are "no" (the default) or "yes". The value "yes" produces a probability Y (0<=Y<=1) for belonging to the class specified by "pos_label" (default is "1") for each point in the space. The value "no" produces a class-prediction Y (0 or 1) for each point in the space.

"pos_label"

Is relevant when "proba" is "yes". Specifies the class for which the probability of belonging is to be computed. Valid values are "0" and "1". The default is "1".
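
The interaction between "proba" and "pos_label" can be sketched as follows. This is an illustration only: the function name blr_output is hypothetical, and the sketch assumes that the class prediction uses a 0.5 cutoff and that the probability for class 0 is one minus the probability for class 1.

```python
def blr_output(p_class1, proba="no", pos_label="1"):
    """Return the value reported for one row, given its class-1 probability.

    Assumptions (not confirmed by the product documentation): a 0.5 cutoff
    for class prediction, and P(class 0) = 1 - P(class 1).
    """
    if proba == "yes":
        return p_class1 if pos_label == "1" else 1.0 - p_class1
    return 1 if p_class1 >= 0.5 else 0

print(blr_output(0.7))                       # class prediction
print(blr_output(0.7, proba="yes"))          # probability of class 1
print(blr_output(0.7, proba="yes", pos_label="0"))  # probability of class 0
```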

"train_ratio"

Is a value between 0 and 1 that specifies the fraction of data used for training the model. The default value is "0.8".

"test_ratio"

Is a value between 0 and 1 that specifies the fraction of data used for testing the model. The default value is "0.2".
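
To make the two ratios concrete, the following sketch splits a set of rows into a training part and a testing part. It assumes that the rows are shuffled at random and that the two ratios sum to 1, neither of which is stated above. The function name split_rows and the seed are hypothetical.

```python
import random

def split_rows(rows, train_ratio=0.8, seed=0):
    """Shuffle the rows and split them into training and testing parts.

    Assumes test_ratio = 1 - train_ratio, so the remainder is used for testing.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_rows(range(200))
print(len(train), len(test))
```

With the default ratios, 200 rows yield 160 training rows and 40 testing rows.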

"l2_grid"

Is a grid consisting of comma-separated positive numbers to be used as L2-regularization strengths. The default value is "0.1,1,10". The optimal value is chosen by cross-validation.

"kfold"

Is the number of folds used for cross-validation. Suggested values are integers between "2" and "10". The default value is "4".
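
The way "l2_grid" and "kfold" work together can be sketched as a small grid search: for each candidate strength, the model is fitted on all folds but one and scored on the held-out fold, and the strength with the best average score is kept. The sketch below is a simplified stand-in, not the product's algorithm. It assumes plain gradient descent, accuracy as the score, a hypothetical grid, and synthetic data, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit(X, y, l2, epochs=200, lr=0.1):
    """Gradient descent with an L2 penalty; the last weight is the intercept."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[d]) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        for j in range(d):
            w[j] -= lr * (grad[j] + l2 * w[j]) / n  # intercept is not penalized
        w[d] -= lr * grad[d] / n
    return w

def accuracy(w, X, y):
    d = len(X[0])
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[d])
        hits += int((p >= 0.5) == (yi == 1))
    return hits / len(y)

def choose_l2(X, y, l2_grid, kfold=4):
    """Return the grid value with the best mean held-out accuracy."""
    folds = [list(range(i, len(y), kfold)) for i in range(kfold)]
    best_l2, best_score = None, -1.0
    for l2 in l2_grid:
        scores = []
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(len(y)) if i not in held]
            w = fit([X[i] for i in train_idx], [y[i] for i in train_idx], l2)
            scores.append(accuracy(w, [X[i] for i in held_out],
                                   [y[i] for i in held_out]))
        mean = sum(scores) / kfold
        if mean > best_score:
            best_l2, best_score = l2, mean
    return best_l2, best_score

# Synthetic two-predictor data with a linear class boundary.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
y = [1 if x1 + x2 > 0 else 0 for x1, x2 in X]

best_l2, score = choose_l2(X, y, [0.1, 1.0, 10.0])
print(best_l2, score)
```

More folds give each candidate strength more training data per fit but increase the number of fits, which is one reason small fold counts are commonly suggested.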

predictor_field1[, predictor_field2, ...]

Numeric

Are one or more predictor field names.

target_field

Numeric

Is the target field.

Example: Using CLASSIFY_BLR to Predict Number of Doors

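The following TABLE request uses CLASSIFY_BLR to predict the number of doors in a car from the predictors price, height, horsepower, peak RPM, city MPG, and highway MPG. The advanced parameters "proba", "kfold", and "test_ratio" are passed explicitly with their default values, and the remaining parameters are omitted so that their defaults apply.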

TABLE FILE imports85
PRINT numOfDoors
COMPUTE predictedNumOfDoors/A6 = CLASSIFY_BLR('{"proba":"no","kfold":"4","test_ratio":"0.2"}',
                   price, height, horsepower, peakRpm, 
                   cityMpg, highwayMpg, numOfDoors);
WHERE price LT 30000
ON TABLE SET PAGE NOPAGE
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The partial output is shown in the following image.