Workspace Node: Random Forest Classification - Results - Classification Tab

In the Random Forest Classification node dialog box, under the Results heading, select the Classification tab to access the following options.

Element Name Description
Sample Select an option button in this group box to specify the sample for which the predicted and residual statistics (classifications) are computed.
Analysis Only those observations that were used to compute the current results (i.e., build the current set of trees).
Test set All observations that were held out for use as a test set.
Prediction All cases that have valid data for the predictor variables but missing data for the dependent variable.
All samples Displays and plots classification statistics for all of the samples above in one spreadsheet or graph.
Predicted vs. observed by classes Select this check box to produce three spreadsheets: predicted values, classifications, and confusion matrices. This option also generates a 3D histogram of the predicted-by-observed classification frequencies (i.e., numbers of observations).
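As an illustration of what a confusion matrix tabulates, the following minimal sketch counts the cases for each predicted-by-observed class pair. It assumes that rows are observed classes and columns are predicted classes; the layout of the node's own spreadsheet may differ, and the function name and sample data are hypothetical.

```python
from collections import Counter


def confusion_matrix(observed, predicted, classes):
    """Count cases for each (observed, predicted) class pair.

    Rows correspond to observed classes and columns to predicted classes
    (an assumed layout; the node's spreadsheet may be arranged differently).
    """
    counts = Counter(zip(observed, predicted))
    return [[counts[(obs, pred)] for pred in classes] for obs in classes]


# Hypothetical observed and predicted classes for six cases.
observed = ["A", "A", "B", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "C", "C"]
classes = ["A", "B", "C"]

matrix = confusion_matrix(observed, predicted, classes)
for cls, row in zip(classes, matrix):
    print(cls, row)

# Correct classifications lie on the diagonal of the matrix.
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(observed)
print(f"accuracy = {accuracy:.3f}")
```

The off-diagonal cells show which classes are confused with which; in this toy data, one observed A case was predicted as B and one observed B case as C.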
Prior probabilities Select this check box to produce a spreadsheet containing the prior probabilities and the corresponding n for each class (group) in the dependent variable. The prior probabilities will be combined with the prediction probabilities and misclassification costs to compute the final classification probabilities (see the Technical Notes).
Adjusted prior probabilities Select this check box to produce a spreadsheet of classification prior probabilities for each class of the dependent variable, adjusted for the user-specified misclassification costs and the corresponding class n's. This spreadsheet is identical to the one produced by the Prior probabilities option (see above) if the classification prior probabilities were not adjusted (modified) via the User specified option in the Prior probability group box on the Specifications - Classification tab.
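One well-known way to fold misclassification costs into priors is the "altered priors" rule from CART (Breiman et al.): each class prior is multiplied by the total cost of misclassifying a case of that class, and the results are renormalized. The sketch below implements that rule only as an assumption; the formula this node actually uses is given in its Technical Notes. The function name, class labels, and cost values are hypothetical.

```python
def adjusted_priors(priors, costs):
    """Altered priors in the CART style (assumed, not the node's documented formula).

    priors: dict mapping class -> prior probability
    costs:  dict mapping (observed, predicted) -> cost of classifying a case of
            the observed class as the predicted class; missing pairs count as 0.
    """
    classes = list(priors)
    # Total cost of misclassifying a case from each observed class j.
    total_cost = {
        j: sum(costs.get((j, i), 0.0) for i in classes if i != j) for j in classes
    }
    # Weight each prior by that cost, then renormalize so the results sum to 1.
    weighted = {j: total_cost[j] * priors[j] for j in classes}
    norm = sum(weighted.values())
    return {j: w / norm for j, w in weighted.items()}


priors = {"yes": 0.2, "no": 0.8}
# Hypothetical costs: missing a "yes" case is 4 times as costly as a false alarm.
costs = {("yes", "no"): 4.0, ("no", "yes"): 1.0}
print(adjusted_priors(priors, costs))
```

With equal costs for every kind of error, the adjusted priors reduce to the original priors, which mirrors the statement above that both spreadsheets coincide when nothing has been adjusted.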
Misclassification cost matrix Select this check box to produce a spreadsheet containing the (user-specified or default) costs of misclassifying cases or objects in each observed class of the dependent variable as another class. The misclassification costs are combined with the prior probabilities when computing the final classification probabilities (see also the Technical Notes).
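To show how costs can change a classification, the following sketch applies the standard minimum-expected-cost decision rule. The way it combines the prior probabilities with the prediction probabilities (multiply, then renormalize) is a hypothetical simplification; the node's exact combination is described in its Technical Notes. All names and numbers are illustrative.

```python
def classify_min_cost(probs, priors, costs):
    """Assign a case to the class with the smallest expected misclassification cost.

    probs:  dict class -> prediction probability for this case (e.g., tree votes)
    priors: dict class -> prior probability
    costs:  dict (observed, predicted) -> cost; missing pairs count as 0
    """
    classes = list(probs)
    # Hypothetical combination of priors and prediction probabilities.
    weighted = {j: probs[j] * priors[j] for j in classes}
    total = sum(weighted.values())
    posterior = {j: w / total for j, w in weighted.items()}
    # Expected cost of predicting class k: sum over the other classes j of
    # P(j) * cost of classifying a j case as k.
    expected = {
        k: sum(posterior[j] * costs.get((j, k), 0.0) for j in classes if j != k)
        for k in classes
    }
    return min(expected, key=expected.get), posterior, expected


probs = {"yes": 0.3, "no": 0.7}
priors = {"yes": 0.5, "no": 0.5}
costs = {("yes", "no"): 4.0, ("no", "yes"): 1.0}
label, posterior, expected = classify_min_cost(probs, priors, costs)
print(label, expected)
```

Although "no" has the higher probability here, predicting "yes" has the lower expected cost (about 0.7 versus 1.2), so the case is assigned to "yes".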
Lift chart type The options in this group box create lift charts and gains charts for the categories of the dependent variable and for the current model. Use these charts to evaluate and compare how useful the model is for predicting the different categories (classes) of the categorical dependent variable. Select the option button that specifies the type and scaling of the chart you want to compute.
Gains chart Select this check box to produce a gains chart. This chart shows the percent of observations correctly classified into the chosen category (see Category of response below) when taking the top x percent of cases from the sorted (by classification probabilities) data file.

For example, this chart can show you that by taking the top 20 percent (shown on the x-axis) of cases classified into the respective category with the greatest certainty (maximum classification probability), you would correctly classify almost 80 percent (shown on the vertical y-axis) of all cases in the population that belong to that category. In this plot, the baseline random classification (selection of cases) yields a straight line from the lower-left to the upper-right corner, which serves as a reference for gauging the utility of the respective model for classification.
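The gains computation described above can be sketched as follows: sort the cases by their classification probability for the chosen category, take the top x percent, and report what percent of all cases of that category were captured. This is a minimal sketch, not the node's implementation; the function name, the rounding rule (a partial case counts as one case), and the ten-case data set are assumptions.

```python
def cumulative_gain(observed, scores, target, top_percent):
    """Percent of all `target` cases captured in the top `top_percent` percent
    of cases, ranked by their classification probability for `target`."""
    ranked = sorted(zip(scores, observed), key=lambda pair: pair[0], reverse=True)
    # Integer ceiling so that a partial case still counts as one case.
    n_top = (len(ranked) * top_percent + 99) // 100
    hits = sum(1 for _, obs in ranked[:n_top] if obs == target)
    total = sum(1 for obs in observed if obs == target)
    return 100.0 * hits / total


# Hypothetical data: observed class and classification probability for "yes".
observed = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "no", "no"]
scores = [0.95, 0.90, 0.80, 0.75, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05]

for pct in (10, 20, 50, 100):
    gain = cumulative_gain(observed, scores, "yes", pct)
    # A random selection of x percent captures, on average, x percent of the cases.
    print(f"top {pct:3d}%: model {gain:5.1f}%  random baseline {pct:5.1f}%")
```

Plotting the model values against the top-percent values gives the gains curve; the random baseline is the diagonal, and the further the model curve lies above it, the more useful the model is for that category.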

Lift chart (response %) Select this check box to produce a lift chart where the vertical y-axis is scaled in terms of the percent of all cases belonging to the respective category. As in the gains chart, the x-axis denotes the respective top x percent of cases from the sorted (by classification probabilities) data file.
Lift chart (lift value) Select this check box to produce a lift chart where the vertical y-axis is scaled in terms of the lift value, expressed as the multiple of the baseline random selection model.

For example, this chart can show you that by taking the top 20 percent (shown on the x-axis) of cases classified into the respective category with the greatest certainty (maximum classification probability), you would end up with a sample that contains almost 4 times as many cases belonging to the respective category as the baseline random selection (classification) model would yield.
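A minimal sketch of the two lift scalings, under the assumption that "response %" means the percent of the selected top cases that belong to the category and that the lift value is that percent divided by the overall (baseline) percent of the category. The function name and data are hypothetical.

```python
def lift_at(observed, scores, target, top_percent):
    """Return (response %, lift value) for the top `top_percent` percent of cases."""
    ranked = sorted(zip(scores, observed), key=lambda pair: pair[0], reverse=True)
    n_top = (len(ranked) * top_percent + 99) // 100  # integer ceiling
    hits = sum(1 for _, obs in ranked[:n_top] if obs == target)
    # Share of the selected cases that belong to the target category.
    response_pct = 100.0 * hits / n_top
    # Share of the target category in the whole data set (random selection).
    base_rate_pct = 100.0 * sum(obs == target for obs in observed) / len(observed)
    return response_pct, response_pct / base_rate_pct


observed = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "no", "no"]
scores = [0.95, 0.90, 0.80, 0.75, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05]

for pct in (20, 50, 100):
    response, lift = lift_at(observed, scores, "yes", pct)
    print(f"top {pct:3d}%: response {response:5.1f}%  lift {lift:.2f}")
```

Here 40 percent of all cases are "yes", so a top 20 percent selection that is entirely "yes" has a lift of 2.5; at 100 percent the selection is the whole data set and the lift is always 1.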

Category of response Gains and/or lift charts will be produced for all categories.
Cumulative Select this check box to show in the chosen lift and gains charts the cumulative percentages, lift values, etc. Clear this check box to show the simple (noncumulative) values.
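To contrast cumulative and simple (noncumulative) values, the sketch below splits the ranked cases into equal-sized bins (quintiles here, a hypothetical choice) and reports, for each bin, the percent of all target cases that fall in that bin alone, alongside the running total up to and including it. The function name and data are illustrative assumptions, not the node's internals.

```python
def gains_by_bin(observed, scores, target, n_bins):
    """Simple and cumulative gains (percent of all `target` cases) per bin."""
    ranked = [
        obs for _, obs in sorted(zip(scores, observed), key=lambda p: p[0], reverse=True)
    ]
    total = sum(obs == target for obs in ranked)
    n = len(ranked)
    simple, cumulative, running = [], [], 0
    for b in range(n_bins):
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        hits = sum(obs == target for obs in ranked[start:stop])
        running += hits
        simple.append(100.0 * hits / total)        # this bin only
        cumulative.append(100.0 * running / total)  # this bin and all above it
    return simple, cumulative


observed = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "no", "no"]
scores = [0.95, 0.90, 0.80, 0.75, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05]
simple, cumulative = gains_by_bin(observed, scores, "yes", 5)
print("simple:    ", simple)
print("cumulative:", cumulative)
```

The cumulative series never decreases and ends at 100 percent, while the simple series shows where in the ranking the category's cases are concentrated.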

Options / C. See Common Options.

OK Click the OK button to accept all the specifications made in the dialog box and to close it. The analysis results will be placed in the Reporting Documents node after running (updating) the project.