Random Forest Results - Classification Tab
Select the Classification tab of the Random Forest Results dialog box to access options for reviewing plots and spreadsheets of observed and predicted classifications for each observation set (the Training, Test, and Missing Data sets, or All samples). There are also options to provide information about the prior probabilities used in the analysis. The Classification tab is not available if the current analysis is a Regression Analysis, as specified on the Random Forest Startup Panel - Quick tab.
- Sample
- Select an option button in the Sample group box to specify the type of sample for which to compute the predicted and residual statistics (classifications).
- Analysis
- Only those observations that were used to compute the current results (i.e., build the current set of trees).
- Test set
- All observations that were held out for use as a test set.
- Prediction
- All cases that have valid data for the predictor variables, but with missing data for the dependent variable.
- All samples
- Displays and plots classification statistics for all the sets above in one spreadsheet or graph.
- Predicted vs. observed by classes
- Click the Predicted vs. observed by classes button to produce three spreadsheets for predicted values, classification, and confusion matrices. This action will also generate a 3D histogram of the predicted by observed classification frequencies (i.e., number of observations).
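As an illustration of what a confusion matrix of predicted by observed classes contains, here is a minimal Python sketch. It is not the program's implementation; the function name `confusion_matrix`, the sample data, and the layout (rows for observed classes, columns for predicted classes) are assumptions made for this example.

```python
from collections import Counter

def confusion_matrix(observed, predicted):
    """Count the cases for every (observed, predicted) class pair."""
    classes = sorted(set(observed) | set(predicted))
    counts = Counter(zip(observed, predicted))
    # Assumed layout: rows are observed classes, columns are predicted classes.
    matrix = [[counts[(obs, pred)] for pred in classes] for obs in classes]
    return classes, matrix

# Hypothetical observed and predicted classifications for six cases.
observed = ["yes", "yes", "no", "no", "no", "yes"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]

classes, matrix = confusion_matrix(observed, predicted)
print(classes)  # ['no', 'yes']
print(matrix)   # [[2, 1], [1, 2]]
```

The diagonal cells hold the correctly classified cases; the same cell frequencies are what a predicted by observed histogram would display as bar heights.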
- Prior probabilities
- Click this button to display a spreadsheet containing the prior probabilities and the corresponding n for each class (group) in the dependent variable. The prior probabilities will be combined with the prediction probabilities and misclassification costs to compute the final classification probabilities (see the Technical Notes).
- Adjusted prior probabilities
- Click this button to display a spreadsheet of classification prior probabilities for each class of the dependent variable, adjusted for the user-specified misclassification costs (see the description of the Random Forest Specifications dialog box - Classification tab), and the corresponding class n's. This spreadsheet will be identical to the one produced by the Prior probabilities option (see above) if the classification prior probabilities were not adjusted (modified) via the User specified option in the Prior probability group box on the Random Forest Specifications dialog box - Advanced tab.
- Misclassification cost matrix
- Click the Misclassification cost matrix button to display a spreadsheet containing the (user-specified or default) costs of misclassifying cases or objects in each observed class of the dependent variable as another class (rows; see also the documentation for the Random Forest Specifications dialog box - Classification tab). The misclassification costs are combined with the prior probabilities when computing the final classification probabilities (see also the Technical Notes).
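To show how misclassification costs can change a final classification, the following Python sketch applies a minimum expected cost rule. This is one common approach and only an assumption here; the exact computation used by the program is described in the Technical Notes. The function name, the cost layout (`costs[i][j]` is the cost of classifying a case of observed class `i` as class `j`), and the numbers are hypothetical, and the class probabilities are assumed to reflect the prior probabilities already.

```python
def classify_min_expected_cost(probs, costs):
    """Return the class index with the lowest expected misclassification cost.

    probs[i]    : estimated probability that the case belongs to class i
                  (assumed to already reflect the prior probabilities)
    costs[i][j] : assumed cost of classifying a case of observed class i as class j
    """
    n = len(probs)
    # Expected cost of predicting class j, averaged over the possible true classes.
    expected = [sum(probs[i] * costs[i][j] for i in range(n)) for j in range(n)]
    best = min(range(n), key=lambda j: expected[j])
    return best, expected

probs = [0.6, 0.4]                # hypothetical class probabilities for one case
equal_costs = [[0, 1], [1, 0]]    # all errors cost the same
unequal_costs = [[0, 1], [5, 0]]  # missing class 1 is five times as costly

print(classify_min_expected_cost(probs, equal_costs)[0])    # 0
print(classify_min_expected_cost(probs, unequal_costs)[0])  # 1
```

With equal costs the rule picks the most probable class; with unequal costs the less probable class can be chosen because the error of missing it is more expensive.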
- Lift chart type
- The options in this group box are used to create lift charts and gains charts for the categories of the dependent variable and for the current model. Use these charts to evaluate and compare the utility of the model for predicting the different categories or classes of the categorical dependent variable. Select the option button that specifies the type of chart and the scaling for the chart you want to compute.
- Gains chart
- Select this option button to compute a gains chart. This chart shows the percent of observations correctly classified into the chosen category (see Category of response below) when taking the top x percent of cases from the sorted (by classification probabilities) data file.
For example, this chart can show you that by taking the top 20 percent (shown on the x-axis) of cases classified into the respective category with the greatest certainty (maximum classification probability), you would correctly classify almost 80 percent of all cases (as shown on the vertical y-axis of the plot) belonging to that category in the population. In this plot, the baseline random classification (selection of cases) would yield a straight line (from the lower-left to the upper-right corner), which can serve as a comparison to gauge the utility of the respective models for classification.
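The gains computation described above can be sketched in Python as follows. This is a minimal illustration, not the program's code; the function name `cumulative_gains`, the ten hypothetical cases, and the 0/1 coding of category membership are assumptions for the example.

```python
def cumulative_gains(scores, is_target, fractions):
    """Percent of all target-category cases captured in the top fraction of cases."""
    # Sort case indices from highest to lowest classification probability.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_targets = sum(is_target)
    gains = []
    for frac in fractions:
        top_n = round(frac * len(scores))
        captured = sum(is_target[i] for i in order[:top_n])
        gains.append(100.0 * captured / total_targets)
    return gains

# Hypothetical classification probabilities and true membership (1 = in category).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
is_target = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gains = cumulative_gains(scores, is_target, [0.2, 0.5, 1.0])
print(gains)  # [50.0, 75.0, 100.0]
```

In this made-up data, the top 20 percent of cases captures 50 percent of the category, compared with the 20 percent expected from the baseline random selection.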
- Lift chart (response %)
- Select this option button to compute a lift chart where the vertical y-axis is scaled in terms of the percent of all cases belonging to the respective category. As in the gains chart, the x-axis denotes the respective top x percent of cases from the sorted (by classification probabilities) data file.
- Lift chart (lift value)
- Select this option button to compute a lift chart where the vertical y-axis is scaled in terms of the lift value, expressed as the multiple of the baseline random selection model.
For example, this chart can show you that by taking the top 20 percent (shown on the x-axis) of cases classified into the respective category with the greatest certainty (maximum classification probability), you would end up with a sample that has almost 4 times as many cases belonging to the respective category when compared to the baseline random selection (classification) model.
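A lift value of this kind can be sketched in Python as the ratio of the category's rate among the top-ranked cases to its rate in the whole sample. This is a minimal sketch under that assumption; the function name `lift_value` and the data are hypothetical.

```python
def lift_value(scores, is_target, fraction):
    """Lift of the top fraction of cases relative to baseline random selection."""
    # Sort case indices from highest to lowest classification probability.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_n = round(fraction * len(scores))
    rate_in_top = sum(is_target[i] for i in order[:top_n]) / top_n
    overall_rate = sum(is_target) / len(is_target)  # baseline (random selection)
    return rate_in_top / overall_rate

# Hypothetical classification probabilities and true membership (1 = in category).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
is_target = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

lift_top20 = lift_value(scores, is_target, 0.2)
lift_all = lift_value(scores, is_target, 1.0)
print(lift_top20, lift_all)
```

Here the top 20 percent contains only category members while the overall rate is 40 percent, so the lift is about 2.5; selecting all cases always gives a lift of 1, the baseline.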
- Category of response
- Select the response category for which to compute the gains and/or lift charts. You can choose to produce lift charts for a single category or for all categories.
- Cumulative
- Select this check box to show the cumulative percentages, lift values, etc., in the chosen lift and gains charts. Clear this check box to show the simple (noncumulative) values.
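The difference between cumulative and noncumulative values can be illustrated with a short Python sketch. It assumes the sorted cases are split into equal-sized bins and that the plotted value is the percent of cases in the selected segment that belong to the category; the function name `response_by_bin`, the binning, and the data are hypothetical and may differ from what the program does.

```python
def response_by_bin(scores, is_target, n_bins, cumulative):
    """Percent of category members per bin of cases sorted by probability."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [is_target[i] for i in order]
    bin_size = len(ranked) // n_bins  # assumes the case count divides evenly
    result = []
    for b in range(n_bins):
        # Cumulative: everything from the top through bin b; otherwise bin b only.
        start = 0 if cumulative else b * bin_size
        end = (b + 1) * bin_size
        segment = ranked[start:end]
        result.append(100.0 * sum(segment) / len(segment))
    return result

# Hypothetical classification probabilities and true membership (1 = in category).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
is_target = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

simple = response_by_bin(scores, is_target, 5, cumulative=False)
running = response_by_bin(scores, is_target, 5, cumulative=True)
print(simple)   # [100.0, 50.0, 0.0, 50.0, 0.0]
print(running)  # [100.0, 75.0, 50.0, 50.0, 40.0]
```

The noncumulative values describe each bin on its own and can jump up and down, while the cumulative values describe everything selected so far and end at the overall rate of the category.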
- Lift chart
- Click this button to create the chart as specified via the Lift chart type, Category of response, and Cumulative options (see above).