Random Forest Specifications - Classification Tab

Select the Classification tab of the Random Forest Specifications dialog box to access the options described here. This tab is available only if the dependent variable in the current analysis is categorical, that is, if Classification Analysis was selected as the Type of analysis on the Random Forest Startup Panel - Quick tab, so that the goal of the analysis is to correctly classify cases into the groups defined by the dependent variable. To specify user-defined priors, you must first select the specific dependent variable codes that define the number and names of the classes in the analysis.

Misclassification costs
The options in this group box enable you to assign greater importance to the correct classification of certain classes relative to others. For example, in medical diagnostics you may want to assign a higher cost to misclassifying malignant tissue as healthy than to misclassifying healthy tissue as malignant. As this example shows, the misclassification cost matrix does not have to be symmetric, and in fact it rarely is.

The misclassification costs are combined with the prior probabilities when computing the classification probabilities, both during estimation and when computing the final classification probabilities. Essentially, the costs are applied as relative weights to the classification probabilities, and the final classifications are determined from the products of the two. For details regarding these computations, see Friedman (1999a, p. 11).
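To make the role of an asymmetric cost matrix concrete, the sketch below uses the standard minimum-expected-cost decision rule: for each candidate predicted class, it sums the costs of each possible observed class weighted by that class's probability, and it picks the cheapest prediction. This is a generic illustration, not necessarily Statistica's exact computation (see Friedman, 1999a, for that). The function name, the class coding (0 = healthy, 1 = malignant), and the cost and probability values are all hypothetical; the matrix follows the predicted class (row) by observed class (column) layout described below.

```python
from typing import Sequence


def min_expected_cost_class(
    probs: Sequence[float],
    costs: Sequence[Sequence[float]],
) -> int:
    """Return the index of the predicted class with the smallest expected cost.

    probs[i]    -- estimated probability that the case belongs to observed class i
    costs[j][i] -- cost of predicting class j (row) when the observed class is i (column)
    """
    n = len(probs)
    # Expected cost of predicting class j, averaged over the possible observed classes
    expected = [sum(costs[j][i] * probs[i] for i in range(n)) for j in range(n)]
    # Smallest expected cost wins; ties go to the lowest class index
    return min(range(n), key=lambda j: expected[j])


# Hypothetical coding: 0 = healthy, 1 = malignant.
# Predicting "healthy" for malignant tissue (row 0, column 1) costs 10;
# predicting "malignant" for healthy tissue (row 1, column 0) costs 1.
costs = [
    [0.0, 10.0],
    [1.0, 0.0],
]
probs = [0.8, 0.2]
print(min_expected_cost_class(probs, costs))  # 1
```

Although "healthy" is the more probable class here, the expected cost of predicting it (10 × 0.2 = 2.0) exceeds that of predicting "malignant" (1 × 0.8 = 0.8), so the asymmetric costs shift the classification.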

Equal
If you select the Equal option button, each off-diagonal element of the predicted class (row) by observed class (column) misclassification costs matrix is set equal to 1.0, and the specified prior probabilities for the classes on the dependent variable are not adjusted.
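One way to see why Equal costs leave the classification driven by the probabilities alone: with 1.0 on every off-diagonal element (and, as assumed here, 0.0 on the diagonal for correct classifications), the expected cost of predicting class j is simply 1 minus the probability of class j. Minimizing expected cost then picks the same class as maximizing probability. The helper names and the probability values below are hypothetical.

```python
def equal_cost_matrix(k):
    """k x k cost matrix: 1.0 off the diagonal, 0.0 (assumed) on the diagonal."""
    return [[0.0 if j == i else 1.0 for i in range(k)] for j in range(k)]


def expected_costs(probs, costs):
    """Expected cost of predicting each class j: sum over i of costs[j][i] * probs[i]."""
    n = len(probs)
    return [sum(costs[j][i] * probs[i] for i in range(n)) for j in range(n)]


probs = [0.2, 0.5, 0.3]  # hypothetical class probabilities for one case
ec = expected_costs(probs, equal_cost_matrix(3))  # each entry equals 1 - probs[j]

best_by_cost = min(range(3), key=lambda j: ec[j])
best_by_prob = max(range(3), key=lambda j: probs[j])
print(best_by_cost, best_by_prob)  # 1 1
```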
User spec
Select the User spec. (User specified) option button if more accurate classification is desired for some classes than others. Note that this option is available only if you have selected the dependent variable codes by clicking the Response codes button on the Quick tab. See also the description of these options in the Classification Trees module for more details.
Prior probabilities
Use the options in this group box to specify how likely it is that a case or object will fall into each of the classes, without using any knowledge of the values of the predictor variables in the model. The Prior probabilities group box contains three options for this purpose: Estimated, Equal, and User specified. Note that the User specified option is available only after you have selected the specific Response codes for the dependent variable on the Quick tab of the Specifications dialog box. The prior probabilities are combined with the prediction probabilities and misclassification costs to compute the classification probabilities during estimation (building of trees) and to compute the final classifications (see also Technical Notes).
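As an illustration of how priors can enter the class probabilities during tree building, the sketch below uses the prior-weighted node probability from CART-style classification trees, p(j | t) ∝ π_j · N_j(t) / N_j, where π_j is the prior for class j, N_j is the number of class-j cases in the learning sample, and N_j(t) is the number of those cases reaching node t. This page does not state that Statistica uses exactly this formula, so treat it as an assumption; the function name and the counts are hypothetical.

```python
def node_class_probs(priors, class_totals, node_counts):
    """Prior-adjusted class probabilities for a single tree node (CART-style).

    priors[j]       -- prior probability pi_j of class j
    class_totals[j] -- number of class-j cases in the learning sample (N_j)
    node_counts[j]  -- number of class-j cases that reach this node (N_j(t))
    """
    weights = [p * n_t / n for p, n_t, n in zip(priors, node_counts, class_totals)]
    total = sum(weights)
    return [w / total for w in weights]


class_totals = [90, 10]  # hypothetical: 90 cases in class 0, 10 in class 1
node_counts = [30, 5]    # hypothetical: cases of each class reaching one node

estimated = node_class_probs([0.9, 0.1], class_totals, node_counts)
equal = node_class_probs([0.5, 0.5], class_totals, node_counts)
print([round(p, 3) for p in estimated])  # [0.857, 0.143]
print([round(p, 3) for p in equal])      # [0.4, 0.6]
```

With priors proportional to class sizes, the node probabilities reduce to the raw class proportions in the node (30/35 and 5/35); with equal priors, the smaller class becomes the more probable one in this node.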
Estimated
Select the Estimated option button to specify that the likelihood that a case or object will fall into one of the classes is proportional to the dependent variable class sizes (see example below).
Equal
Select the Equal option button to specify that the likelihood that a case or object will fall into one of the classes is the same for all dependent variable classes (see example below).
Example
These two options are best explained with an example. In an educational study of high school dropouts, for instance, there may be fewer dropouts overall than students who stay in school (that is, the classes have different base rates); thus, the a priori probability that a student drops out is lower than the probability that a student remains in school. The a priori probabilities can greatly affect the classification of cases or objects. If differential base rates are not of interest for the study, or if you know that the classes contain approximately equal numbers of cases, set the a priori probabilities to Equal. If the differential base rates are reflected in the class sizes (as they would be if the sample is a probability sample), set the a priori probabilities to Estimated.
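The difference between the Estimated and Equal options can be sketched as follows: Estimated priors are the observed class proportions, while Equal priors assign 1/k to each of the k classes. The function names and the 85/15 split of students below are hypothetical and serve only to illustrate the dropout example.

```python
from collections import Counter


def estimated_priors(labels):
    """Priors proportional to the observed class sizes."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts[c] / n for c in sorted(counts)}


def equal_priors(labels):
    """The same prior, 1/k, for each of the k observed classes."""
    classes = sorted(set(labels))
    return {c: 1 / len(classes) for c in classes}


# Hypothetical sample: 85 students stay in school, 15 drop out
labels = ["stay"] * 85 + ["dropout"] * 15
print(estimated_priors(labels))  # {'dropout': 0.15, 'stay': 0.85}
print(equal_priors(labels))      # {'dropout': 0.5, 'stay': 0.5}
```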
User specified
Select the User specified option button if you have specific knowledge about the base rates (for example, based on previous research). Click the button adjacent to the User specified option button to display the Enter values for the prior probabilities dialog box, in which you can specify the a priori probabilities for each class of the dependent variable. If the probabilities do not add up to 1.0, Statistica will automatically adjust them proportionately.
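The proportional adjustment of user-specified priors can be sketched as dividing each value by their total, so that the relative sizes are preserved and the results sum to 1.0. This reading of "adjust them proportionately" is an assumption, as is rejecting negative or all-zero input; the function name is hypothetical.

```python
def normalize_priors(values):
    """Rescale user-specified priors proportionately so that they sum to 1.0."""
    if any(v < 0 for v in values):
        raise ValueError("prior probabilities must be non-negative")
    total = sum(values)
    if total == 0:
        raise ValueError("at least one prior must be positive")
    return [v / total for v in values]


print(normalize_priors([2, 1, 1]))  # [0.5, 0.25, 0.25]
```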