ITrees Extended Options - Classification Tab
The Classification tab of the ITrees Extended Options dialog box contains one or more group boxes with options, depending on the Model building method chosen on the Interactive Trees Startup Panel - Quick tab. The Misclassification costs group box contains options to choose the misclassification cost (Equal or User specified). The Goodness of fit measure (Gini measure, Chi-square, or G-square) and the Prior probabilities (Estimated, Equal, or User specified) options apply only to classification trees analyses, i.e., if C&RT is chosen as the Model building method on the Interactive Trees Startup Panel - Quick tab. Note that these options are available only if the dependent (criterion) variable in the current analysis is categorical (when Classification Analysis is chosen as the Type of analysis on the Interactive Trees Startup Panel - Quick tab), i.e., if the goal of the current analysis is to correctly classify cases (observations) into the groups specified in the dependent variable.
Element Name | Description |
---|---|
Misclassification costs | Note that misclassification costs are not used in the computations of the CHAID tree. The options in this group box enable you to assign greater importance to the accurate prediction (classification) for some classes as compared to others. For example, in medical research you may want to assign greater importance to the accurate classification of malignant tumors as compared to accurate discrimination between different types of benign forms. In this case, you would assign greater costs to the misclassification of malignant tumors, and lower costs to the misclassification of benign tumors. Note also that, as illustrated in this example, the matrix of misclassification costs does not necessarily have to be symmetric, and in fact it rarely is (e.g., it is more costly to misclassify malignant tumors as benign than to misclassify benign tumors as malignant). |
Equal | If you select the Equal option button, each off-diagonal element of the predicted class (row) by observed class (column) misclassification costs matrix is set equal to 1.0, and the prior probabilities specified for the classes on the dependent variable are not adjusted. |
User specified (Not available for CHAID analyses.) | Select the User specified option button if more accurate classification is desired for some classes than others. Note that this option is available only after you have selected the dependent variable codes (classes) by clicking the Response codes button on the Quick tab. |
Misclassification costs in CHAID and Exhaustive CHAID | Unequal misclassification costs will only affect the automatic tree growing process when C&RT (classification trees analysis) is selected on the Interactive Trees Startup Panel - Quick tab. For CHAID and Exhaustive CHAID, user-defined (unequal) costs will only be used for computing the final misclassification costs. See also the General Classification and Regression Trees (GC&RT) and General CHAID Models modules for details. |
Goodness of fit | The options in this group box pertain to the goodness of fit measure that is used as a criterion for selecting the best split from the set of possible candidate splits. You can choose one of three measures: Gini measure, Chi-square, or G-square. See also the General Classification and Regression Trees (GC&RT) Introductory Overviews, and Breiman et al. (1984) for details concerning these measures. |
Gini measure | The Gini measure (see also GC&RT Computational Formulas) is a measure of impurity of a node and can be used as a measure of goodness of fit to compute the "right-sized" tree. With priors estimated from class sizes and equal misclassification costs, the Gini measure is computed as the sum of products of all pairs of class proportions for classes present at the node. This measure reaches its maximum value when class sizes at the node are equal, and reaches a value of zero when only one class is present at a node (and, hence, when the classification for the observed data is perfect). The Gini measure is the commonly preferred measure of goodness of fit (e.g., Breiman et al., 1984). |
Chi-square | The Chi-square option is similar to the standard Chi-square value computed for the expected and observed classifications (with priors adjusted for misclassification cost). |
G-square | The G-square option is similar to the maximum-likelihood Chi-square (as computed, for example, in the Log Linear module). |
Prior probabilities | The options in this group box are used to specify how likely it is, without using any prior knowledge of the values for the predictor variables in the model, that a case or object will fall into one of the classes. The Prior probabilities group box contains three options for this purpose: Estimated, Equal, and User specified. Note that the User specified option is available only after you have selected the specific Response codes for the dependent variable on the Quick tab of the Specifications dialog box. Note also that the specification of equal or unequal prior probabilities can greatly affect the accuracy of the final tree model for predicting particular classes. For details, see Prior Probabilities, the Gini Measure of Node Impurity, and Misclassification Cost. |
Estimated | Select the Estimated option button to specify that the likelihood that a case or object will fall into one of the classes is proportional to the dependent variable class sizes. See also the descriptions of these options for the Classification Trees Analysis module for additional details. |
Equal | Select the Equal option button to specify that the likelihood that a case or object will fall into one of the classes is the same for all dependent variable classes. See also the descriptions of these options for the Classification Trees Analysis module for details. |
User specified | Select the User specified option button if you have specific knowledge about the base rates (for example, based on previous research). When you select the User specified option button, the Enter values for the prior probabilities dialog box will be displayed, in which you can specify the a priori probabilities for each class of the dependent variable. This dialog box is automatically displayed only the first time that priors are set to user defined (i.e., the User specified option button is selected); thereafter, click the accompanying settings button to display the dialog containing the previously specified values. If the probabilities do not add up to 1.0, STATISTICA will automatically adjust them proportionately. |
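
As an illustration of two computations described in the table above, the following Python sketch shows the Gini measure for a single node (for the case of priors estimated from class sizes and equal misclassification costs) and the proportional adjustment of user-specified priors that do not sum to 1.0. This is not STATISTICA code: the function names `gini_impurity` and `normalize_priors` are hypothetical, and the sketch assumes that "all pairs of class proportions" means all ordered pairs of distinct classes, which makes the result equal to one minus the sum of squared class proportions.

```python
from itertools import permutations


def gini_impurity(class_counts):
    """Gini measure of one node, given the number of cases in each class.

    Assumes priors estimated from class sizes and equal misclassification
    costs (hypothetical helper, not part of STATISTICA).
    """
    total = sum(class_counts)
    if total <= 0:
        raise ValueError("the node must contain at least one case")
    proportions = [count / total for count in class_counts]
    # Sum p_i * p_j over all ordered pairs of distinct classes (i != j).
    # Algebraically this equals 1 - sum(p_i ** 2).
    return sum(p * q for p, q in permutations(proportions, 2))


def normalize_priors(priors):
    """Rescale prior probabilities proportionately so that they sum to 1.0.

    Illustrates the kind of proportional adjustment described for the
    User specified option (hypothetical helper).
    """
    total = sum(priors)
    if total <= 0:
        raise ValueError("priors must have a positive sum")
    return [p / total for p in priors]


if __name__ == "__main__":
    print(gini_impurity([5, 5]))       # two equally sized classes
    print(gini_impurity([10, 0]))      # only one class present at the node
    print(normalize_priors([2, 1, 1]))  # priors that do not sum to 1.0
```

Under these assumptions, a node with equal class sizes yields the largest value for a given number of classes, and a node containing a single class yields zero, matching the behavior described for the Gini measure option.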