Workspace Node: C&RT Classification - Specifications - Classification Tab
In the C&RT Classification node dialog box, under the Specifications heading, select the Classification tab to access the following options.
Element Name | Description |
---|---|
Misclassification costs | Use the options in this group box to assign greater importance to the accurate prediction (classification) of some classes than of others. For example, in medical research you may want to assign greater importance to the accurate classification of malignant tumors than to accurate discrimination between different types of benign forms. In this case, you would assign greater costs to the misclassification of malignant tumors, and lower costs to the misclassification of benign tumors. Note also that, as illustrated in this example, the matrix of misclassification costs does not have to be symmetric, and in fact it rarely is (e.g., it is more costly to misclassify malignant tumors as benign than to misclassify benign tumors as malignant). |
Equal | If you select this button, each off-diagonal element of the predicted class (row) by observed class (column) misclassification costs matrix is set equal to 1.0, and the prior probabilities specified for the classes on the dependent variable are not adjusted. |
User specified | Select this option button if more accurate classification is desired for some classes than others. Note that this option is available only after you have selected the dependent variable codes (classes) by clicking the Response codes button on the Quick tab. |
Goodness of fit | The options in this group box pertain to the goodness of fit measure that is used as a criterion for selecting the best split from the set of candidate splits. You can choose one of three measures: Gini measure, Chi-square, and G-square. See also the General Classification and Regression Trees (GC&RT) Introductory Overviews and Breiman et al. (1984) for details concerning these measures. |
Gini measure | The Gini measure (see also GC&RT Computational Formulas) is a measure of impurity of a node and can be used as a measure of goodness of fit to compute the right-sized tree. With priors estimated from class sizes and equal misclassification costs, the Gini measure is computed as the sum of products of all pairs of class proportions for classes present at the node. This measure reaches its maximum value when class sizes at the node are equal, and reaches a value of zero when only one class is present at a node (and, hence, when the classification for the observed data is perfect). The Gini measure is a commonly preferred measure of goodness of fit (e.g., Breiman et al., 1984). |
Chi-square | The Chi-square option is similar to the standard Chi-square value computed for the expected and observed classifications (with priors adjusted for misclassification cost). |
G-square | The G-square option is similar to the maximum-likelihood Chi-square (as, for example, computed in the Log Linear module). |
Prior probabilities | The options in this group box are used to specify how likely it is, without using any prior knowledge of the values for the predictor variables in the model, that a case or object will fall into one of the classes. The group box contains three options for this purpose: Estimated, Equal, and User specified. The User specified option is available only after you have selected the specific Response codes for the dependent variable on the Quick tab. Note that the specification of equal or unequal prior probabilities can greatly affect the accuracy of the final tree model for predicting particular classes. For details, see Prior Probabilities, the Gini Measure of Node Impurity, and Misclassification Cost. |
Estimated | Select this option button to specify that the likelihood that a case or object will fall into one of the classes is proportional to the dependent variable class sizes. See also the descriptions of these options for the Classification Trees Analysis module for additional details. |
Equal | Select this option button to specify that the likelihood that a case or object will fall into one of the classes is the same for all dependent variable classes. |
User specified | Select this option button if you have specific knowledge about the base rates (for example, based on previous research). When you click the accompanying button, the Enter values for the prior probabilities dialog box is displayed, in which you can specify the a priori probabilities for each class of the dependent variable. If the probabilities do not add up to 1.0, Statistica automatically adjusts them proportionately. |
Options / C / W | See Common Options. |
OK | Click the OK button to accept all the specifications made in the dialog box and to close it. The analysis results will be placed in the Reporting Documents node after running (updating) the project. |
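
As an illustration of two computations described in the table above, the following Python sketch may help. It is not Statistica code. The function names `gini_measure` and `normalize_priors` are hypothetical and were chosen for this example only. The first function follows the description of the Gini measure under priors estimated from class sizes and equal misclassification costs (the sum of products of all pairs of class proportions at a node). The second assumes that "adjusts them proportionately" means dividing each user-specified prior by the sum of all priors; the exact adjustment Statistica applies is not documented here.

```python
from itertools import permutations


def gini_measure(class_counts):
    """Gini impurity of a node, given the number of cases in each class.

    Assumes priors estimated from class sizes and equal misclassification
    costs, so the class proportions at the node are count / total.
    """
    total = sum(class_counts)
    if total == 0:
        raise ValueError("The node contains no cases.")
    # Keep only the classes that are actually present at the node.
    proportions = [count / total for count in class_counts if count > 0]
    # Sum of p_i * p_j over all ordered pairs of distinct classes (i != j).
    # This equals 1 - sum(p_i ** 2).
    return sum(p_i * p_j for p_i, p_j in permutations(proportions, 2))


def normalize_priors(priors):
    """Rescale user-specified priors so that they sum to 1.0.

    Assumption: proportional adjustment means dividing by the total.
    """
    if any(p < 0 for p in priors):
        raise ValueError("Prior probabilities must be non-negative.")
    total = sum(priors)
    if total <= 0:
        raise ValueError("At least one prior probability must be positive.")
    return [p / total for p in priors]


if __name__ == "__main__":
    print(gini_measure([50, 50]))       # two equally sized classes
    print(gini_measure([100, 0]))       # a pure node
    print(normalize_priors([2, 1, 1]))  # priors that do not sum to 1.0
```

In this sketch, a node with two equally sized classes gives the largest two-class value of the measure, and a node that contains a single class gives zero, which matches the behavior described for the Gini measure option.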
Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.