C&RT Quick Specs - Stopping Tab
Tree size is an important consideration when computing Classification and Regression Trees: a tree that grows too large makes the results difficult to interpret. You can control the size of the tree with the options on the Stopping tab of the C&RT Quick Specs dialog box. This tab contains two group boxes, Stopping rule and Stopping parameters, whose options define the criterion for selecting the right-sized tree. Note that these options are also available on the Stopping tab of the ITREES C&RT Extended Options dialog box.
- Stopping rule
- If the dependent variable for the current analysis is categorical in nature, and the objective of the analysis is to classify cases (observations) into the categories defined in the dependent variable (i.e., if the Categorical response check box is selected on the Quick tab), this box will contain three stopping rules: Prune on misclassification error, Prune on deviance, and FACT-style direct stopping. If the dependent variable is continuous, two stopping rules are available: Prune on variance and FACT-style direct stopping. Refer to Computational Details for details concerning these stopping rules; see also Ripley (1996) for detailed discussions of these measures.
- Prune on variance
- One of the ways in which the size of the tree can be checked is by pruning the tree, i.e., by removing parts of trees with the aim of computing the right-sized tree. If the dependent variable is continuous (regression), the measure used is the variance of cases in a node. Select the Prune on variance option button to prune on the basis of variance.
- Prune on misclassification error
- This option uses costs that equal the misclassification rate when priors are estimated and misclassification costs are equal. Select the Prune on misclassification error option button to prune on the basis of misclassification error.
- Prune on deviance
- Deviance is a measure of fit that is based on the likelihood principle. This option will use the difference between the log-likelihood of the best model and the current model as a basis for pruning when the dependent variable is categorical (see Ripley, 1996). Select the Prune on deviance option button to prune the trees on the basis of deviance.
- FACT-style direct stopping
- Select this option to directly stop the growth of the tree based upon a fraction of cases (regression) or a fraction of cases within a specific category of the response (classification). This is in contrast to the other stopping rules, which all involve pruning: the tree is first grown too large and then pruned back step by step until only the root node remains. This pruning process creates a sequence of trees that vary in size from the largest tree grown down to the root node. By using v-fold crossvalidation coupled with the standard error rule, STATISTICA will select the optimal tree in this sequence. With the FACT-style direct stopping method, the pruning process is omitted completely and the growth of the tree is controlled solely by the Fraction of objects option (see below).
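The pruning measures and the tree-selection step described above can be illustrated with a short Python sketch. This is not STATISTICA's implementation; all function names are hypothetical. It assumes the population variance (dividing by n), a deviance of the form -2 times the log-likelihood of a node's labels under the node's own class proportions, and a standard error rule that picks the smallest tree whose crossvalidation cost lies within a given multiple of the standard error of the minimum cost.

```python
import math
from collections import Counter

def node_variance(values):
    """Variance of a continuous response within one node (assumes division by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def node_misclassification(labels):
    """Share of cases outside the node's majority class (equal costs, estimated priors)."""
    counts = Counter(labels)
    return 1.0 - max(counts.values()) / len(labels)

def node_deviance(labels):
    """-2 * log-likelihood of the labels under the node's own class proportions."""
    n = len(labels)
    return -2.0 * sum(c * math.log(c / n) for c in Counter(labels).values())

def select_tree(sequence, se_rule=1.0):
    """Pick the smallest tree whose CV cost is within se_rule standard errors
    of the minimum CV cost.

    sequence: list of (n_terminal_nodes, cv_cost, cv_std_error) tuples,
              one per tree in the pruning sequence.
    """
    best = min(sequence, key=lambda t: t[1])
    threshold = best[1] + se_rule * best[2]
    eligible = [t for t in sequence if t[1] <= threshold]
    return min(eligible, key=lambda t: t[0])

# Hypothetical pruning sequence, from the largest tree down to the root node.
pruning_sequence = [(9, 0.30, 0.03), (5, 0.28, 0.03), (3, 0.30, 0.03), (1, 0.45, 0.04)]
chosen = select_tree(pruning_sequence)
```

In this made-up sequence the 5-node tree has the lowest crossvalidation cost, but the 3-node tree falls within one standard error of it, so the simpler tree is the one selected.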
- Stopping parameters
- The process of computing the tree can also be controlled through other parameters, such as the number of cases, the number of nodes, and the standard error. The Stopping parameters group box provides the following choices: Minimum n cases and Maximum n nodes. An additional Fraction of objects option is available when the Categorical response check box is selected on the Quick tab. Use these options to control when split selection stops and, if a pruning method is selected as the Stopping rule, when pruning begins and which pruned tree is selected as the right-sized tree. Two additional parameters are available via the Interactive Trees module (Minimum n in child node and Maximum n levels, see below).
- Minimum n cases
- If a pruning method is selected in the Stopping rule group box (Prune on variance, Prune on misclassification error, or Prune on deviance), enter a value for Minimum n cases. If the number of observations within a node is less than this value, the node will not be considered for splitting.
- Minimum n in child node
- If a pruning method is selected in the Stopping rule group box (Prune on variance, Prune on misclassification error, or Prune on deviance), use this option to set the smallest permissible number of cases in a child node for a split to be applied. While the Minimum n cases parameter determines whether an additional split is considered at a particular node, the Minimum n in child node parameter determines whether a split will be applied, depending on whether either of the two resultant child nodes would be smaller (have fewer cases) than the n specified via this option. Note that this option is only available on the ITREES C&RT Extended Options dialog box.
- Fraction of objects
- If FACT-style direct stopping is selected as the Stopping rule (see above), the value in the Fraction of objects box is used to stop the growth of the tree. For classification problems, a node will not be split if any of the relative frequencies of the levels of the categorical response fall at or below the value in the Fraction of objects box. For regression problems, a node will not be split if the relative frequency within the node falls at or below the value in the Fraction of objects box.
- Maximum n levels
- Use this option to specify the maximum number (n) of levels in the tree. Note this option is only available on the ITREES C&RT Extended Options dialog box.
- Maximum n nodes
- The value supplied in the Maximum n nodes box is used for stopping on the basis of the number of nodes in the tree. Each time a parent node is split, the total number of nodes in the tree is examined, and splitting is stopped if this number exceeds the number specified in the Maximum n nodes box.
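The stopping parameters above amount to a handful of simple threshold checks, which the following Python sketch makes explicit. The function names are hypothetical and this is not STATISTICA's code. Because this page does not define how the relative frequencies used by Fraction of objects are computed, the sketch takes them as ready-made inputs. Maximum n levels is left out, since the page does not say how levels are counted.

```python
def may_consider_split(n_cases_in_node, min_n_cases):
    """Minimum n cases: a node with fewer cases than the minimum is not
    considered for splitting."""
    return n_cases_in_node >= min_n_cases

def may_apply_split(n_left, n_right, min_n_child):
    """Minimum n in child node: a split is applied only if neither child
    would have fewer cases than the minimum."""
    return n_left >= min_n_child and n_right >= min_n_child

def fact_stops_split(relative_frequencies, fraction_of_objects):
    """Fraction of objects: the node is not split if any supplied relative
    frequency falls at or below the fraction. For regression, pass a
    one-element list."""
    return any(f <= fraction_of_objects for f in relative_frequencies)

def exceeds_max_nodes(total_nodes, max_n_nodes):
    """Maximum n nodes: splitting stops once the total node count exceeds
    the maximum."""
    return total_nodes > max_n_nodes

# Illustrative values only; none of these are documented defaults.
consider = may_consider_split(n_cases_in_node=12, min_n_cases=10)
apply_it = may_apply_split(n_left=8, n_right=4, min_n_child=5)
```

Here the node is large enough to be considered for splitting, but the candidate split is rejected because one child would hold only 4 cases.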
Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.