Classification Trees - Notes on Computational Algorithms

Discriminant-based splits

The algorithms used to perform discriminant-based splits are adapted from QUEST (Quick, Unbiased, Efficient Statistical Trees). QUEST is a classification tree program developed by Loh and Shih (1997) that employs a modification of recursive quadratic discriminant analysis and includes a number of innovative features for improving the reliability and efficiency of the classification trees that it computes.

For univariate splits, QUEST uses statistical tests to select the variables to perform splits. To find the splits, the 2-means clustering algorithm of Hartigan and Wong (1979, see also Cluster Analysis), singular value decomposition methods (Schneider & Barker, 1973), and recursive quadratic discriminant analysis techniques (McLachlan, 1992) are used. Similar algorithms are used for linear combination splits. Loh and Shih (1997) provide detailed descriptions of the algorithms used in QUEST.

C & RT-style search for univariate splits

The C & RT-style univariate split selection method is an adaption of the algorithms used in C & RT, as described by Breiman et al. (1984). C & RT (Classification And Regression Trees) is a classification tree program that uses an exhaustive grid search of all possible univariate splits to find the splits for a classification tree. For categorical predictor variables with k levels present at a node, there are 2(k -1) - 1 possible contrasts between two sets of levels of the predictor. For ordered predictors with k distinct levels present at a node, there are k -1 midpoints between distinct levels. Thus it can be seen that the number of possible splits that must be examined can become very large when there are large numbers of predictors with many levels which must be examined at many nodes. The C & RT-style exhaustive search for univariate splits option works by searching for the split that maximizes the reduction in the value of the selected goodness of fit measure. C&RT split selections methods are thoroughly documented in Breiman et. al. (1984).

Direct stopping

The FACT-style direct stopping rule option employs techniques used in FACT. FACT is a classification tree program developed by Loh and Vanichestakul (1988) that is a precursor of QUEST. The use of direct stopping as a stopping rule for classification tree is discussed in Loh and Vanichestakul (1988).

Minimal cost-complexity cross-validation pruning

The pruning stopping rule options employ techniques used in C & RT, as described in Breiman et al. (1984).