Classification from Categorical and Ordered Predictors

This module provides a complete implementation of the discriminant-based univariate splits method of QUEST (Quick, Unbiased, Efficient Statistical Trees) for classification from categorical and ordered predictors. QUEST is a classification tree algorithm developed by Loh and Shih (1997). It employs a modification of recursive quadratic discriminant analysis and includes a number of innovative features that improve the reliability and efficiency of the classification trees it computes. The program allows you to specify categorical and continuous (ordered) predictors, as well as learning and testing samples; if a testing sample is specified, predicted classifications can be computed from the final classification solution. For tree classifiers suited to large data sets (with many classes) and ANCOVA-like predictor designs, see also the General C&RT and General CHAID methods.
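The "quadratic discriminant analysis" step can be illustrated for a single ordered predictor that has already been chosen for splitting. This is a minimal sketch, assuming the classes have already been grouped into two groups A and B, each summarized by a normal distribution (mean, variance) and a prior probability. The split point is where the prior-weighted normal densities of the two groups are equal. The function name `discriminant_split_point` is hypothetical. Two rules are assumptions of this sketch and are not necessarily Loh and Shih's exact rules: when there are two roots, it keeps the one nearest the midpoint of the group means, and when there is no real root, it falls back to that midpoint.

```python
import math


def discriminant_split_point(mean_a, var_a, prior_a, mean_b, var_b, prior_b):
    """Point x where prior_a * N(x; mean_a, var_a) == prior_b * N(x; mean_b, var_b).

    Hypothetical helper for illustration; root-selection and fallback rules are assumptions.
    """
    # Equating log densities gives a2*x^2 + a1*x + a0 = 0.
    a2 = 1.0 / (2.0 * var_b) - 1.0 / (2.0 * var_a)
    a1 = mean_a / var_a - mean_b / var_b
    a0 = (mean_b ** 2 / (2.0 * var_b) - mean_a ** 2 / (2.0 * var_a)
          + math.log(prior_a / prior_b) + 0.5 * math.log(var_b / var_a))
    midpoint = 0.5 * (mean_a + mean_b)
    disc = a1 * a1 - 4.0 * a2 * a0
    if disc < 0.0:
        return midpoint  # weighted densities never cross (assumed fallback)
    # Numerically stable quadratic roots; also covers a2 == 0 (equal variances).
    q = -0.5 * (a1 + math.copysign(math.sqrt(disc), a1))
    if q == 0.0:
        return midpoint
    roots = [a0 / q]
    if a2 != 0.0:
        roots.append(q / a2)
    # Assumed rule: keep the root closest to the midpoint of the two means.
    return min(roots, key=lambda r: abs(r - midpoint))


# Equal variances and equal priors: the split is the midpoint of the means.
print(discriminant_split_point(0.0, 1.0, 0.5, 4.0, 1.0, 0.5))  # 2.0
```

Cases with predictor values at or below the returned point would go to one child node and the rest to the other. With unequal variances, the boundary moves away from the group with the larger spread.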

General

Element name: Description
Detail of computed results reported: Specifies how much detail is reported in the computed results. With Minimal results, the program reports only the final classification tree plot and the summary tree structure. With Comprehensive results, it also reports various other results tables. Select All results to also review the predicted classifications; these are reported separately for the learning and testing samples if an Analysis sample variable (learning/testing variable) was specified for the analysis.
Prior class probabilities: Specifies how the a priori class probabilities are determined.
Stopping option for pruning: Specifies the stopping rule for the pruning computations.
Minimum n per node: Specifies the minimum number of cases (n) per node at which pruning should begin; this value controls when split selection stops and pruning begins.
Standard error rule: Specifies the standard error rule used to find the optimal ("right-sized") tree via V-fold cross-validation. The selected tree is the smallest tree whose cross-validated cost does not exceed the minimum cross-validated cost plus this value times the standard error of that minimum cost. Refer to the Electronic Manual for additional details.
Fraction of objects: Specifies the fraction of objects (cases) used for FACT-style direct stopping.

V-Fold Cross-validation

Element name: Description
Number of folds (V): Specifies the number of folds V (random subsamples) for V-fold cross-validation.
Random number seed: Specifies the seed used to generate the random subsamples for V-fold cross-validation.
P-Value For Split: The value entered in the p-value for split variable selection box is used in the split variable selection process when a Discriminant-based split selection method has been selected. This p-value determines whether the significance of Levene's F (a test of the equality of variances across classes that is relatively robust to departures from normality) or the significance of a standard univariate ANOVA F is used as the criterion for split variable selection.
Global cross-validation (GCV): Performs global cross-validation, in which the entire analysis is replicated a specified number of times G (3 by default). Each replication holds out a fraction 1/G of the learning sample, and each hold-out sample is used in turn as a test sample to cross-validate the selected classification tree. This type of cross-validation is probably no more useful than V-fold cross-validation when FACT-style direct stopping is used. However, it can be quite useful as a method validation procedure when automatic tree selection techniques are used (for discussion, see Breiman et al., 1984).
V-fold for GCV: Specifies the number of folds (random subsamples) for global cross-validation.
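Both V-fold cross-validation and global cross-validation partition the learning sample into random hold-out sets generated from a seed. This is a minimal sketch of that partitioning, assuming cases are identified by index and folds are assigned by a seeded shuffle so that fold sizes differ by at most one. The program's actual random assignment scheme is not documented here. The names `assign_folds` and `holdout_splits` are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def assign_folds(n_cases: int, v: int, seed: int) -> List[int]:
    """Fold label in 0..v-1 for each case; fold sizes differ by at most one."""
    if not 2 <= v <= n_cases:
        raise ValueError("need 2 <= v <= n_cases")
    order = list(range(n_cases))
    random.Random(seed).shuffle(order)  # same seed -> same partition
    folds = [0] * n_cases
    for rank, case in enumerate(order):
        folds[case] = rank % v
    return folds


def holdout_splits(n_cases: int, v: int, seed: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, hold-out indices), using each fold once as the hold-out."""
    folds = assign_folds(n_cases, v, seed)
    for k in range(v):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        yield train, test


for train, test in holdout_splits(10, 3, seed=42):
    print(len(train), len(test))
```

For global cross-validation with the default of 3, each `(train, test)` pair would drive one full replication of the analysis. The tree is grown and selected on `train`, and its classifications are then evaluated on `test`.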
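The role of the P-Value For Split setting can be sketched for ordered predictors. This sketch first computes the ANOVA F p-value for every predictor and keeps the most significant one if it falls below the threshold. Otherwise, it tries Levene's F, computed here as an ANOVA on absolute deviations from the class means. Several simplifications are assumptions of this sketch, not the program's documented behavior: there is no multiple-comparison adjustment, categorical predictors are omitted, and when neither test is significant it falls back to the smallest ANOVA p-value. F-distribution p-values use a standard-library incomplete beta function so the sketch has no dependencies. `scipy.stats.f_oneway` and `scipy.stats.levene(..., center="mean")` are common alternatives. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta (Numerical Recipes form)."""
    tiny, eps = 1e-300, 3e-14
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor is ~1: converged
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_pvalue(f: float, df1: int, df2: int) -> float:
    """Upper-tail probability P(F > f) for an F(df1, df2) variate."""
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def anova_f(values: Sequence[float], labels: Sequence[str]) -> Tuple[float, int, int]:
    """One-way ANOVA F of values grouped by class label, with its degrees of freedom."""
    groups: Dict[str, List[float]] = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {g: sum(xs) / len(xs) for g, xs in groups.items()}
    ssb = sum(len(xs) * (means[g] - grand) ** 2 for g, xs in groups.items())
    ssw = sum((x - means[g]) ** 2 for g, xs in groups.items() for x in xs)
    df1, df2 = k - 1, n - k
    if ssw == 0.0:  # no spread within classes
        return (math.inf if ssb > 0.0 else 0.0), df1, df2
    return (ssb / df1) / (ssw / df2), df1, df2


def levene_f(values: Sequence[float], labels: Sequence[str]) -> Tuple[float, int, int]:
    """Levene's F: ANOVA on absolute deviations from each class mean."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for v, g in zip(values, labels):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    dev = [abs(v - sums[g] / counts[g]) for v, g in zip(values, labels)]
    return anova_f(dev, labels)


def select_split_variable(predictors: Dict[str, Sequence[float]],
                          labels: Sequence[str], alpha: float = 0.05) -> Tuple[str, str]:
    """Return (predictor name, criterion used) under the simplified rules above."""
    anova_p = {name: f_pvalue(*anova_f(xs, labels)) for name, xs in predictors.items()}
    best = min(anova_p, key=anova_p.get)
    if anova_p[best] < alpha:
        return best, "anova"
    levene_p = {name: f_pvalue(*levene_f(xs, labels)) for name, xs in predictors.items()}
    best_levene = min(levene_p, key=levene_p.get)
    if levene_p[best_levene] < alpha:
        return best_levene, "levene"
    return best, "anova-fallback"  # assumed fallback
```

When class means differ, the ANOVA F test picks the predictor. When the means agree but the spreads differ, only Levene's F detects the difference, which is why the two criteria are combined.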