Exhaustive (C&RT) Search for Univariate Splits

Implementation of a C&RT (Classification and Regression Trees) style search for univariate splits, based on the QUEST (Quick, Unbiased, Efficient Statistical Trees) algorithms for classification. See also the designated C&RT (and CHAID) methods available as part of Data Miner for more comprehensive implementations of these techniques.

General
Detail of computed results reported: Specifies the level of detail in the reported results. If Minimal results is requested, STATISTICA reports only the final classification tree plot and the summary tree structure. If Comprehensive results is requested, STATISTICA also reports various other results tables. Select All results to review the predicted classifications as well, reported separately for the learning and testing samples if an Analysis sample variable (learning/testing variable) is specified for the analysis.
Goodness of fit measure: Specifies the goodness-of-fit measure for classification. The C&RT-style exhaustive search for univariate splits works by searching for the split that maximizes the reduction in the value of the selected measure; when the measure reaches its best possible value, each node contains cases from only one class and classification is perfect. (A minimal sketch of this search follows this group of options.)
Prior class probabilities: Specifies how to determine the a priori classification probabilities.
Stopping option for pruning: Specifies the stopping rule for the pruning computations.
Minimum n per node: Specifies the minimum number of cases (n) per node; a node with no more cases than this value is not split further, so this setting controls when split selection stops and pruning begins.
Standard error rule: Specifies the standard error rule for finding optimal trees via V-fold cross-validation (a sketch of this rule follows the V-Fold Cross-validation options below); refer to the Electronic Manual for additional details.
Fraction of objects: Specifies the fraction of objects for FACT-style direct stopping.
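
The exhaustive split search described under Goodness of fit measure, together with the Minimum n per node constraint, can be illustrated with a short sketch. The Python below is not STATISTICA code: it assumes the Gini impurity as the goodness-of-fit measure, and the function names (gini, best_split) and the toy data are purely illustrative.

```python
# Minimal sketch of a C&RT-style exhaustive search for a univariate split,
# assuming Gini impurity as the goodness-of-fit measure. Illustrative only.
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels (0 when only one class is present)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(x, y, min_n=5):
    """Try every threshold on predictor x and return the split that maximizes
    the reduction in weighted Gini impurity, honoring a minimum n per node."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    parent = gini([label for _, label in pairs])
    best = None  # (impurity reduction, threshold)
    for i in range(min_n, n - min_n + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical predictor values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or parent - child > best[0]:
            best = (parent - child, threshold)
    return best

# Toy data: a clean class boundary between x = 5.0 and x = 6.2.
x = [2.0, 3.1, 4.5, 5.0, 6.2, 7.1, 8.3, 9.0, 9.5, 10.1]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b", "b"]
print(best_split(x, y, min_n=2))  # -> (0.48, 5.6): full impurity reduction
```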
V-Fold Cross-validation
Number of folds (V): Specifies the number of folds (sets, or random samples, V) for V-fold cross-validation.
Random number seed: Specifies the random number seed used to generate the random samples for V-fold cross-validation (a sketch of reproducible fold assignment follows these options).
Global cross-validation (GCV): Performs global cross-validation. In global cross-validation, the entire analysis is replicated a specified number of times (3 is the default), each time holding out a fraction of the learning sample equal to 1 divided by the specified number of replications, and using each hold-out sample in turn as a test sample to cross-validate the selected classification tree. This type of cross-validation is probably no more useful than V-fold cross-validation when FACT-style direct stopping is used, but it can be quite useful as a method-validation procedure when automatic tree selection techniques are used (for discussion, see Breiman et al., 1984). A sketch of this procedure also follows below.
V-fold for GCV: Specifies the number of folds (sets, random samples V) for global cross-validation.
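
The V-fold options and the Standard error rule can be sketched as follows. This is an illustrative Python outline, not the product's implementation: make_folds shows seed-controlled fold assignment, and se_rule shows the standard error rule, which picks the smallest tree in a pruning sequence whose cross-validation error is within the specified number of standard errors of the minimum; the pruning sequence shown is hypothetical.

```python
import random

def make_folds(n_cases, v=10, seed=42):
    """Assign each case index to one of v random folds, reproducibly via the seed."""
    rng = random.Random(seed)
    idx = list(range(n_cases))
    rng.shuffle(idx)
    return [idx[i::v] for i in range(v)]

def se_rule(pruning_seq, c=1.0):
    """pruning_seq: list of (size, cv_error, se) tuples, one per pruned tree.
    Return the smallest tree whose CV error is within c standard errors of
    the minimum CV error (c = 1.0 gives the classic 1-SE rule)."""
    min_err, min_se = min((err, se) for _, err, se in pruning_seq)
    ok = [t for t in pruning_seq if t[1] <= min_err + c * min_se]
    return min(ok, key=lambda t: t[0])  # smallest qualifying tree

folds = make_folds(100, v=10, seed=1)
print([len(f) for f in folds])  # ten folds of 10 cases each

# Hypothetical pruning sequence: (terminal nodes, CV error, SE of CV error).
seq = [(15, 0.21, 0.02), (9, 0.20, 0.02), (5, 0.215, 0.02), (2, 0.30, 0.03)]
print(se_rule(seq, c=1.0))  # -> (5, 0.215, 0.02): within 0.20 + 0.02
```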
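Global cross-validation can be outlined in the same style. In this sketch, build_tree and error_rate are hypothetical stand-ins for the full tree-building analysis and its misclassification estimate; the 1/v hold-out fraction and the default of 3 replications follow the description above.

```python
import random

def global_cv(cases, build_tree, error_rate, v=3, seed=7):
    """Replicate the whole analysis v times, each time holding out 1/v of
    the learning sample and testing the selected tree on the hold-out."""
    rng = random.Random(seed)
    idx = list(range(len(cases)))
    rng.shuffle(idx)
    errors = []
    for i in range(v):
        held = set(idx[i::v])
        test = [cases[j] for j in held]
        learn = [cases[j] for j in idx if j not in held]
        tree = build_tree(learn)               # full analysis on the rest
        errors.append(error_rate(tree, test))  # validate on the hold-out
    return sum(errors) / v                     # averaged GCV error estimate
```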