Advanced Classification Trees (C&RT)

Creates standard classification trees (C&RT) for continuous and categorical predictors; builds an optimal tree structure to predict categorical dependent variables, optionally using V-fold cross-validation. Observational statistics (predicted classifications) can also be requested as an option.

NOTE: Unlike the options in General C&RT, these options are optimized for large and very large data files, and for predictor variables that may be sparse. Specifically, these options use the computational algorithms of the Statistica Interactive Trees module to evaluate predictors one by one, regardless of missing data patterns in other predictors.
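To make the note concrete, the following minimal sketch compares the number of cases available when each predictor is evaluated on its own with the number left after dropping every row that has any missing predictor. The data, the predictor names, and both helper functions are hypothetical illustrations; this is not the Statistica algorithm itself.

```python
# Hypothetical data: None marks a missing value.
rows = [
    {"age": 34,   "income": None, "region": "N"},
    {"age": None, "income": 52.0, "region": "S"},
    {"age": 51,   "income": 48.5, "region": None},
    {"age": 29,   "income": 61.0, "region": "N"},
]
predictors = ["age", "income", "region"]

def usable_cases_per_predictor(rows, predictors):
    """Count the cases available when each predictor is evaluated on its own."""
    return {p: sum(r[p] is not None for r in rows) for p in predictors}

def listwise_complete_cases(rows, predictors):
    """Count the cases that remain when any missing predictor drops the row."""
    return sum(all(r[p] is not None for p in predictors) for r in rows)

per_predictor = usable_cases_per_predictor(rows, predictors)
complete = listwise_complete_cases(rows, predictors)
print(per_predictor)  # {'age': 3, 'income': 3, 'region': 3}
print(complete)       # 1
```

With sparse predictors the gap between the two counts grows quickly, which is why evaluating predictors one at a time keeps more of the data in play.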

General

Element Name | Description
Detail of computed results reported | Specifies the level of detail of the reported results. If Minimal results is requested, only the final tree is displayed; if Comprehensive detail is requested, various other statistical summaries are reported as well; if All results is requested, various node statistics and graphs are also reported. Note that observational statistics (predicted classifications) are available as an option.
Goodness of fit measure | Specifies the goodness-of-fit measure for classification. The C&RT-style exhaustive search for univariate splits looks for the split that maximizes the reduction in the value of the selected goodness-of-fit measure. When the fit is perfect, classification is perfect.
Prior class probabilities | Specifies how to determine the a priori classification probabilities.
Stopping option for pruning | Specifies the stopping rule for the pruning computations.
Minimum n per node | Specifies the minimum number of cases per node at which pruning should begin; this value controls when split selection stops and pruning begins.
Minimum child node size to stop | Specifies the smallest permissible number of cases in a child node for a split to be applied. While the Minimum n of cases parameter determines whether an additional split is considered at any particular node, the Minimum n in child node parameter determines whether a split will be applied, depending on whether either of the two resulting child nodes would have fewer cases than the n specified via this option.
Fraction of objects | Specifies the fraction of objects for FACT-style direct stopping.
Maximum number of nodes | Specifies the maximum number of nodes in the tree.
Maximum number of levels in tree | Specifies the maximum number of levels in the tree.
Number of surrogates | Specifies the number of surrogates for surrogate splits.
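The interplay between the goodness-of-fit measure and the two node-size options can be sketched as follows. This is a minimal illustration for a single continuous predictor, not the Statistica implementation: it assumes the Gini index as the goodness-of-fit measure (the available measures are not listed here), and the function names, parameter names, and exact comparison operators are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels, min_n_per_node=5, min_child_size=1):
    """Exhaustive search over thresholds of one continuous predictor.

    Returns (threshold, impurity_reduction), or None when the node is too
    small to split or no candidate satisfies the child-size limit.
    """
    n = len(labels)
    if n < min_n_per_node:        # node too small: no split is considered
        return None
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue              # no threshold between equal values
        if i < min_child_size or n - i < min_child_size:
            continue              # a child would be too small: split not applied
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        reduction = parent - weighted
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or reduction > best[1]:
            best = (threshold, reduction)
    return best

# Hypothetical node with six cases and two classes.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = ["a", "a", "a", "b", "b", "b"]
print(best_split(x, y, min_n_per_node=2, min_child_size=1))  # (3.5, 0.5)
print(best_split(x, y, min_n_per_node=2, min_child_size=4))  # None
```

The first call finds the split that separates the classes perfectly; the second returns None because every candidate split would leave a child with fewer than four cases.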

V-Fold cross-validation

Element Name | Description
V-fold cross-validation | Performs V-fold cross-validation of the final tree, but not of the full tree sequence (use the General C&RT options to compute V-fold cross-validation estimates for the entire tree sequence). In V-fold cross-validation, random samples are generated from the learning sample to provide an estimate of the CV cost for each classification tree in the tree sequence. Note that in data mining applications with large data sets, V-fold cross-validation may require significant computing time.
Number of folds (sets) | Specifies the number of folds (sets, random samples) for V-fold cross-validation.
Random number seed | Specifies the random number seed for V-fold cross-validation (for generating the random samples).
Standard error rule | Specifies the standard error rule for finding optimal trees via V-fold cross-validation; refer to the Electronic Manual for additional details.
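The roles of the fold count, the random number seed, and the standard error rule can be sketched as below. This is an illustration under stated assumptions rather than the Statistica procedure: the fold assignment scheme is one common choice, the selection rule follows the convention from the C&RT literature (choose the smallest tree whose CV cost does not exceed the minimum CV cost plus the rule value times its standard error), and the function names and cost figures are hypothetical.

```python
import random

def assign_folds(n_cases, n_folds, seed):
    """Randomly assign each case to one of n_folds folds of near-equal size."""
    rng = random.Random(seed)     # same seed, same assignment
    folds = [i % n_folds for i in range(n_cases)]
    rng.shuffle(folds)
    return folds

def select_tree(trees, se_rule=1.0):
    """Pick the smallest tree within se_rule standard errors of the best CV cost.

    `trees` is a list of (n_terminal_nodes, cv_cost, cv_standard_error).
    """
    best = min(trees, key=lambda t: t[1])
    cutoff = best[1] + se_rule * best[2]
    eligible = [t for t in trees if t[1] <= cutoff]
    return min(eligible, key=lambda t: t[0])

folds = assign_folds(n_cases=10, n_folds=3, seed=1)

# Hypothetical tree sequence: (terminal nodes, CV cost, standard error).
trees = [(12, 0.210, 0.020), (7, 0.205, 0.020), (4, 0.220, 0.021), (2, 0.300, 0.025)]
print(select_tree(trees, se_rule=1.0))  # (4, 0.22, 0.021)
print(select_tree(trees, se_rule=0.0))  # (7, 0.205, 0.02)
```

A rule value of 0 selects the tree with the minimum CV cost; larger values trade a small increase in estimated cost for a simpler tree.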

Results

Element Name | Description
Creates predicted classes | Creates observational statistics, including predicted classifications.
Generates data source, if N for input less than | Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, as specified in this edit field. Note that k is evaluated against the number of observations in the input data source, not the number of valid or selected observations.
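The size check for generating a data source can be sketched in a few lines. The function name and the row counts are hypothetical; the point is only that the comparison uses the total number of observations in the input, not the number of valid or selected ones.

```python
def generates_data_source(n_input_rows, k):
    """True when the input data source has fewer than k observations."""
    return n_input_rows < k

# Hypothetical input: 1,200 rows, of which only 800 are valid or selected.
n_input_rows, n_valid_rows, k = 1200, 800, 1000

# The check uses all input rows, so no data source is generated here,
# even though the number of valid rows is below k.
print(generates_data_source(n_input_rows, k))  # False
```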

Deployment

Deployment is available if the Statistica installation is licensed for this feature.

Element Name | Description
Generates C/C++ code | Generates C/C++ code for deployment of the predictive model.
Generates SVB code | Generates Statistica Visual Basic code for deployment of the predictive model.
Generates PMML code | Generates PMML (Predictive Models Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.
Saves C/C++ code | Saves C/C++ code for deployment of the predictive model.
File name for C/C++ code | Specifies the name and location of the file in which to save the (C/C++) deployment code.
Saves SVB code | Saves Statistica Visual Basic code for deployment of the predictive model.
File name for SVB code | Specifies the name and location of the file in which to save the (SVB/VB) deployment code.
Saves PMML code | Saves PMML (Predictive Models Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.
File name for PMML (XML) code | Specifies the name and location of the file in which to save the (PMML/XML) deployment code.
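To give a sense of what a classification tree looks like in PMML and how a record is scored against it, here is a minimal sketch using Python's standard XML parser. The PMML fragment is hand-written and simplified (no XML namespace, no data dictionary, only two operators) and is not output from Statistica; the field name, the class labels, and the helper functions are hypothetical. Actual scoring in Statistica is done through the Rapid Deployment options.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified TreeModel fragment; real exported PMML
# contains additional elements and an XML namespace.
PMML = """
<PMML>
  <TreeModel functionName="classification">
    <Node score="low">
      <True/>
      <Node score="low">
        <SimplePredicate field="income" operator="lessOrEqual" value="50"/>
      </Node>
      <Node score="high">
        <SimplePredicate field="income" operator="greaterThan" value="50"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

# Only the two operators used in the fragment are handled here.
OPS = {
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
}

def matches(node, record):
    """Evaluate the node's predicate (<True/> or <SimplePredicate>) on a record."""
    if node.find("True") is not None:
        return True
    pred = node.find("SimplePredicate")
    if pred is None:
        return False
    value = record[pred.get("field")]
    return OPS[pred.get("operator")](value, float(pred.get("value")))

def score(node, record):
    """Descend to the deepest matching node and return its score attribute."""
    for child in node.findall("Node"):
        if matches(child, record):
            return score(child, record)
    return node.get("score")

root = ET.fromstring(PMML).find("TreeModel/Node")
print(score(root, {"income": 42.0}))  # low
print(score(root, {"income": 75.0}))  # high
```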