General Classification Trees (C&RT)

Creates standard classification trees (C&RT) with ANOVA-like coded designs for continuous and categorical predictors. The node builds a tree structure to predict categorical dependent variables, optionally using V-fold cross-validation to select the optimal tree. Observational statistics (predicted classifications) can be requested as an option.

General

Element Name	Description
Detail of computed results reported	Specifies the level of detail of the reported results. If Minimal results is requested, only the final tree is displayed; if Comprehensive detail is requested, various other statistical summaries are reported as well; if All results is requested, various node statistics and graphs are also reported. Observational statistics (predicted classifications) are available as an option.
Analysis syntax	Analysis syntax string for general classification trees. You can specify the complete syntax here, for example as copied from a Statistica analysis. Leave this string empty, or set it to just TREES; to create the syntax from the options specified below.
Design	Specifies the between-group (ANCOVA-like) design for categorical and continuous predictors. By default (if no design is specified), a full factorial design is constructed for the categorical predictors, and main effects are evaluated for the continuous predictors.   Use the syntax:  DESIGN = Design specifications   Example 1.  DESIGN = GROUP | GENDER | TIME | PAID; {Makes a full factorial design}   Example 2.  DESIGN = SEQUENCE + PERSON(SEQUENCE) + TREATMNT + SEQUENCE*TREATMNT;   Example 3.  DESIGN = MULLET | SHEEPSHD | CROAKER @2; {Makes a factorial design to degree 2}   Example 4.  DESIGN = TEMPERAT | MULLET | SHEEPSHD | CROAKER - TEMPERAT; {Removes the main effect for TEMPERAT from the factorial design}   Example 5.  DESIGN = BLOCK + DEGREES + DEGREES*DEGREES + TIME + TIME*TIME + TIME*DEGREES;
Parameterization	Specifies the parameterization of the ANCOVA-like between group design for the categorical predictors; specify No parameterization to perform standard C&RT analyses; for additional details, see the Electronic Manual topic The Sigma-Restricted vs. Overparameterized Model.
Maximum number of nodes	Specifies the maximum number of nodes.
Number of surrogates	Specifies the number of surrogates for surrogate splits.
Goodness of fit measure	Specifies the goodness-of-fit measure for classification; the C&RT-style exhaustive search for univariate splits works by searching for the split that maximizes the reduction in the value of the selected goodness of fit measure. When the fit is perfect, classification is perfect.
Prior class probabilities	Specifies how to determine the a-priori classification probabilities.
Creates predicted classes	Creates observational statistics, including predicted classifications.
Generates data source, if N for input less than	Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, as specified in this edit field; note that parameter k (number of observations) will be evaluated against the number of observations in the input data source, not the number of valid or selected observations.
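
The exhaustive search for univariate splits described under Goodness of fit measure can be illustrated with a minimal Python sketch. This is not Statistica's implementation: it assumes the Gini measure as the selected goodness-of-fit measure and a single continuous predictor, and the function names gini and best_univariate_split are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_univariate_split(x, y):
    """Exhaustively search thresholds on one continuous predictor x.

    Returns (threshold, reduction) for the split "x <= threshold" that
    maximizes the reduction in Gini impurity relative to the parent node,
    or (None, 0.0) if no candidate split reduces the impurity.
    """
    n = len(y)
    parent = gini(y)
    pairs = sorted(zip(x, y), key=lambda pair: pair[0])
    best_threshold, best_reduction = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal predictor values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Impurity of the two child nodes, weighted by their share of cases.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        reduction = parent - weighted
        if reduction > best_reduction:
            best_threshold, best_reduction = (lo + hi) / 2, reduction
    return best_threshold, best_reduction


# Illustrative data (made up for this sketch): two perfectly separable classes.
x = [1.0, 2.0, 3.0, 4.0]
y = ["a", "a", "b", "b"]
threshold, reduction = best_univariate_split(x, y)
```

In this made-up example the midpoint between 2.0 and 3.0 separates the two classes completely, so both child nodes have zero impurity and classification within them is perfect.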

Pruning

Element Name	Description
Stopping option for pruning	Specifies the stopping rule for the pruning computations.
Minimum n per node	Specifies the minimum number of cases (n) per node; this value controls when split selection stops and pruning begins.
Fraction of objects	Specifies the fraction of objects for FACT-style direct stopping.

V-Fold Cross-Validation

Element Name	Description
V-Fold Cross-Validation	Performs V-fold cross-validation; in V-fold cross-validation, random samples are generated from the learning sample to provide an estimate of the CV cost for each classification tree in the tree sequence. Note that in data mining applications with large data sets, V-fold cross-validation may require significant computing time.
Number of folds(sets)	Number of folds (sets, random samples) for V-fold cross-validation.
Random number seed	Random number seed for V-fold cross-validation (for generating the random samples).
Standard error rule	Standard error rule for finding optimal trees via V-fold cross-validation; refer to the Electronic Manual for additional details.
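
The options above can be sketched in Python as follows. This is a hedged illustration, not Statistica's code: assign_folds and select_tree are hypothetical helpers, the fold assignment is one simple way to draw random samples from a seed, and the selection rule assumes the conventional C&RT standard error rule (choose the smallest tree whose CV cost does not exceed the minimum CV cost plus the rule value times the standard error of that minimum). The CV costs and standard errors in the example are made up.

```python
import random


def assign_folds(n_cases, n_folds, seed):
    """Randomly assign each case to one of n_folds folds of near-equal size."""
    order = list(range(n_cases))
    random.Random(seed).shuffle(order)  # seeded, so the assignment is reproducible
    folds = [0] * n_cases
    for position, case in enumerate(order):
        folds[case] = position % n_folds
    return folds


def select_tree(cv_costs, cv_std_errors, n_terminal_nodes, se_rule=1.0):
    """Return the index of the selected tree in the pruned tree sequence.

    Assumed rule: among trees whose CV cost is at most
    (minimum CV cost + se_rule * its standard error), pick the smallest tree.
    """
    best = min(range(len(cv_costs)), key=lambda i: cv_costs[i])
    cutoff = cv_costs[best] + se_rule * cv_std_errors[best]
    candidates = [i for i in range(len(cv_costs)) if cv_costs[i] <= cutoff]
    return min(candidates, key=lambda i: n_terminal_nodes[i])


# Made-up tree sequence: CV cost, standard error, and size of each tree.
cv_costs = [0.40, 0.25, 0.22, 0.21, 0.24]
cv_std_errors = [0.03, 0.03, 0.02, 0.02, 0.02]
n_terminal_nodes = [1, 3, 5, 9, 15]

folds = assign_folds(n_cases=10, n_folds=3, seed=1)
chosen_1se = select_tree(cv_costs, cv_std_errors, n_terminal_nodes, se_rule=1.0)
chosen_0se = select_tree(cv_costs, cv_std_errors, n_terminal_nodes, se_rule=0.0)
```

With a rule value of 0 the sketch picks the tree with the minimum CV cost (9 terminal nodes here); with a rule value of 1 it prefers the simpler 5-node tree, whose CV cost is within one standard error of that minimum.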

Deployment

Deployment is available if the Statistica installation is licensed for this feature.

Element Name	Description
Generates C/C++ code	Generates C/C++ code for deployment of predictive model.
Generates SVB code	Generates Statistica Visual Basic code for deployment of predictive model.
Generates PMML code	Generates PMML (Predictive Models Markup Language) code for deployment of predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.
Saves C/C++ code	Saves C/C++ code for deployment of predictive model.
File name for C/C++ code	Specifies the name and location of the file in which to save the C/C++ deployment code.
Saves SVB code	Saves Statistica Visual Basic code for deployment of predictive model.
File name for SVB code	Specifies the name and location of the file in which to save the SVB/VB deployment code.
Saves PMML code	Saves PMML (Predictive Models Markup Language) code for deployment of predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.
File name for PMML (XML) code	Specifies the name and location of the file in which to save the PMML/XML deployment code.
