Advanced Regression Trees (C And RT)

Creates interactive regression trees (C&RT) for continuous and categorical predictors and builds an optimal tree structure to predict a continuous dependent variable, optionally selecting the right-sized tree via V-fold cross-validation. Observational statistics (predicted values) can be requested as an option.

NOTE: Unlike the options in General C&RT, these options are optimized for large and very large data files and for predictor variables that may be sparse. Specifically, they use the computational algorithms of the Statistica Interactive Trees module, which evaluate predictors one by one, regardless of missing data patterns in other predictors.
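To illustrate what evaluating predictors one by one, regardless of missing data in other predictors, can mean in practice, here is a minimal sketch. It assumes a least-squares (within-node variance) split criterion for a continuous dependent variable, a single continuous predictor, and missing values encoded as None; the function name best_split_for_predictor is hypothetical and this is not the Statistica implementation. Each predictor is scored using only the cases observed on that predictor, so a case that is missing on some other predictor still contributes here.

```python
from typing import Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (n times the node variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_for_predictor(
    x: Sequence[Optional[float]], y: Sequence[float], min_child: int = 1
) -> Optional[Tuple[float, float]]:
    """Find the threshold on one continuous predictor that most reduces SSE.

    Hypothetical helper: only cases where THIS predictor is observed (not None)
    are used, so missing values in other predictors never drop a case here.
    Returns (threshold, SSE reduction), or None if no admissible split exists.
    """
    min_child = max(1, min_child)  # each child needs at least one case
    pairs = sorted((xi, yi) for xi, yi in zip(x, y) if xi is not None)
    ys = [yi for _, yi in pairs]
    parent = sse(ys)
    best = None
    # i is the size of the left child; both children keep >= min_child cases.
    for i in range(min_child, len(pairs) - min_child + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied predictor values
        reduction = parent - sse(ys[:i]) - sse(ys[i:])
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or reduction > best[1]:
            best = (threshold, reduction)
    return best
```

For example, with x = [1.0, 2.0, None, 4.0, 5.0, 6.0] and y = [1.0, 1.2, 9.0, 5.0, 5.1, 4.9], the third case is skipped for this predictor only, and the best threshold is 3.0. The quadratic loop keeps the sketch short; a large-data implementation would instead update running sums as the threshold moves.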

General

Element Name | Description
Detail of computed results reported | Specifies the detail of the computed results that are reported. If Minimal results is requested, only the final tree is displayed; if Comprehensive detail is requested, various other statistical summaries are reported as well; if All results is requested, various node statistics and graphs are also reported. Observational statistics (predicted values) are available as a separate option.
Stopping option for pruning | One way to control the size of the tree is to prune it, i.e., to remove parts of the tree with the aim of obtaining the right-sized tree. If the dependent variable is continuous (regression), the measure used is the variance of the cases in a node; select the Prune on variance option button to prune on the basis of variance. For a categorical dependent variable (classification), pruning can instead be based on the fraction of objects of one or more classes in the node. In the GC&RT module, this is done by FACT-style direct stopping: splitting on the predictor variables continues until all terminal nodes in the classification tree are pure or contain no more than the specified fraction of objects. If FACT-style direct stopping is selected as the stopping rule, the value in the Fraction of objects box is used to control which classification tree is selected as the right-sized tree.
Minimum n per node | If a pruning method (e.g., Prune on variance) is selected as the stopping option, enter a value in the Minimum n box to control when split selection stops and pruning begins. Unless splitting is terminated by one of the other criteria specified on this tab, the tree-building process continues until no more splits can be applied without creating nodes with fewer cases than the value specified in this field.
Minimum child node size to stop | Controls the smallest permissible number of cases in a child node for a split to be applied. While the Minimum n of cases parameter determines whether an additional split is considered at a particular node, the Minimum n in child node parameter determines whether that split is actually applied: the split is not applied if either of the two resulting child nodes would contain fewer than n cases, where n is the value specified via this option.
Fraction of objects | If FACT-style direct stopping is selected as the stopping rule (see above), the value in the Fraction of objects box is used to control which classification tree is selected as the right-sized tree.
Maximum number of nodes | The value supplied in the Maximum n nodes box is used to stop splitting on the basis of the number of nodes in the tree. Each time a parent node is split, the total number of nodes in the tree is examined, and splitting stops if this number exceeds the value specified in the Maximum n nodes box.
Maximum number of levels in tree | Specifies the maximum number of levels (depth) of the tree; splitting stops once the tree reaches this number of levels.
Number of surrogates | Specifies the number of surrogate splits to compute for each split. Surrogate splits mimic the primary split using other predictors and are used to assign cases that have missing data on the primary split variable to a child node.
Creates predicted classes | Creates observational statistics (predicted values) for the cases in the input data.
Generates data source, if N for input less than | Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, where k is specified in this edit field. Note that k is compared with the total number of observations in the input data source, not with the number of valid or selected observations.
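The stopping parameters in the table above interact while the tree is grown. The following minimal sketch shows one way they can combine, assuming binary splits on continuous predictors, a least-squares criterion, and complete data (no missing values, surrogates, or pruning). Node, StopRules, grow, and predict are hypothetical names that only mirror the options listed above; this is not Statistica's algorithm. In particular, the node-budget check refuses any split that would push the node count above max_nodes, which is one reasonable reading of the Maximum number of nodes option.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    mean: float                       # predicted value for cases in this node
    n: int                            # number of cases in this node
    feature: Optional[int] = None     # split variable index (None for a leaf)
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


@dataclass
class StopRules:                      # hypothetical; mirrors the options above
    min_n: int = 5                    # cf. "Minimum n per node"
    min_child: int = 1                # cf. "Minimum child node size to stop"
    max_nodes: int = 1000             # cf. "Maximum number of nodes"
    max_depth: int = 10               # cf. "Maximum number of levels in tree"


def _sse(ys: Sequence[float]) -> float:
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def _best_split(X, y, min_child):
    """Return (SSE reduction, feature, threshold) of the best split, or None."""
    best = None
    parent = _sse(y)
    for j in range(len(X[0])):
        order = sorted(range(len(y)), key=lambda i: X[i][j])
        # k = size of the left child; both children keep >= min_child cases.
        for k in range(min_child, len(order) - min_child + 1):
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot separate tied predictor values
            left = [y[i] for i in order[:k]]
            right = [y[i] for i in order[k:]]
            reduction = parent - _sse(left) - _sse(right)
            if best is None or reduction > best[0]:
                best = (reduction, j, (lo + hi) / 2)
    return best


def grow(X: List[List[float]], y: List[float], rules: StopRules,
         depth: int = 0, counter: Optional[List[int]] = None) -> Node:
    if counter is None:
        counter = [1]  # total nodes in the tree so far (the root)
    node = Node(mean=sum(y) / len(y), n=len(y))
    # Stop if the node is too small, the tree is deep enough, or one more
    # split (two new nodes) would exceed the node budget.
    if (len(y) < rules.min_n or depth >= rules.max_depth
            or counter[0] + 2 > rules.max_nodes):
        return node
    split = _best_split(X, y, max(1, rules.min_child))
    if split is None or split[0] <= 0:
        return node  # no admissible split, or no variance reduction
    _, j, t = split
    counter[0] += 2
    left_idx = [i for i in range(len(y)) if X[i][j] <= t]
    right_idx = [i for i in range(len(y)) if X[i][j] > t]
    node.feature, node.threshold = j, t
    node.left = grow([X[i] for i in left_idx], [y[i] for i in left_idx],
                     rules, depth + 1, counter)
    node.right = grow([X[i] for i in right_idx], [y[i] for i in right_idx],
                      rules, depth + 1, counter)
    return node


def predict(node: Node, row: Sequence[float]) -> float:
    """Follow the splits down to a leaf and return its mean."""
    while node.left is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.mean
```

Raising min_n or min_child, or lowering max_nodes or max_depth, yields smaller trees; each rule can stop a branch independently of the others.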

V-Fold cross-validation

Element Name | Description
V-fold cross-validation | Performs V-fold cross-validation, but not for the full tree sequence; to compute V-fold cross-validation estimates for the entire tree sequence, use the General C&RT options. Note that in data mining applications with large data sets, V-fold cross-validation may require significant computing time.
Number of folds (sets) | Specifies the number of folds (sets, random samples) for V-fold cross-validation.
Random number seed | Specifies the seed of the random number generator used to create the random samples for V-fold cross-validation.
Standard error rule | Specifies the standard error rule used to select the optimal (right-sized) tree via V-fold cross-validation: the smallest tree whose cross-validated cost does not exceed the minimum cross-validated cost plus this multiple of its standard error is selected. Refer to the Electronic Manual for additional details.
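The interplay of the number of folds, the random number seed, and the standard error rule can be sketched as follows. This is an illustration under stated assumptions: assign_folds and select_right_sized_tree are hypothetical names, fold assignment here uses Python's random module (so the same seed will not reproduce Statistica's folds), and the per-tree cross-validated costs and standard errors are assumed to have been computed already.

```python
import random
from typing import List, Sequence


def assign_folds(n_cases: int, v: int, seed: int) -> List[int]:
    """Randomly assign each case to one of v folds of (nearly) equal size.

    The same seed always reproduces the same assignment.
    """
    folds = [i % v for i in range(n_cases)]
    random.Random(seed).shuffle(folds)
    return folds


def select_right_sized_tree(sizes: Sequence[int], cv_costs: Sequence[float],
                            cv_ses: Sequence[float], se_rule: float = 1.0) -> int:
    """Return the index of the smallest tree whose CV cost does not exceed
    the minimum CV cost plus se_rule times that minimum-cost tree's SE."""
    best = min(range(len(cv_costs)), key=lambda i: cv_costs[i])
    limit = cv_costs[best] + se_rule * cv_ses[best]
    candidates = [i for i in range(len(sizes)) if cv_costs[i] <= limit]
    return min(candidates, key=lambda i: sizes[i])
```

With sizes [1, 3, 5, 7, 9], CV costs [10.0, 6.0, 4.5, 4.0, 4.2], and standard errors [1.0, 0.8, 0.6, 0.6, 0.7], a rule of 1.0 picks the 5-node tree (cost 4.5 is within 4.0 + 0.6), whereas a rule of 0 picks the 7-node minimum-cost tree. Larger values of the rule therefore favor smaller, simpler trees.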

Deployment

Deployment is available if the Statistica installation is licensed for this feature.

Element Name | Description
Generates C/C++ code | Generates C/C++ code for deployment of the predictive model.
Generates SVB code | Generates Statistica Visual Basic (SVB) code for deployment of the predictive model.
Generates PMML code | Generates PMML (Predictive Model Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (i.e., score) large data sets.
Saves C/C++ code | Saves the C/C++ code for deployment of the predictive model.
File name for C/C++ code | Specifies the name and location of the file in which to save the C/C++ deployment code.
Saves SVB code | Saves the Statistica Visual Basic code for deployment of the predictive model.
File name for SVB code | Specifies the name and location of the file in which to save the SVB (VB) deployment code.
Saves PMML code | Saves the PMML (Predictive Model Markup Language) code for deployment of the predictive model, for use with the Rapid Deployment options.
File name for PMML (XML) code | Specifies the name and location of the file in which to save the PMML (XML) deployment code.
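Deployment code for a regression tree is, in essence, the fitted tree rewritten as nested conditionals that return a leaf's predicted value. As an illustration only, the following sketch emits C source from a simple tuple-based tree representation. The emit_c function and the tree format are hypothetical, and the sketch ignores surrogate splits, categorical predictors, and missing data handling, so it does not reproduce the code that Statistica actually generates.

```python
from typing import List, Union

# Hypothetical tree format: a leaf is a number (the predicted value); an
# internal node is (var_index, threshold, left, right), where cases with
# x[var_index] <= threshold go to the left subtree.
Tree = Union[float, tuple]


def emit_c(tree: Tree, func_name: str = "predict") -> str:
    """Return C source for a scoring function equivalent to the tree."""
    lines: List[str] = [f"double {func_name}(const double *x)", "{"]

    def walk(node: Tree, indent: int) -> None:
        pad = "    " * indent
        if not isinstance(node, tuple):
            lines.append(f"{pad}return {float(node)!r};")
            return
        var, threshold, left, right = node
        lines.append(f"{pad}if (x[{var}] <= {float(threshold)!r}) {{")
        walk(left, indent + 1)
        lines.append(f"{pad}}} else {{")
        walk(right, indent + 1)
        lines.append(f"{pad}}}")

    walk(tree, 1)
    lines.append("}")
    return "\n".join(lines)


print(emit_c((0, 6.5, 1.0, (1, 2.0, 4.0, 6.0))))
```

Because every prediction is a fixed sequence of comparisons, code of this shape can score large data sets quickly without loading the original analysis.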