Boosting Classification Trees with Deployment

Creates boosting classification trees with deployment for continuous and categorical predictors. Various observational statistics (predicted classifications, lift charts) can be requested as an option.

General

Element Name Description
Detail of computed results reported. Selects the level of detail of the reported results. If Minimal detail is selected, the program reports only the summary statistics from the analysis. If Comprehensive detail is selected, the tree graph and tree structure are also displayed. If All results is selected, various graphs and spreadsheets of predicted values (classifications) and statistics are also reported, including lift charts for the learning (analysis) sample and the testing sample.
Prior probabilities This option can be used to specify how likely it is, without using any prior knowledge of the values for the predictor variables in the model, that a case or object will fall into one of the classes. Select the Estimated option to specify that the likelihood that a case or object will fall into one of the classes is proportional to the dependent variable class sizes. Select the Equal option to specify that the likelihood that a case or object will fall into one of the classes is the same for all dependent variable classes.
Generates data source, if N for input less than Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, as specified in this edit field; note that parameter k (number of observations) will be evaluated against the number of observations in the input data source, not the number of valid or selected observations.
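
As a rough illustration of the two Prior probabilities settings described above, the following Python sketch computes Estimated and Equal priors from a list of class labels. The function name class_priors and its method argument are hypothetical and are not part of Statistica; the sketch only mirrors the description in this table.

```python
from collections import Counter


def class_priors(labels, method="estimated"):
    """Return a dict mapping each class to its prior probability.

    method="estimated": priors proportional to the class sizes in the data.
    method="equal": the same prior for every class.
    (Hypothetical helper, not a Statistica function.)
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    if method == "estimated":
        total = sum(counts.values())
        return {cls: n / total for cls, n in counts.items()}
    if method == "equal":
        return {cls: 1.0 / len(counts) for cls in counts}
    raise ValueError(f"unknown method: {method!r}")


# Small made-up sample: one "yes" case and three "no" cases.
labels = ["yes", "no", "no", "no"]
estimated = class_priors(labels, "estimated")
equal = class_priors(labels, "equal")
```

With unbalanced classes, the Estimated setting favors the larger class, while the Equal setting treats both classes alike regardless of their sizes.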

Advanced

Element Name Description
Learning rate Specify the learning or shrinkage rate for the computations. The Statistica Boosting Trees module will compute a weighted additive expansion of simple regression trees. The weight with which consecutive simple trees are added into the prediction equation is usually a constant, referred to as the learning rate or shrinkage parameter; empirical studies have shown that shrinkage values of 0.1 or less usually lead to better models (with better predictive validity).
Number of additive trees Specify the number of additive terms to be computed, i.e., the number of simple regression trees to be computed in successive boosting steps.
Subsample proportion Specify the subsample proportion to be used for drawing the random learning sample for consecutive boosting steps.
Random test data proportion Specify the proportion of randomly chosen observations that will serve as a test sample in the computations; this option is only applicable if the Test sample option is set to Off.
Minimum number to stop One way to control splitting is to allow splitting to continue until all terminal nodes contain no more than a specified minimum number of cases or objects; this minimum number of cases in a terminal node can be specified via this option.
Minimum child node size to stop Use this option to control the smallest permissible number of cases in a child node for a split to be applied. While the Minimum number to stop parameter determines whether an additional split is considered at any particular node, the Minimum child node size to stop parameter determines whether a split will be applied, depending on whether either of the two resultant child nodes would be smaller (have fewer cases) than the n specified via this option.
Maximum number of levels The value entered here will be used for stopping on the basis of the number of levels in a tree. Each time a parent node is split, the total number of levels (depth of the tree as measured from the root node) is examined, and the splitting is stopped if this number exceeds the number specified in this option.
Maximum number of nodes The value entered here will be used for stopping on the basis of the number of nodes in each tree. Each time a parent node is split, the total number of nodes in the tree is examined, and the splitting is stopped if this number exceeds the number specified in this option. The default value of 3 causes each consecutive tree to consist of a single split (one root node, two child nodes).
Seed for random number generator Specify a constant for seeding the random number generator, which is used to select the subsamples for consecutive boosting trees.
User-defined final model Set this option to TRUE in order to select a particular model (with a particular number of consecutive trees) as the final model. By default (FALSE), the program will automatically select the model that generated the smallest cross-validation error in the test sample. If you set this option to TRUE, specify the desired (final) number of trees in the Number of trees for model option below.
Number of trees for model This option is only applicable if the User-defined final model option is set to TRUE. In that case specify the desired (final) number of trees. By default (User-defined final model = FALSE), the program will automatically select the model that generated the smallest cross-validation error in the test sample.
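
To make the interplay of the options in this table more concrete, here is a minimal Python sketch of stochastic boosting with single-split trees. It is an illustration under simplifying assumptions, not Statistica's implementation: it uses one numeric predictor, a 0/1 target, and a squared-error loss, and all function names (fit_stump, boost, choose_n_trees) are hypothetical. The arguments learning_rate, n_trees, subsample, seed, and min_child loosely correspond to Learning rate, Number of additive trees, Subsample proportion, Seed for random number generator, and Minimum child node size to stop; each tree has 3 nodes, as with the default Maximum number of nodes.

```python
import random


def fit_stump(xs, residuals, min_child=1):
    """Fit a 3-node tree (one split) to the residuals by least squares.

    Returns (threshold, left_value, right_value). A split is not applied
    if either child would have fewer than min_child cases.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if len(left) < min_child or len(right) < min_child:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # No admissible split: fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def error_rate(preds, cases):
    """Misclassification rate when scores >= 0.5 are classified as 1."""
    wrong = sum((p >= 0.5) != (y == 1) for p, (_, y) in zip(preds, cases))
    return wrong / len(cases)


def boost(train, test, n_trees=50, learning_rate=0.1, subsample=0.5, seed=1):
    """Return (base, stumps, test_errors); test_errors[i] uses i + 1 trees."""
    rng = random.Random(seed)  # fixed seed makes the subsamples reproducible
    base = sum(y for _, y in train) / len(train)
    train_pred = [base] * len(train)
    test_pred = [base] * len(test)
    k = min(len(train), max(1, round(subsample * len(train))))
    stumps, test_errors = [], []
    for _ in range(n_trees):
        # Draw a random learning subsample for this boosting step.
        idx = rng.sample(range(len(train)), k)
        xs = [train[i][0] for i in idx]
        residuals = [train[i][1] - train_pred[i] for i in idx]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add the new tree to the expansion, weighted by the learning rate.
        train_pred = [p + learning_rate * stump_predict(stump, x)
                      for p, (x, _) in zip(train_pred, train)]
        test_pred = [p + learning_rate * stump_predict(stump, x)
                     for p, (x, _) in zip(test_pred, test)]
        test_errors.append(error_rate(test_pred, test))
    return base, stumps, test_errors


def choose_n_trees(test_errors, user_defined=False, n_trees_for_model=None):
    """Pick the final number of trees, mirroring User-defined final model."""
    if user_defined:
        return n_trees_for_model
    return test_errors.index(min(test_errors)) + 1


# Made-up data: class 1 whenever x >= 10; every other case is held out.
data = [(x, 1 if x >= 10 else 0) for x in range(20)]
train, test = data[::2], data[1::2]
base, stumps, test_errors = boost(train, test)
best_n = choose_n_trees(test_errors)
```

Calling choose_n_trees(test_errors, user_defined=True, n_trees_for_model=7) instead returns 7, which corresponds to overriding the automatic choice with a fixed number of trees.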

Results

Element Name Description
Start tree number The method of stochastic gradient boosting trees will generate a sequence of simple trees (the complexity of each tree can be specified). If you want to review the actual individual trees (as Tree graphs or the Tree structure), use the Start/End tree number to select the specific numbers of trees you want to review. Note that separate results are produced for each dependent variable class (category).
End tree number The method of stochastic gradient boosting trees will generate a sequence of simple trees (the complexity of each tree can be specified). If you want to review the actual individual trees (as Tree graphs or the Tree structure), use the Start/End tree number to select the specific numbers of trees you want to review. Note that separate results are produced for each dependent variable class (category).
Predictions for all samples Computes predicted classifications and other statistics for all observations (samples).

Deployment

Deployment is available if the Statistica installation is licensed for this feature.

Element Name Description
Generates C/C++ code Generates C/C++ code for deployment of the predictive model.
Generates SVB code Generates Statistica Visual Basic code for deployment of the predictive model.
Generates PMML code Generates PMML (Predictive Model Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large datasets.