Random Forest Classification

Creates random forest classification trees for continuous and categorical predictors. Various observational statistics (predicted classifications, lift charts) can be requested as an option.

General

Element Name Description
Detail of computed results reported: Selects the detail of the reported results. If Minimal detail is selected, the program reports only the summary statistics from the analysis; if Comprehensive detail is requested, the tree graph and tree structure are also displayed; if All results is requested, various graphs and spreadsheets of predicted values (classifications) and statistics are also reported, including lift charts for the learning (analysis) sample and the testing sample.
Prior probabilities: Specifies how likely it is, without using any prior knowledge of the values of the predictor variables in the model, that a case or object will fall into one of the classes. Select the Estimated option to make this likelihood proportional to the dependent variable class sizes; select the Equal option to make it the same for all dependent variable classes (an illustrative sketch follows this table).
Generates data source, if N for input less than: Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, as specified in this edit field. Note that the parameter k (number of observations) is evaluated against the number of observations in the input data source, not the number of valid or selected observations.
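
The effect of the two prior-probability settings can be illustrated outside Statistica. The sketch below is a minimal, hedged analogy using scikit-learn (not Statistica's own API): the default behavior plays the role of Estimated priors (proportional to class sizes), while class_weight="balanced" approximates Equal priors. The data set and variable names are made up for illustration.

    # Hedged analogy only: scikit-learn's class weighting used to mimic
    # "Estimated" vs. "Equal" prior probabilities.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rf_estimated = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    rf_equal = RandomForestClassifier(class_weight="balanced",
                                      random_state=0).fit(X_tr, y_tr)

    # With "equal" priors the rare class typically receives more predictions.
    print("minority predictions, estimated priors:",
          int((rf_estimated.predict(X_te) == 1).sum()))
    print("minority predictions, equal priors:",
          int((rf_equal.predict(X_te) == 1).sum()))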

Advanced

Element Name Description
Number of predictors: Specify the number of predictors used for the tree models. The default is based on the total number of predictor variables (selected from the Variables option on the Quick tab of this dialog). An illustrative parameter sketch follows this table.
Number of trees: Specify the number of simple classification trees to be computed in successive forest building steps. On the Results dialog, you can later review intermediate solutions, i.e., solutions for a smaller or larger number of trees than initially requested (and computed).
Subsample proportion: Specify the subsample proportion to be used for drawing the bootstrap learning samples for consecutive steps. The bootstrap creates subsets by randomly sampling, with replacement, from the cases of the original data set. See also the Introductory Overview for a description of the basic algorithm implemented in Statistica Random Forest.
Random test data proportion: Specify the proportion of randomly chosen observations that will serve as a test sample in the computations; this option is only applicable if the Test sample option (see below) is set to Off.
Minimum number to stop: One way to control splitting is to allow it to continue until all terminal nodes contain no more than a specified minimum number of cases or objects; that minimum number of cases in a terminal node is specified via this option.
Minimum child node size to stop: Use this option to control the smallest permissible number of cases in a child node for a split to be applied. While the Minimum n of cases parameter determines whether an additional split is considered at a particular node, the Minimum n in child node parameter determines whether the split will actually be applied, depending on whether either of the two resulting child nodes would be smaller (have fewer cases) than the n specified via this option.
Maximum number of levels: The value entered here is used for stopping on the basis of the number of levels in a tree. Each time a parent node is split, the total number of levels (the depth of the tree measured from the root node) is examined, and splitting is stopped if this number exceeds the value specified in the Maximum n levels box.
Maximum number of nodes: The value entered here is used for stopping on the basis of the number of nodes in each tree. Each time a parent node is split, the total number of nodes in the tree is examined, and splitting is stopped if this number exceeds the value specified in the Maximum n nodes box. The default value of 3 causes each consecutive tree to consist of a single split (one root node, two child nodes).
Seed for random number generator: Specify a constant for seeding the random number generator, which is used to select the subsamples for consecutive trees.
User-defined final model: Set this option to TRUE to select a particular model (with a particular number of consecutive trees) as the final model. By default (FALSE), the program automatically selects the model that generated the smallest cross-validation error in the test sample. If you set this option to TRUE, specify the desired (final) number of trees via the Number of trees for model option below.
Number of trees for model: This option is only applicable if the User-defined final model option is set to TRUE; in that case, specify the desired (final) number of trees here. By default (User-defined final model = FALSE), the program automatically selects the model that generated the smallest cross-validation error in the test sample.
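
For orientation, the Advanced tab settings have rough counterparts among scikit-learn's RandomForestClassifier parameters. The mapping below is a hedged sketch for illustration only, not a description of Statistica's implementation; the parameter values are arbitrary and the correspondence is approximate (for example, scikit-learn samples predictors at each split rather than per tree).

    # Hedged sketch: approximate scikit-learn counterparts of the Advanced tab
    # settings (analogy for illustration, not Statistica's API).
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # "Random test data proportion" and "Seed for random number generator"
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=1)

    rf = RandomForestClassifier(
        n_estimators=100,      # "Number of trees"
        max_features="sqrt",   # "Number of predictors" (per split here)
        bootstrap=True,
        max_samples=0.5,       # "Subsample proportion"
        min_samples_split=5,   # "Minimum number to stop"
        min_samples_leaf=2,    # "Minimum child node size to stop"
        max_depth=10,          # "Maximum number of levels"
        max_leaf_nodes=None,   # roughly "Maximum number of nodes"
        random_state=1,        # "Seed for random number generator"
    )
    rf.fit(X_train, y_train)
    print("Test accuracy:", rf.score(X_test, y_test))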

Stop

Element Name Description
Enable Advanced Stopping Condition: Select this check box to enable early stopping of the Random Forest training algorithm, i.e., to stop adding trees before the full number of trees has been added to the model. Checking this option makes the remaining controls on this Stop tab available.
Cycles to calculate mean error: Specifies the number of cycles across which improvement is measured. The Random Forest algorithm can produce noisy training and test errors, so it is usually not a good idea to halt training based on a failure to achieve the desired improvement in error rate over a single cycle. This value instead defines a window of cycles over which the error rates are monitored for improvement; training is halted only if the error fails to improve for that many cycles (an illustrative sketch follows this table).
Percentage decrease in training error: Specify the minimum percentage improvement (drop) in error that must be achieved; if the rate of improvement drops below this level, training is terminated.
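
The stopping rule described above can be approximated outside Statistica by growing a forest incrementally and monitoring the error. The following hedged sketch uses scikit-learn's warm_start mechanism; the window size, improvement threshold, batch size, and the use of test (rather than training) error are illustrative assumptions, not the product's internal algorithm.

    # Hedged sketch of the early-stopping idea: add trees in batches and stop
    # when the error fails to improve by a minimum percentage over a window
    # of cycles (assumed parameters, for illustration only).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    window = 5               # analogous to "Cycles to calculate mean error"
    min_pct_decrease = 1.0   # analogous to "Percentage decrease in training error"
    trees_per_cycle = 10
    max_trees = 500

    rf = RandomForestClassifier(n_estimators=trees_per_cycle, warm_start=True,
                                random_state=0)
    errors = []
    cycles_without_improvement = 0
    while rf.n_estimators <= max_trees:
        rf.fit(X_train, y_train)
        errors.append(1.0 - rf.score(X_test, y_test))
        if len(errors) > 1 and errors[-2] > 0:
            pct_drop = 100.0 * (errors[-2] - errors[-1]) / errors[-2]
            cycles_without_improvement = (0 if pct_drop >= min_pct_decrease
                                          else cycles_without_improvement + 1)
        if cycles_without_improvement >= window:
            break  # stop adding trees before the full number is reached
        rf.n_estimators += trees_per_cycle

    print("Stopped after", len(rf.estimators_), "trees")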

Results

Element Name Description
Start tree number: The method of Random Forest (see the Introductory Overview) generates a sequence of simple trees (the complexity of each tree can be specified on the Random Forest Specifications dialog - Advanced tab). If you want to review the actual individual trees (as Tree graphs or the Tree structure), specify here the number of the first tree in the range to review.
End tree number: Specify here the number of the last tree in the range to review; together with Start tree number, this determines which individual trees are displayed (as Tree graphs or the Tree structure).
Predictions for all samples: Computes predicted classifications and other statistics for all observations (samples); an illustrative sketch follows this table.
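
As an illustration of what predicted classifications and a simple lift table contain, the following hedged scikit-learn/NumPy sketch computes predictions for every observation and a decile lift table; the data set and the ten-decile layout are assumptions for illustration and do not reproduce Statistica's output format.

    # Hedged sketch: predicted classifications for all observations plus a
    # simple decile lift table (illustration only).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
    rf = RandomForestClassifier(random_state=0).fit(X, y)

    predicted_class = rf.predict(X)             # predicted classifications
    predicted_prob = rf.predict_proba(X)[:, 1]  # probability of the target class

    # Lift per decile: response rate within each decile of predicted
    # probability divided by the overall response rate.
    order = np.argsort(-predicted_prob)
    deciles = np.array_split(y[order], 10)
    baseline = y.mean()
    lift = [d.mean() / baseline for d in deciles]
    print(np.round(lift, 2))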

Deployment

Deployment is available if the Statistica installation is licensed for this feature.

Element Name Description
Generates C/C++ code: Generates C/C++ code for deployment of the predictive model.
Generates SVB code: Generates Statistica Visual Basic code for deployment of the predictive model.
Generates PMML code: Generates PMML (Predictive Model Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.
Saves C/C++ code: Saves C/C++ code for deployment of the predictive model.
File name for C/C++ code: Specify the name and location of the file in which to save the (C/C++) deployment code.
Saves SVB code: Saves Statistica Visual Basic code for deployment of the predictive model.
File name for SVB code: Specify the name and location of the file in which to save the (SVB/VB) deployment code.
Saves PMML code: Saves PMML (Predictive Model Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets. An illustrative export sketch follows this table.
File name for PMML (XML) code: Specify the name and location of the file in which to save the (PMML/XML) deployment code.
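
Outside Statistica, a comparable PMML export can be produced with third-party tooling. The sketch below assumes the sklearn2pmml package (which in turn requires a Java runtime); the pipeline step name and output file name are illustrative, and this is not the code that Statistica itself generates.

    # Hedged sketch: exporting a random forest to PMML with the third-party
    # sklearn2pmml package (assumed installed; not Statistica's deployment code).
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn2pmml import sklearn2pmml
    from sklearn2pmml.pipeline import PMMLPipeline

    X, y = load_iris(return_X_y=True)

    pipeline = PMMLPipeline([("classifier", RandomForestClassifier(n_estimators=100))])
    pipeline.fit(X, y)

    # Writes a PMML (XML) file that a scoring engine can use to compute
    # predictions for (score) large data sets.
    sklearn2pmml(pipeline, "random_forest.pmml")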