Workspace Node: Advanced Regression CHAID - Specifications - Validation Tab
In the Advanced Regression CHAID workspace node dialog box, under the Specifications heading, select the Validation tab to access the following options. As described in the Introductory Overview (see the V-Fold Cross-Validation of Trees and Tree Sequences section), the v-fold cross-validation options are applied only to the best automatically selected tree (solution), and not to the entire algorithmically built sequence of trees. V-fold cross-validation is a powerful technique for choosing the best tree from an automatically generated sequence of trees in classification and regression trees.
Element Name | Description |
---|---|
V-fold cross-validation | V-fold cross-validation is particularly useful when no test sample is available and the learning sample is too small to have a test sample taken from it. Select the V-fold cross-validation check box to use v-fold cross-validation. Two additional specifications apply: Seed for random number generator and V-fold cross-validation; v-value. These values control the sampling that Statistica performs to obtain cross-validation error estimates. If this check box is selected when you click the OK button, the program automatically grows the "best" tree and then computes risk estimates separately for the training and cross-validation samples when you choose to review Risk estimates (Results - Quick tab). |
Seed for random number generator | The positive integer value entered in this box is used as the seed for a random number generator that produces v-fold random subsamples from the learning sample to test the predictive accuracy of the computed trees. |
V-fold cross-validation; v-value | The value entered in this box determines the number of cross-validation samples that are generated from the learning sample to provide a cross-validation error estimate for the current tree. See also the Introductory Overview for details. |
Test sample | The test sample option enables you to use a subsample of cases for estimating the accuracy of the classifier or prediction. Click the Test sample button to display the Test-Sample dialog box, through which you can switch on or off the Test sample option as well as select a variable that will be used as the sample identifier variable. Click the Sample Identifier Variable button to display a variable selection dialog box to choose the sample identifier variable. In addition, you need to select the code for the selected variable that uniquely identifies the cases to be used in the test sample. By default, when a sample identifier variable has been selected, a valid code will be displayed in the Code for analysis sample box. If this is not the desired code for identifying the test sample, double-click on the box (or press the F2 key on your keyboard) to display a dialog box from which you can select the desired code from the list of valid variable codes. If a Test sample is identified, the Risk estimates for the final tree (see the Results - Quick tab) and predicted values or classifications (and residuals; see the Results - Prediction tab) can be computed separately for the training and the testing sample. |
Options / C / W | See Common Options. |
OK | Click this button to accept all the specifications made in the dialog box and to close it. The analysis results are placed in the Reporting Documents workspace node after running (updating) the project. |
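
To illustrate how the Seed for random number generator and v-value settings interact, the following is a minimal sketch in Python. It is not Statistica's implementation: the function names `vfold_assign` and `cv_risk` are hypothetical, and the mean of the training cases is used as a stand-in predictor where the program would use the fitted tree. The sketch only shows that the seed fixes the random partition of the learning sample and that the v-value sets the number of held-out subsamples.

```python
import random


def vfold_assign(n_cases, v, seed):
    """Randomly assign case indices 0..n_cases-1 to v folds of near-equal size."""
    if v < 2:
        raise ValueError("v must be at least 2")
    rng = random.Random(seed)  # the seed makes the partition reproducible
    indices = list(range(n_cases))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [indices[k::v] for k in range(v)]


def cv_risk(y, v, seed):
    """Cross-validated mean squared error of a stand-in predictor.

    Assumption: the training-fold mean replaces the fitted tree, purely
    to keep the example self-contained.
    """
    folds = vfold_assign(len(y), v, seed)
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = sum(train) / len(train)
        squared_errors.extend((y[i] - prediction) ** 2 for i in held_out)
    return sum(squared_errors) / len(squared_errors)
```

Calling `cv_risk(y, v=10, seed=1)` twice with the same arguments returns the same estimate, while changing the seed changes the partition and therefore, in general, the estimate.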