Split Input Data into Training and Testing Samples (Classification)

This node converts a single input descriptor that is not marked for deployment into two input descriptors: one marked for deployment and one not marked for deployment. It also clears out all previously existing information for trained methods from the global dictionary.

 The training and testing datasets are used during comparative and competitive training as training datasets and testing datasets, respectively.

 Please note: If an indicator variable for learning and testing samples was specified for the current input data source, then that indicator variable is used to determine the two samples, and the specifications for random selection via this node are ignored.
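
 The precedence rule above can be sketched in Python. This is only an illustration, not the node's actual script: the function name, its parameters, the representation of cases as dictionaries, and the assumption that each case is drawn independently with a fixed probability are all hypothetical.

```python
import random


def split_cases(cases, test_fraction=0.25, indicator=None,
                train_codes=(), test_codes=(), seed=None):
    """Return (training, testing) lists of cases.

    All names here are hypothetical and chosen for this sketch only.
    """
    training, testing = [], []

    if indicator is not None:
        # An indicator variable takes precedence: the random-selection
        # settings (test_fraction, seed) are ignored entirely.
        for case in cases:
            code = case.get(indicator)
            if code in test_codes:
                testing.append(case)
            elif code in train_codes:
                training.append(case)
            # Assumption: cases with any other code join neither sample.
        return training, testing

    # Assumed selection scheme: each case goes to the testing sample
    # independently with probability test_fraction, so the resulting
    # sample sizes are only approximate.
    rng = random.Random(seed)
    for case in cases:
        if rng.random() < test_fraction:
            testing.append(case)
        else:
            training.append(case)
    return training, testing
```

 With an indicator supplied, for example split_cases(cases, indicator="sample", train_codes={"L"}, test_codes={"T"}), the value of test_fraction has no effect on the result.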

General

Element Name
 Description

Percent of case numbers
 Determines whether to use the percentage value or the approximate number of cases to select the testing sample.

 If a learning/testing variable and codes are selected for the input data source, then those selections will be used to generate the two samples (instead of random sampling!).

Approximate percent of cases for testing
 The percentage of cases to be selected randomly into the testing sample; only applicable if Percent of case numbers is True.

 If a learning/testing variable and codes are selected for the input data source, then those selections will be used to generate the two samples (instead of random sampling!).

Approximate number of cases for testing
 The approximate number of cases to be selected randomly into the testing sample; only applicable if Percent of case numbers is False.

 Note that this parameter will be translated into a percentage value, relative to the total number of observations in the input data; hence, if case selection conditions are in effect or if there are many missing data values, the resulting sample sizes will only approximate the requested number.

 If a learning/testing variable and codes are selected for the input data source, then those selections will be used to generate the two samples (instead of random sampling!).
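
 To illustrate how the two settings relate, the following Python sketch converts either one into a selection fraction. The function and parameter names are hypothetical, and capping the fraction at 1.0 is an assumption rather than documented behavior.

```python
def resolve_test_fraction(use_percent, percent_for_testing,
                          number_for_testing, total_cases):
    """Return the fraction of cases to draw into the testing sample.

    Hypothetical helper; mirrors the "Percent of case numbers" switch.
    """
    if use_percent:
        # Percent of case numbers is True: use the percentage directly.
        return percent_for_testing / 100.0
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    # Percent of case numbers is False: translate the requested count into
    # a fraction of ALL observations in the input data (assumed cap at 1.0).
    return min(1.0, number_for_testing / total_cases)


def expected_test_size(fraction, usable_cases):
    """Expected testing-sample size when only usable_cases can be drawn."""
    return fraction * usable_cases
```

 For example, requesting 200 testing cases from an input file with 1,000 observations translates to 20 percent. If case selection conditions or missing data leave only 800 usable cases, the expected testing sample is about 160 cases rather than 200.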

Clear Trained Models

Element Name
 Description

Clear previous training
 Clears information for other (previously) trained models from the current global dictionary. This is necessary when you remove methods from a complex project with multiple competitive or cooperative methods. In that case, be sure to leave this option at its default of TRUE, and train the entire project again whenever you remove a single method (e.g., when removing a Regression Tree node from a larger project).
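
 The reason for clearing can be pictured with a small Python sketch. The layout of the global dictionary shown here (a "trained_models" entry keyed by method name) is purely an assumption for illustration, as is the function name.

```python
def prepare_global_dictionary(global_dictionary, clear_previous_training=True):
    """Optionally discard stored information about trained models.

    Hypothetical stand-in for the clearing step performed by this node.
    """
    if clear_previous_training:
        # Drop every stored model so that entries for methods that were
        # removed from the project cannot linger.
        global_dictionary["trained_models"] = {}
    return global_dictionary


# A project that once contained two methods; the Regression Tree node
# has since been removed, but its trained model is still stored.
project_state = {"trained_models": {"regression_tree": "stale model",
                                    "neural_network": "current model"}}

# With clearing disabled, the stale entry survives.
prepare_global_dictionary(project_state, clear_previous_training=False)
stale_entry_kept = "regression_tree" in project_state["trained_models"]

# With the default (clearing enabled), all entries are removed and the
# remaining methods must be trained again.
prepare_global_dictionary(project_state)
```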