Advanced Classification CHAID
Full implementation of the CHAID and Exhaustive CHAID (Chi-squared Automatic Interaction Detector) algorithms for classification with continuous and categorical predictors. The node builds a tree to predict a categorical dependent variable, optionally using V-fold cross-validation. Observational statistics (predicted classifications) can be requested as an option. NOTE: Unlike the options in General CHAID, these options are optimized for large and very large data files and for predictor variables that may be sparse. Specifically, they use the computational algorithms of the Statistica Interactive Trees module, which evaluate predictors one by one, regardless of the missing-data patterns in other predictors.
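To illustrate what evaluating predictors one by one means for missing data, the Python sketch below scores each predictor with a Pearson chi-square statistic computed only from the rows in which that predictor and the response are both present (available-case analysis). The row layout, the use of `None` as the missing-value marker, and the function names are assumptions for illustration; this is not the Interactive Trees implementation.

```python
from collections import Counter

def pearson_chi2(pairs):
    """Pearson chi-square statistic and degrees of freedom for (predictor, response) pairs."""
    if not pairs:
        return 0.0, 0
    n = len(pairs)
    cells = Counter(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    stat = 0.0
    for x, r in row_totals.items():
        for y, c in col_totals.items():
            expected = r * c / n
            stat += (cells[(x, y)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def evaluate_predictors(rows, predictors, response):
    """Evaluate each predictor on its own available cases.

    A row is skipped for predictor p only if p or the response is missing (None);
    missing values in *other* predictors do not remove the row.
    """
    results = {}
    for p in predictors:
        pairs = [(row[p], row[response]) for row in rows
                 if row.get(p) is not None and row.get(response) is not None]
        stat, df = pearson_chi2(pairs)
        results[p] = {"n": len(pairs), "chi2": stat, "df": df}
    return results

rows = [
    {"a": "x", "b": "u", "y": "yes"},
    {"a": "x", "b": None, "y": "yes"},
    {"a": "z", "b": "v", "y": "no"},
    {"a": "z", "b": "u", "y": "no"},
]
print(evaluate_predictors(rows, ["a", "b"], "y"))  # 'a' uses all 4 rows; 'b' uses 3
```

The second row, which lacks a value for `b`, still contributes to the evaluation of `a`; listwise deletion would have discarded it for every predictor.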
General
Element Name | Description |
---|---|
Detail of computed results reported | Specifies the level of detail of the reported results. If Minimal results is requested, only the final tree is displayed; if Comprehensive detail is requested, various other statistical summaries are also reported; if All results is requested, various node statistics and graphs are created as well. Observational statistics (predicted classifications) are available as a separate option. |
Exhaustive Search | Performs an exhaustive search for the best split (Exhaustive CHAID). |
Minimum n per node | Minimum number of observations per node. |
Maximum number of nodes | Maximum number of nodes. |
Maximum number of levels in tree | Maximum number of levels in tree. |
p value for splitting | Threshold p value for splitting: a node is split on the best predictor only if that predictor's p value is below this threshold. |
p value for merging | Threshold p value for merging: categories of a predictor whose response distributions do not differ significantly at this level are merged. |
Bonferroni adjustment | Applies a Bonferroni adjustment to the p values used when selecting splits. |
Splitting after merging | Allows merged categories to be split again after the merging step. |
Automatic predictor intervals | Automatically updates the predictor intervals at each node to achieve an optimal categorization of the range of continuous predictor variables. For very large data sets, clear this option (set it to FALSE), because finding the best automatic intervals requires additional passes through the data at each node. |
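In the textbook CHAID procedure, the two p value thresholds and the Bonferroni option interact as follows. Categories of a predictor are merged while the least significantly different pair has a p value above the merging threshold. The node is then split on the predictor only if the (optionally adjusted) p value of the merged table is below the splitting threshold. The Python sketch below follows that textbook description for a single nominal predictor. The Bonferroni multiplier used here, the number of ways to partition c original categories into r groups, is the classic choice for nominal predictors and is an assumption, not a description of Statistica's implementation. Exhaustive CHAID, splitting after merging, and continuous predictors are not covered. The chi-square tail probability uses the closed-form series for integer degrees of freedom, so no third-party library is needed.

```python
import math
from collections import Counter
from itertools import combinations

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series (empty for df = 1)
    p = math.erfc(math.sqrt(x / 2))
    term = total = 1.0
    for i in range(2, (df - 1) // 2 + 1):
        term *= x / (2 * i - 1)
        total += term
    if df > 1:
        p += math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return p

def chi2_pvalue(groups):
    """Pearson chi-square p value; groups maps a (non-empty) group key to a Counter of class counts."""
    classes = {c for counts in groups.values() for c, k in counts.items() if k > 0}
    col = {c: sum(counts[c] for counts in groups.values()) for c in classes}
    n = sum(col.values())
    stat = 0.0
    for counts in groups.values():
        row_total = sum(counts.values())
        for c in classes:
            expected = row_total * col[c] / n
            stat += (counts[c] - expected) ** 2 / expected
    df = (len(groups) - 1) * (len(classes) - 1)
    return chi2_sf(stat, df) if df > 0 else 1.0

def stirling2(n, k):
    """Ways to partition n categories into k non-empty groups (assumed nominal Bonferroni multiplier)."""
    return sum((-1) ** i * math.comb(k, i) * (k - i) ** n for i in range(k + 1)) // math.factorial(k)

def merge_categories(table, alpha_merge):
    """Merge the least significantly different pair of groups until every pair differs at alpha_merge."""
    groups = {(cat,): Counter(counts) for cat, counts in table.items()}
    while len(groups) > 1:
        pair, p = max(
            (((a, b), chi2_pvalue({a: groups[a], b: groups[b]})) for a, b in combinations(groups, 2)),
            key=lambda item: item[1],
        )
        if p <= alpha_merge:
            break
        a, b = pair
        groups[a + b] = groups.pop(a) + groups.pop(b)
    return groups

def chaid_split(table, alpha_merge=0.05, alpha_split=0.05, bonferroni=True):
    """Merge one predictor's categories, then decide whether to split the node on it."""
    groups = merge_categories(table, alpha_merge)
    p = chi2_pvalue(groups)
    if bonferroni and len(groups) > 1:
        p = min(1.0, p * stirling2(len(table), len(groups)))
    return groups, p, p < alpha_split

table = {"A": {"yes": 30, "no": 10}, "B": {"yes": 28, "no": 12}, "C": {"yes": 5, "no": 35}}
groups, p, split = chaid_split(table)
print(list(groups), p, split)
```

In this made-up example, categories A and B have similar response distributions and are merged, while C stays separate; the merged two-group table is highly significant, so the node is split.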
V-Fold Cross-Validation
Element Name | Description |
---|---|
V-fold cross-validation | Performs V-fold cross-validation, in which the learning sample is divided into V random subsamples (folds). Note that in data mining applications with large data sets, V-fold cross-validation may require significant computing time. |
Number of folds (sets) | Number of folds (sets, random samples) for V-fold cross-validation. |
Random number seed | Random number seed for V-fold cross-validation (for generating the random samples). |
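V-fold cross-validation partitions the learning sample into V folds, fits the model V times with one fold held out each time, and pools the errors on the held-out folds. The Python sketch below shows this general technique with a fixed random number seed, so the fold assignment is reproducible. The majority-class learner is a hypothetical stand-in for the CHAID tree, and all function names are illustrative.

```python
import random
from collections import Counter

def assign_folds(n_obs, n_folds, seed):
    """Assign each observation to one of n_folds folds of (nearly) equal size, reproducibly."""
    rng = random.Random(seed)  # same seed -> same fold assignment
    folds = [i % n_folds for i in range(n_obs)]
    rng.shuffle(folds)
    return folds

def cross_validated_error(X, y, n_folds, seed, fit, predict):
    """Misclassification rate estimated by V-fold cross-validation."""
    folds = assign_folds(len(X), n_folds, seed)
    errors = 0
    for v in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != v]
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(model, X[i]) != y[i] for i, f in enumerate(folds) if f == v)
    return errors / len(X)

# Hypothetical stand-in learner: always predicts the majority class of its training data.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, x):
    return model

X = list(range(10))
y = ["a"] * 9 + ["b"]
print(cross_validated_error(X, y, n_folds=3, seed=42, fit=fit_majority, predict=predict_majority))  # 0.1
```

Assigning folds by shuffling a balanced list of fold labels keeps the fold sizes within one observation of each other, which a per-row random draw would not guarantee.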
Results
Element Name | Description |
---|---|
Observational statistics | Compute observational statistics (predicted classifications and classification statistics). |
Display terminal nodes | Display terminal node statistics. |
Average profit value | Average profit value used to compute gain statistics for terminal nodes. |
Generates data source, if N for input less than | Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, where k is the value entered in this field. Note that k is compared with the number of observations in the input data source, not with the number of valid or selected observations. |
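The exact gain statistics Statistica reports are not spelled out here. As a hedged illustration only, the Python sketch below builds a simple gains table under the assumption that each target-class observation in a terminal node earns the average profit value. Terminal nodes are ranked by their target-class rate, and cumulative gain is the percentage of all target-class observations captured so far. The `TerminalNode` type, its fields, and the profit definition are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TerminalNode:
    node_id: int
    n: int         # observations in the terminal node
    n_target: int  # of which belong to the target class

def gains_table(nodes, average_profit):
    """Rank terminal nodes by target-class rate and accumulate gain and profit (assumed definitions)."""
    total_target = sum(node.n_target for node in nodes)
    ranked = sorted(nodes, key=lambda node: node.n_target / node.n, reverse=True)
    table, cum_target, cum_profit = [], 0, 0.0
    for node in ranked:
        profit = node.n_target * average_profit  # assumed: profit per target-class observation
        cum_target += node.n_target
        cum_profit += profit
        table.append({
            "node": node.node_id,
            "rate": node.n_target / node.n,
            "profit": profit,
            "cum_gain_pct": 100.0 * cum_target / total_target,
            "cum_profit": cum_profit,
        })
    return table

nodes = [TerminalNode(3, 50, 10), TerminalNode(4, 20, 15), TerminalNode(5, 30, 15)]
for row in gains_table(nodes, average_profit=2.0):
    print(row)
```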
Deployment
Deployment is available if the Statistica installation is licensed for this feature.
Element Name | Description |
---|---|
Generates C/C++ code | Generates C/C++ code for deployment of the predictive model. |
Generates SVB code | Generates Statistica Visual Basic code for deployment of the predictive model. |
Generates PMML code | Generates PMML (Predictive Model Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to score (compute predictions for) large data sets efficiently. |
Saves C/C++ code | Saves C/C++ code for deployment of the predictive model. |
File name for C/C++ code | Specifies the name and location of the file in which to save the C/C++ deployment code. |
Saves SVB code | Saves Statistica Visual Basic code for deployment of the predictive model. |
File name for SVB code | Specifies the name and location of the file in which to save the SVB/VB deployment code. |
Saves PMML code | Saves PMML (Predictive Model Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to score (compute predictions for) large data sets efficiently. |
File name for PMML (XML) code | Specifies the name and location of the file in which to save the PMML/XML deployment code. |
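Whatever the target language, deployment code for a classification tree amounts to routing each observation from the root to a terminal node and returning that node's predicted class. The Python sketch below shows the idea for a small, hypothetical CHAID-style tree with multiway categorical splits. It is not the code Statistica generates, and returning the current node's prediction when a value is missing or unseen is an assumed fallback.

```python
# Hypothetical tree: each internal node splits on one categorical predictor into
# several category groups; every node carries a "predict" value (its majority class).
TREE = {
    "split": "region", "predict": "no",
    "branches": [
        ({"north", "east"}, {"predict": "yes"}),
        ({"south"}, {
            "split": "plan", "predict": "no",
            "branches": [
                ({"basic"}, {"predict": "no"}),
                ({"premium", "family"}, {"predict": "yes"}),
            ],
        }),
    ],
}

def score(node, row):
    """Route one observation (a dict) down the tree and return the predicted class."""
    while "split" in node:
        value = row.get(node["split"])
        for categories, child in node["branches"]:
            if value in categories:
                node = child
                break
        else:
            # Missing or unseen value: stop here and use this node's prediction (assumption).
            return node["predict"]
    return node["predict"]

print([score(TREE, r) for r in [{"region": "north"}, {"region": "south", "plan": "basic"}, {"region": "west"}]])
# ['yes', 'no', 'no']
```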