Combining Groups Classification with Deployment

The program automatically finds and implements (for example, for data marked as Data for deployment) the best recoding scheme for predicting a categorical variable from one or more categorical predictors with many classes (for example, SIC codes with over 10,000 distinct values). It uses an efficient CHAID-like algorithm to determine which combinations of classes yield a strong relationship to the outcome variable of interest. The recoded (aggregated) class variables, which now have fewer distinct values, can then be submitted to subsequent analyses with the various tools for predictive data mining.
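
The merging idea behind a CHAID-like recoding can be illustrated with a minimal sketch. This is not the program's actual implementation. It assumes a binary outcome, so that any two classes can be compared with a 2x2 Pearson chi-square test (1 degree of freedom), and it ignores the minimum-N, Bonferroni, and splitting-after-merging options. The function names, the p_merge parameter, and the example counts are hypothetical.

```python
import math
from itertools import combinations


def chi2_p_2x2(a, b):
    """p-value of the Pearson chi-square test (df = 1) comparing two
    classes, each given as (positive_count, negative_count)."""
    (a1, a0), (b1, b0) = a, b
    n = a1 + a0 + b1 + b0
    row1, row0 = a1 + b1, a0 + b0      # outcome totals
    col_a, col_b = a1 + a0, b1 + b0    # class totals
    if 0 in (row1, row0, col_a, col_b):
        return 1.0                     # no variation: treat as indistinguishable
    chi2 = n * (a1 * b0 - a0 * b1) ** 2 / (row1 * row0 * col_a * col_b)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def merge_classes(counts, p_merge=0.05):
    """Repeatedly merge the pair of classes whose outcome distributions
    differ least, until every remaining pair differs at level p_merge.

    counts: dict mapping class name -> (positive_count, negative_count).
    Returns a dict mapping tuples of merged class names -> pooled counts.
    """
    groups = {(name,): tuple(c) for name, c in counts.items()}
    while len(groups) > 1:
        best_pair, best_p = None, -1.0
        for g, h in combinations(groups, 2):
            p = chi2_p_2x2(groups[g], groups[h])
            if p > best_p:
                best_pair, best_p = (g, h), p
        if best_p <= p_merge:
            break                      # all remaining pairs differ significantly
        g, h = best_pair
        pooled = tuple(x + y for x, y in zip(groups.pop(g), groups.pop(h)))
        groups[g + h] = pooled
    return groups


# Hypothetical counts: A and B behave alike, as do C and D.
example = {"A": (40, 60), "B": (42, 58), "C": (80, 20), "D": (78, 22)}
recoded = merge_classes(example)
```

With these made-up counts, the four original classes collapse into two recoded classes, one pooling A and B and one pooling C and D, which is the kind of reduction in distinct values described above.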

General

Element Name                  Description
Min-N to stop (% of cases)    The minimum number of cases (observations) per recoded class (node), expressed as a percentage of the total number of observations. If the specified percentage evaluates to fewer than 5 observations, the minimum number of cases per recoded class (node) is set to 5.
Minimum number of categories  The minimum number of categories to recode.
p value for splitting         The p-value used for splitting.
p value for merging           The p-value used for merging categories.
Splitting after merging       Performs splitting after the merging of categories.
Bonferroni adjustment         Applies a Bonferroni adjustment to the probabilities (p-values).
Add new variables             Adds new variables to the input spreadsheet to hold the recoded variables.
Generates data source         Generates a data source for further analyses with other Data Miner nodes.
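
Two of the options above involve simple arithmetic, sketched here under stated assumptions. The minimum-N rule follows the description of Min-N to stop (% of cases); rounding the percentage up to a whole case is an assumption, since the rounding behavior is not documented here. The Bonferroni function shows the textbook adjustment (multiply the p-value by the number of comparisons and cap at 1); how the program counts comparisons is not stated, so n_comparisons is a hypothetical parameter, as are both function names.

```python
import math


def min_cases_per_class(n_total, min_pct, floor=5):
    """Minimum cases per recoded class: min_pct percent of n_total,
    but never fewer than `floor` (5) observations.
    Rounding up is an assumption, not documented behavior."""
    return max(floor, math.ceil(n_total * min_pct / 100))


def bonferroni(p_value, n_comparisons):
    """Textbook Bonferroni adjustment, capped at 1."""
    return min(1.0, p_value * n_comparisons)


small = min_cases_per_class(200, 1)     # 1% of 200 is 2, below the floor of 5
large = min_cases_per_class(10000, 1)   # 1% of 10,000 is 100
adjusted = bonferroni(0.25, 2)          # doubled for two comparisons
capped = bonferroni(0.5, 4)             # would exceed 1, so it is capped
```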