Basic Tree-Building Algorithm: CHAID and Exhaustive CHAID
The acronym CHAID stands for Chi-squared Automatic Interaction Detector. The name derives from the basic algorithm used to construct (non-binary) trees. For classification problems (where the dependent variable is categorical), the algorithm relies on the Chi-square test to determine the best next split at each step; for regression-type problems (where the dependent variable is continuous), the program computes F-tests instead. Specifically, the algorithm proceeds as follows:
This process continues until no further splits can be performed (given the alpha-to-merge and alpha-to-split values).
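To make the role of the Chi-square test and the alpha-to-split value concrete for the classification case, the following Python sketch picks the predictor with the smallest p-value and splits only if that p-value does not exceed alpha-to-split. It is an illustration under stated assumptions, not the program's implementation: the function names (chi2_sf, chi2_test, select_split), the default value of 0.05, and the toy data are hypothetical, the p-values are unadjusted, and no merging of categories is performed.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution (stdlib only)."""
    if x <= 0:
        return 1.0
    # Closed-form starting value for df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    # Recurrence: Q(x | k+2) = Q(x | k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q

def chi2_test(xs, ys):
    """Pearson chi-square test of independence; returns (statistic, p_value)."""
    n = len(xs)
    row, col, cell = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    df = (len(row) - 1) * (len(col) - 1)
    return stat, (chi2_sf(stat, df) if df > 0 else 1.0)

def select_split(predictors, target, alpha_to_split=0.05):
    """Return (name, p_value) of the most significant predictor, or None."""
    p_values = {name: chi2_test(values, target)[1]
                for name, values in predictors.items()}
    best = min(p_values, key=p_values.get)
    # Assumption: no split is made when even the best p-value exceeds alpha.
    return (best, p_values[best]) if p_values[best] <= alpha_to_split else None

# Hypothetical toy data: "plan" is related to the outcome, "region" is not.
target = ["yes"] * 20 + ["no"] * 20
predictors = {
    "plan": ["a"] * 16 + ["b"] * 4 + ["a"] * 4 + ["b"] * 16,
    "region": ["n", "s"] * 20,
}
result = select_split(predictors, target)
print(result)
```

On this toy data the sketch selects "plan"; when no predictor reaches the threshold it returns None, which corresponds to the stopping condition described above.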
Note: Missing data. Missing data in predictor variables are handled differently in the General CHAID (GCHAID) Models and General Classification and Regression Trees (GC&RT) modules, as compared to the Interactive Trees module. Because the Interactive Trees module does not support ANCOVA-like design matrices, it is more flexible in the handling of missing data. Refer to Missing Data in GC&RT, GCHAID, and Interactive Trees for additional details.
CHAID and Exhaustive CHAID Algorithms
Exhaustive CHAID, a modification of the basic CHAID algorithm, performs a more thorough merging and testing of predictor variables. Specifically, the merging of categories continues (without reference to any alpha-to-merge value) until only two categories remain for each predictor. The program then proceeds as described above in the Selecting the split variable step and selects, among the predictors, the one that yields the most significant split. Because of this additional merging and testing, Exhaustive CHAID may require significant computing time for large data sets, particularly those with many continuous predictor variables.
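The merge-until-two-categories idea can be sketched in Python as follows. This is a simplified illustration, not the program's code. It assumes a nominal predictor, so any two groups may be merged, and at each step it merges the pair of groups whose distributions on the dependent variable differ least. "Differ least" is measured here by the smallest Pearson Chi-square statistic, which matches the least significant pair only when all pairwise tests have the same degrees of freedom. The names chi2_stat, pair_stat, and merge_until_two and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def chi2_stat(xs, ys):
    """Pearson chi-square statistic for two categorical lists."""
    n = len(xs)
    row, col, cell = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (cell[(r, c)] - row[r] * col[c] / n) ** 2 / (row[r] * col[c] / n)
        for r in row for c in col
    )

def pair_stat(a, b, xs, ys):
    """Statistic for group a versus group b, using only cases in a or b."""
    labels, outcomes = [], []
    for x, y in zip(xs, ys):
        if x in a or x in b:
            labels.append("a" if x in a else "b")
            outcomes.append(y)
    return chi2_stat(labels, outcomes)

def merge_until_two(xs, ys):
    """Merge the most similar pair of groups until only two groups remain.

    Returns the list of groupings visited, from the original categories
    down to the final two groups. No alpha-to-merge value is consulted.
    """
    groups = [frozenset([c]) for c in sorted(set(xs))]
    history = [list(groups)]
    while len(groups) > 2:
        a, b = min(combinations(groups, 2),
                   key=lambda pair: pair_stat(pair[0], pair[1], xs, ys))
        groups = [g for g in groups if g not in (a, b)] + [a | b]
        history.append(list(groups))
    return history

# Hypothetical toy data: categories a and b mostly "yes", c and d mostly "no".
counts = {"a": (9, 1), "b": (8, 2), "c": (3, 7), "d": (1, 9)}
xs, ys = [], []
for category, (n_yes, n_no) in counts.items():
    xs += [category] * (n_yes + n_no)
    ys += ["yes"] * n_yes + ["no"] * n_no

history = merge_until_two(xs, ys)
for step in history:
    print([sorted(g) for g in step])
```

Each grouping in the returned history could then be tested against the dependent variable in the split-selection step; how the program chooses among them is not spelled out in this section, so the sketch stops at the merging.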