General CHAID Overview

The acronym CHAID stands for Chi-squared Automatic Interaction Detector. It is one of the oldest tree classification methods, originally proposed by Kass (1980); according to Ripley (1996), the CHAID algorithm is a descendant of THAID, developed by Morgan and Messenger (1973). Unlike the algorithms implemented in the General Classification and Regression Trees (GC & RT) module of STATISTICA, CHAID will build non-binary trees (i.e., trees where more than two branches can attach to a single root or node), based on a relatively simple algorithm that is particularly well suited for the analysis of larger data sets. Also, because the CHAID algorithm will often effectively yield many multi-way frequency tables (e.g., when classifying a categorical response variable with many categories, based on categorical predictors with many classes), it has been particularly popular in marketing research, in the context of market segmentation studies.
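To make the "chi-squared" and "multi-way frequency table" ideas concrete, the following is a minimal sketch in Python, not the STATISTICA implementation. It cross-tabulates a categorical predictor against a categorical response and computes the Pearson chi-square statistic for that table. The function name chi_square_table and the toy data are hypothetical and exist only for illustration.

```python
from collections import Counter


def chi_square_table(predictor, response):
    """Return (Pearson chi-square statistic, degrees of freedom) for the
    two-way frequency table of predictor classes by response categories."""
    n = len(predictor)
    cells = Counter(zip(predictor, response))  # observed cell counts
    rows = Counter(predictor)                  # predictor-class totals
    cols = Counter(response)                   # response-category totals
    stat = 0.0
    for r, row_total in rows.items():
        for c, col_total in cols.items():
            expected = row_total * col_total / n  # count expected under independence
            observed = cells.get((r, c), 0)
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df


# Hypothetical toy data: six customers, two categorical predictors.
region = ["north", "north", "south", "south", "west", "west"]
gender = ["f", "m", "f", "m", "f", "m"]
bought = ["yes", "yes", "no", "no", "yes", "no"]

region_stat, region_df = chi_square_table(region, bought)
gender_stat, gender_df = chi_square_table(gender, bought)
print(f"region: chi2={region_stat:.3f}, df={region_df}")
print(f"gender: chi2={gender_stat:.3f}, df={gender_df}")
```

Note that the three-class predictor (region) would produce a three-way split, which is the kind of non-binary branching described above. A full CHAID procedure compares p-values rather than raw statistics, because tables for different predictors differ in their degrees of freedom; that step is omitted here to keep the sketch dependency-free.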

CHAID is one of three different types of tree-building algorithms available in STATISTICA; the other available methods are the C & RT algorithms (see General Classification and Regression Trees; see also Breiman et al., 1984) and QUEST (Quick, Unbiased, Efficient Statistical Trees; see Classification Trees Analysis; see also Loh and Shih, 1997).

Many of the issues discussed in the GC & RT Introductory Overview also apply to GCHAID: both techniques construct trees in which each non-terminal node identifies a split condition, so as to yield optimum prediction (of continuous dependent or response variables) or classification (of categorical dependent or response variables). Hence, both types of algorithms can be applied to regression-type problems or classification-type problems (see Classification and Regression Problems in the GC & RT Introductory Overview; QUEST is only applicable to classification-type problems).

See also Basic Tree-Building Algorithm: CHAID and Exhaustive CHAID, and General Computation Issues and Unique Solutions of STATISTICA GCHAID.

Note: There are four different types of tree-building algorithms available in STATISTICA: CHAID (Kass, 1980; see General CHAID Introductory Overview), C & RT (Breiman, Friedman, Olshen, and Stone, 1984; see General Classification and Regression Trees), QUEST (Loh and Shih, 1997; see Classification Trees Analysis), and Interactive C&RT and CHAID Trees; see also CHAID, C&RT, and QUEST for additional details. For further discussion of the differences between the computational algorithms, see also the Interactive Trees (C & RT, CHAID) Introductory Overview and Missing Data in GC & RT, GCHAID, and Interactive Trees.

Note: Missing data. Missing data in predictor variables are handled differently in the CHAID module than in the Interactive Trees module. Because the Interactive Trees module does not support ANCOVA-like design matrices, it is more flexible in the handling of missing data. Specifically, in CHAID, observations with missing data in any predictor variable are excluded from the tree-building process itself (even if surrogates are requested; these surrogates are only used to compute predicted values or classifications). In Interactive Trees, variables (and missing data for those variables) can be considered one by one, so observations with missing data in the predictors are excluded from the tree-building process only if those variables are chosen for splits and no suitable surrogate was requested or selected. Refer also to Missing Data in GC&RT, GCHAID, and Interactive Trees for additional details.
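The difference between the two missing-data strategies can be illustrated with a short Python sketch. This is a simplification under stated assumptions, not STATISTICA code: missing values are represented by None, surrogates are ignored entirely, and the helper names casewise_complete and usable_for, along with the toy rows, are hypothetical.

```python
# Hypothetical toy data; None marks a missing predictor value.
rows = [
    {"age": "young", "income": "low",  "bought": "no"},
    {"age": None,    "income": "high", "bought": "yes"},
    {"age": "old",   "income": None,   "bought": "yes"},
    {"age": "old",   "income": "high", "bought": "yes"},
]
predictors = ["age", "income"]


def casewise_complete(rows, predictors):
    """CHAID-style handling: keep only observations with no missing
    value in ANY predictor (casewise deletion before tree building)."""
    return [r for r in rows if all(r[p] is not None for p in predictors)]


def usable_for(rows, predictor):
    """Variable-by-variable handling: an observation is dropped only
    with respect to the one predictor in which it is missing."""
    return [r for r in rows if r[predictor] is not None]


chaid_rows = casewise_complete(rows, predictors)
print("casewise deletion keeps", len(chaid_rows), "of", len(rows), "rows")
for p in predictors:
    print("usable when considering", p, "alone:", len(usable_for(rows, p)))
```

With these toy rows, casewise deletion retains two observations, while each predictor considered on its own retains three, which is why the variable-by-variable approach is described as more flexible.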