Interactive Trees (C&RT, CHAID) Overview
The Statistica Interactive Trees (C&RT, CHAID) module builds ("grows") classification and regression trees, as well as CHAID trees, using automatic (algorithmic) methods, user-defined rules and criteria specified through an interactive graphical user interface (brushing tools), or a combination of both. Its purpose is to provide an interactive environment for building classification or regression trees (via classic C&RT methods or CHAID) in which you can try various predictors and split criteria, in combination with almost all of the automatic tree-building functionality provided in the General Classification and Regression Trees (GC&RT) and General CHAID Models (GCHAID) modules of Statistica.
The Interactive Trees (C&RT, CHAID) module can be used to build trees for predicting continuous dependent variables (regression) and categorical dependent variables (classification). The program supports the classic C&RT algorithm popularized by Breiman et al. (Breiman, Friedman, Olshen, & Stone, 1984; see also Ripley, 1996) as well as the CHAID algorithm (Chi-square Automatic Interaction Detector; see Kass, 1980).
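To give a feel for the chi-square criterion that gives CHAID its name, here is a minimal Python sketch of the Pearson chi-square statistic for a predictor-by-outcome contingency table. It is an illustration only, not Statistica's implementation: the function name `chi_square`, the predictor names, and all counts are hypothetical, and the full CHAID procedure (category merging, adjusted p-values) is not shown.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows of observed counts. The sketch assumes
    every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if predictor and outcome were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are predictor categories, columns are outcome classes.
region = [[30, 10], [10, 30]]   # outcome differs strongly across categories
gender = [[21, 19], [19, 21]]   # outcome is nearly the same across categories

for name, table in [("region", region), ("gender", gender)]:
    print(name, round(chi_square(table), 3))
```

A larger statistic indicates a stronger association between the predictor and the dependent variable, so in this made-up data the first predictor would be the more promising candidate for a split.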
Unique Advantages of the Interactive Trees (C&RT, CHAID) Module
While much of the functionality of the Interactive Trees (C&RT, CHAID) module can be found in other tree-building procedures of Statistica and Statistica Data Miner, there are a number of unique aspects to this program:
- The program is particularly optimized for very large data sets, and in many cases the raw data do not have to be stored locally for the analyses.
- Because the Interactive Trees module does not support ANCOVA-like design matrices, it is more flexible in its handling of missing data. For example, in CHAID analyses the program handles predictors one at a time to determine the best (next) split, whereas in the General CHAID Models (GCHAID) module, observations with missing data for any categorical predictor are eliminated from the analysis. See also Missing Data in GC&RT, GCHAID, and Interactive Trees for additional details.
- You can perform "what-if" analyses by interactively deleting individual branches, growing other branches, and observing various result statistics for the different trees (tree models).
- You can automatically grow some parts of the tree but manually specify splits for other branches or nodes. For example, if certain predictor variables cannot easily or economically be measured in practice (e.g., information on personal Income is usually difficult to obtain in questionnaire surveys), you can find and specify alternative predictors and splits for nodes to avoid such variables (e.g., replace Income with Number of rooms in primary residence).
- You can define specific splits. This is useful when you want to build simple and parsimonious solutions that can easily be communicated and implemented (e.g., a split on Income < 20,345 is less "convenient" than a split on Income < 20,000).
- You can quickly copy trees into new projects to explore alternative splits and methods for growing branches.
- You can save entire trees (projects) for later use. When you reload a tree project, the tree is restored to the exact state in which it was saved.
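The trade-off behind user-defined splits, such as choosing Income < 20,000 instead of Income < 20,345, can be explored with a small calculation. The following Python sketch compares the reduction in Gini impurity for two candidate thresholds. It is an illustration under stated assumptions rather than Statistica's own computation: the helper names `gini` and `impurity_reduction` and the ten data rows are hypothetical, and Gini is only one of several possible split criteria.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_reduction(values, labels, threshold):
    """Drop in Gini impurity when splitting on `value < threshold`."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - weighted


# Hypothetical data: Income values and a yes/no dependent variable.
income = [12000, 15500, 18000, 19900, 20100, 24000, 31000, 40000, 52000, 60000]
response = ["no", "no", "no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

# Compare a threshold an algorithm might pick with a rounded alternative.
for threshold in (20345, 20000):
    gain = impurity_reduction(income, response, threshold)
    print(f"Income < {threshold}: impurity reduction = {gain:.4f}")
```

Comparing the two printed values shows how much predictive quality, if any, is given up in exchange for a threshold that is easier to communicate. On real data with many observations, the difference between nearby thresholds is typically much smaller than in a ten-row toy example.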
Methods for Building Trees for Regression and Classification
The Statistica system includes a comprehensive selection of algorithms for building trees for regression and classification tasks.
The Statistica Interactive Trees (C&RT, CHAID) module provides a flexible, easy-to-use environment for growing trees or portions (branches) of trees algorithmically (automatically) as well as manually. It adds a powerful tool for interactive data analysis and model building that can supplement and augment the many other techniques available in Statistica Data Miner for automatically determining valid models for prediction and predictive classification.