C&RT Quick Specs - Advanced Tab
Select the Advanced tab of the C&RT Quick specs dialog box to access advanced estimation options: Number of surrogates and Sigma-restricted parameterization.
- Number of surrogates
- By choosing "similar" predictors (surrogates) that have valid data, STATISTICA can classify cases (observations) with missing data, so that such cases can be included in the analysis. Cases with missing values in the response are treated as "prediction samples," and cases with missing values in a predictor are treated as "surrogate samples." The Number of surrogates edit field controls the number of surrogates that the analysis can choose during the tree-building process. By default, the number of surrogates is 0 (zero), and missing data values are excluded from the analysis.
In general, at every step during the tree-building process, STATISTICA identifies the variable for the next split that will improve the accuracy of prediction. If, for a particular observation (case), the value of the chosen variable is missing, the program looks to the next-best variable to split on, which acts as a "surrogate" for the best variable. If the value of that variable is missing as well, the program looks to the third-best split variable, and so on. The Number of surrogates option determines how far down the list of predictors (sorted by the degree of improvement in the accuracy of prediction provided by each respective split candidate) the program will go when attempting to find a surrogate for a variable that has missing data for a particular case.
Note: Missing data (and surrogate splits). Missing data in predictor variables and surrogate split variables are handled differently in the General Classification and Regression Trees (GC&RT) module than in the Interactive Trees module. Because the Interactive Trees module does not support ANCOVA-like design matrices, it is more flexible in the handling of missing data. Specifically, in GC&RT, observations classified or predicted via surrogate split variables are not included in the subsequent tree building itself, because it would be ambiguous how to construct a unique ANCOVA-like design matrix that includes surrogate split variables. To consider variables (and the missing data for those variables) one by one, and to include observations classified or predicted via surrogate splits in the tree-building process itself, use the Interactive Trees module instead. Refer also to Missing Data in GC&RT, GCHAID, and Interactive Trees for additional details.
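The surrogate fallback described for the Number of surrogates option can be illustrated with a minimal Python sketch. This is not STATISTICA code. The function name `pick_split_variable`, the variable names, and the use of `None` to mark a missing value are all hypothetical, and the candidate list is assumed to be already sorted from best to worst split.

```python
from typing import Mapping, Optional, Sequence


def pick_split_variable(
    case: Mapping[str, Optional[float]],
    ranked_candidates: Sequence[str],
    n_surrogates: int,
) -> Optional[str]:
    """Return the first variable with a valid value for this case, or None.

    ranked_candidates is assumed to be sorted best-first (hypothetical input);
    None marks a missing value.
    """
    # The best variable plus at most n_surrogates fallbacks may be tried.
    allowed = ranked_candidates[: 1 + n_surrogates]
    for name in allowed:
        if case.get(name) is not None:
            return name
    # Every allowed variable is missing: the case cannot be routed at this split.
    return None


ranked = ["income", "age", "tenure"]  # hypothetical predictors, best first
case = {"income": None, "age": None, "tenure": 4.0}

print(pick_split_variable(case, ranked, 0))  # None
print(pick_split_variable(case, ranked, 1))  # None
print(pick_split_variable(case, ranked, 2))  # tenure
```

With 0 surrogates, only the best variable is tried, so a case with a missing value on that variable cannot be routed at the split. Raising the limit lets the search continue further down the ranked list.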
- Sigma-restricted parameterization
- This check box is available only if you chose C&RT with coded designs from the Startup Panel. In that case, you can specify ANOVA/ANCOVA-like designs in the same manner as discussed in detail in the context of the ANOVA and General Linear Models modules. This option determines the coding that is used for the categorical predictor effects and their interactions. When the Sigma-restricted parameterization check box is selected, the sigma-restricted parameterization is used. If this check box is not selected, then an over-parameterized design matrix is constructed. Refer also to Sigma Restricted and Overparameterized Model in the context of the General Linear Models module for additional details concerning the different ways in which the categorical effects can be coded into vectors of a design matrix.
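The difference between the two codings can be sketched for a single categorical predictor. The Python below is an illustration only, not the STATISTICA implementation. The function names are hypothetical, and using the last level as the one coded -1 is an assumed convention. For k levels, the sigma-restricted form produces k - 1 columns, and the overparameterized form produces one indicator column per level.

```python
from typing import List, Sequence


def sigma_restricted(levels: Sequence[str], values: Sequence[str]) -> List[List[int]]:
    """Code a factor into k - 1 columns; the last level is -1 in every column."""
    k = len(levels)
    rows = []
    for v in values:
        i = levels.index(v)
        if i == k - 1:
            # Assumed convention: the last level is the one coded -1.
            rows.append([-1] * (k - 1))
        else:
            rows.append([1 if j == i else 0 for j in range(k - 1)])
    return rows


def overparameterized(levels: Sequence[str], values: Sequence[str]) -> List[List[int]]:
    """Code a factor into k indicator (0/1) columns, one per level."""
    return [[1 if v == lev else 0 for lev in levels] for v in values]


levels = ["A", "B", "C"]  # hypothetical three-level factor
values = ["A", "B", "C"]

print(sigma_restricted(levels, values))   # [[1, 0], [0, 1], [-1, -1]]
print(overparameterized(levels, values))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Each sigma-restricted column sums to zero across the levels. In the overparameterized form, the indicator columns always sum to 1 for every case, so they are redundant with an intercept column.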