Parameterization
Effects of categorical predictor variables can be coded in the design matrix using one of three types of parameterization: overparameterized, sigma-restricted, or reference coding.
Sigma-Restricted Model
In a sigma-restricted model, the categories can be assigned any values corresponding to group membership to facilitate interpretation of the regression coefficient associated with the single categorical predictor variable. The values on the resulting predictor variable represent a quantitative contrast between the categories.
For a categorical variable with k levels, Statistica® creates k-1 indicator variables with the reference level coded as -1. For example, if the categorical variable has 3 levels, X, Y, and Z, with Z being the reference level, then the indicator variables would be coded as follows:
| Level | Column X | Column Y |
|---|---|---|
| X | 1 | 0 |
| Y | 0 | 1 |
| Z | -1 | -1 |
The values used to represent group membership in each column, that is, 1, 0, and -1, sum to zero across the levels. With this parameterization, each coefficient estimates the difference between the mean of its level and the unweighted average of the means of all the levels. For example, the coefficient for X estimates the difference between level X and the average of levels X, Y, and Z, which is two-thirds of the difference between level X and the average of levels Y and Z.
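The coding and its interpretation can be checked with a minimal Python sketch. The function name `sigma_restricted_row` and the response values are hypothetical, chosen only for illustration, and the sketch assumes the model also contains an intercept. Because a model with one categorical predictor is saturated, the least-squares fit reproduces each level mean, so the coefficients can be written down directly and verified against the coded rows.

```python
from statistics import mean

def sigma_restricted_row(label, levels):
    """Code one observation; the last entry of `levels` is the reference level."""
    *coded, reference = levels
    if label == reference:
        return [-1] * len(coded)          # reference level: -1 in every column
    return [1 if label == level else 0 for level in coded]

levels = ["X", "Y", "Z"]
coding = {level: sigma_restricted_row(level, levels) for level in levels}

# Made-up responses grouped by level (illustration only).
responses = {"X": [1.0, 3.0], "Y": [4.0, 5.0, 6.0], "Z": [9.0]}
level_means = {level: mean(values) for level, values in responses.items()}

# Intercept: unweighted average of the level means.
intercept = mean(list(level_means.values()))
# Coefficient for each non-reference level: its mean minus that average.
coefficients = [level_means[level] - intercept for level in levels[:-1]]

def predict(label):
    """Fitted value for a level: intercept plus coded row times coefficients."""
    return intercept + sum(c * b for c, b in zip(coding[label], coefficients))

for level in levels:
    print(level, coding[level], round(predict(level), 6), level_means[level])
```

Here the coefficient for X is the mean of X minus the average of all three level means, which works out to two-thirds of the gap between X and the average of Y and Z. The fitted value for the reference level Z is recovered from the -1 entries alone.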
Overparameterized Model
An overparameterized model uses the indicator variable approach to represent the effects for categorical predictor variables. In this method, a separate predictor variable is coded for each group identified by a categorical predictor variable.
For a categorical variable with k levels, Statistica® creates k indicator variables. For example, if the categorical variable has 3 levels, X, Y, and Z, then the indicator variables would be as follows:
| Level | Column X | Column Y | Column Z |
|---|---|---|---|
| X | 1 | 0 | 0 |
| Y | 0 | 1 | 0 |
| Z | 0 | 0 | 1 |
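A short Python sketch of this coding follows. The function name `overparameterized_row` is hypothetical, and the remark about the intercept is an assumption about how the model is set up rather than something stated above: if the model also includes a column of ones, the k indicator columns add up to that column, so the design has more columns than independent effects.

```python
def overparameterized_row(label, levels):
    """Code one observation with one 0/1 indicator per level (k columns)."""
    return [1 if label == level else 0 for level in levels]

levels = ["X", "Y", "Z"]
coding = {level: overparameterized_row(level, levels) for level in levels}

# Each coded row contains exactly one 1, so the indicator columns sum to a
# column of ones, which duplicates an intercept column if one is present.
row_sums = [sum(row) for row in coding.values()]

for level in levels:
    print(level, coding[level])
print("row sums:", row_sums)
```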
Reference Coding
Select the Ref option to compute the design matrix for categorical predictors in the models using reference coding. For a categorical variable with k levels, Statistica® creates k-1 indicator variables with the reference level coded as 0. Each parameter estimate for a reference-coded categorical predictor estimates the difference in effect between the corresponding level and the reference level. For example, for a categorical variable with 3 levels, X, Y, and Z, with Z being the reference level, the indicator variables would be coded as follows:
| Level | Column X | Column Y |
|---|---|---|
| X | 1 | 0 |
| Y | 0 | 1 |
| Z | 0 | 0 |
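The interpretation of reference-coded estimates can be illustrated with a minimal Python sketch. The function name `reference_row` and the response values are hypothetical, and the sketch assumes the model includes an intercept. With a single categorical predictor the model is saturated, so the intercept equals the mean of the reference level and each coefficient equals the difference between its level mean and the reference mean.

```python
from statistics import mean

def reference_row(label, levels):
    """Code one observation; the last entry of `levels` is the reference level."""
    # The reference level matches none of the coded levels, so its row is all zeros.
    return [1 if label == level else 0 for level in levels[:-1]]

levels = ["X", "Y", "Z"]
coding = {level: reference_row(level, levels) for level in levels}

# Made-up responses grouped by level (illustration only).
responses = {"X": [1.0, 3.0], "Y": [4.0, 5.0, 6.0], "Z": [9.0]}
level_means = {level: mean(values) for level, values in responses.items()}

# Intercept: mean of the reference level.
intercept = level_means[levels[-1]]
# Coefficient for each non-reference level: its mean minus the reference mean.
coefficients = [level_means[level] - intercept for level in levels[:-1]]

def predict(label):
    """Fitted value for a level: intercept plus coded row times coefficients."""
    return intercept + sum(c * b for c, b in zip(coding[label], coefficients))

for level in levels:
    print(level, coding[level], predict(level))
```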