Optimal Coding of Predictors
Specifically, the goal of the algorithms implemented in the automated WoE module is to identify the groupings of predictor variables that result in the greatest differences in weight of evidence (WoE) between groups.
For continuous variables, the automated WoE module identifies the best recoding to weight-of-evidence values. For categorical predictors or interactions between coded predictors, users can combine groups with similar observed WoE to create new coded predictors with continuous weight-of-evidence values.
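As a point of reference, the sketch below computes WoE and Information Value (IV) for a grouped predictor. It assumes the common convention WoE = ln(share of goods / share of bads) and IV = sum of (share of goods - share of bads) x WoE; the sign convention used by Statistica is not stated here, and the function name and counts are made up for illustration.

```python
import math

def woe_table(groups):
    """groups: dict mapping group label -> (n_good, n_bad).
    Returns (dict of WoE per group, total Information Value).
    Assumes every group has at least one good and one bad case."""
    total_good = sum(good for good, _ in groups.values())
    total_bad = sum(bad for _, bad in groups.values())
    woe = {}
    iv = 0.0
    for label, (good, bad) in groups.items():
        p_good = good / total_good   # share of all goods in this group
        p_bad = bad / total_bad      # share of all bads in this group
        woe[label] = math.log(p_good / p_bad)
        iv += (p_good - p_bad) * woe[label]
    return woe, iv

# Illustrative counts only: "mid" and "high" have identical WoE.
counts = {"low": (80, 20), "mid": (60, 40), "high": (60, 40)}
woe, iv = woe_table(counts)

# Combining the two groups with identical WoE leaves the IV unchanged.
merged = {"low": (80, 20), "mid+high": (120, 80)}
woe_merged, iv_merged = woe_table(merged)
```

In this toy data, combining groups with the same observed WoE reduces the number of groups without losing any Information Value, which is the motivation for combining similar groups.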
Continuous Variables
For continuous predictors, a default coding is first derived using the Classification and Regression Trees (C&RT) algorithm. When the default coding has fewer than 20 groups, Statistica explicitly searches through all possible combinations of the default groups to find the fewest groups with the greatest Information Value (IV). When the number of groups is greater than 20, Statistica uses a modified CHAID approach, in which the change in WoE replaces the customary chi-square (χ²) statistic as the criterion.
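The exhaustive search over combinations of default groups can be sketched as follows. This is an illustration, not Statistica's implementation: it assumes only adjacent groups may be combined, uses the common IV formula, and introduces a hypothetical `keep` fraction to trade the number of groups against IV (merging never increases IV, so some such trade-off rule is needed).

```python
import math
from itertools import combinations

def information_value(bins):
    """bins: list of (n_good, n_bad), each with nonzero counts."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        p_good, p_bad = good / total_good, bad / total_bad
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv

def merge_adjacent(bins, cuts):
    """Merge ordered bins into runs; `cuts` lists the indices where a new run starts."""
    edges = [0, *cuts, len(bins)]
    return [
        (sum(good for good, _ in bins[lo:hi]), sum(bad for _, bad in bins[lo:hi]))
        for lo, hi in zip(edges, edges[1:])
    ]

def best_by_group_count(bins):
    """Try every set of cut points; return {k: (iv, cuts)} for the best k-group merge."""
    best = {}
    n = len(bins)
    for k in range(1, n + 1):
        for cuts in combinations(range(1, n), k - 1):
            iv = information_value(merge_adjacent(bins, cuts))
            if k not in best or iv > best[k][0]:
                best[k] = (iv, cuts)
    return best

def fewest_groups(best, keep=0.95):
    """Smallest k whose best IV retains at least `keep` of the maximum IV (hypothetical rule)."""
    top = max(iv for iv, _ in best.values())
    return min(k for k, (iv, _) in best.items() if iv >= keep * top)

# Illustrative default groups as (n_good, n_bad), ordered by predictor value.
default_bins = [(90, 10), (85, 15), (60, 40), (55, 45), (30, 70)]
best = best_by_group_count(default_bins)
chosen_k = fewest_groups(best)
```

With n default groups there are 2^(n-1) sets of cut points, which is why an exhaustive search is only practical for a small number of groups.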
Three types of constrained WoE recoding solutions are provided subject to their existence:
- Monotone solutions, where the WoE values of adjacent recoded groups (intervals) either always increase (positive monotone relationship of predictor intervals to WoE) or always decrease (negative monotone relationship of predictor intervals to WoE).
- Quadratic solutions, where the relationship between the coded value ranges (intervals) and WoE can have a single reversal, so that the resulting function is either U-shaped or inverse-U-shaped.
- Cubic solutions, where the relationship between the coded value ranges (intervals) and WoE can have two reversals, so that the resulting function is S-shaped.
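The three constraint types above differ only in how many times the WoE sequence changes direction across the ordered intervals. A minimal sketch of that check follows; the function names are hypothetical, and treating flat steps (equal adjacent WoE values) as neutral is an assumption of this sketch.

```python
def count_reversals(woe_values):
    """Count how often the direction of change flips along ordered intervals.
    Flat steps (equal adjacent values) are ignored."""
    diffs = [b - a for a, b in zip(woe_values, woe_values[1:]) if b != a]
    signs = [1 if d > 0 else -1 for d in diffs]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def shape(woe_values):
    """Label a WoE sequence by its number of reversals."""
    labels = {0: "monotone", 1: "quadratic", 2: "cubic"}
    return labels.get(count_reversals(woe_values), "unconstrained")

print(shape([-0.8, -0.2, 0.1, 0.9]))   # no reversal
print(shape([0.5, -0.1, -0.4, 0.2]))   # one reversal (U-shaped)
print(shape([-0.5, 0.3, -0.2, 0.6]))   # two reversals (S-shaped)
```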
Two types of unconstrained WoE recoding solutions are provided:
- Custom coding is based on the default binning scheme, using either C&RT or 10 groups of approximately equal size.
- The no-restrictions coding is based on the custom solution after running either the exhaustive search or the modified CHAID algorithm.
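For the equal-size option, one simple way to form 10 groups of approximately equal size is to sort the values and cut them into consecutive chunks. The sketch below is an assumption about what "approximately equal" means (sizes differ by at most one); it does not handle tied values that straddle a boundary, which a production binning routine would need to address.

```python
def equal_size_bins(values, n_bins=10):
    """Split values into n_bins ordered groups whose sizes differ by at most one."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # first `extra` bins get one more value
        bins.append(ordered[start:start + size])
        start += size
    return bins

bins = equal_size_bins(range(103))
sizes = [len(b) for b in bins]
```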
Note that the initial bins might be adjusted before the algorithm runs to ensure that each bin satisfies the user-specified minimum N and minimum Bad N parameters.
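How the initial bins are adjusted is not described here. One plausible approach, shown purely as an assumption, is to merge any bin that violates either minimum into a neighboring bin until all bins pass. The function name, the merge direction, and the counts below are all hypothetical.

```python
def enforce_minimums(bins, min_n, min_bad):
    """bins: ordered list of (n_good, n_bad).
    Merge each undersized bin into its right neighbor (or its left neighbor
    if it is the last bin) until every bin has at least min_n cases and
    min_bad bad cases, or only one bin remains."""
    bins = list(bins)

    def too_small(counts):
        good, bad = counts
        return good + bad < min_n or bad < min_bad

    while len(bins) > 1:
        idx = next((i for i, b in enumerate(bins) if too_small(b)), None)
        if idx is None:
            break
        other = idx + 1 if idx + 1 < len(bins) else idx - 1
        lo, hi = min(idx, other), max(idx, other)
        merged = (bins[lo][0] + bins[hi][0], bins[lo][1] + bins[hi][1])
        bins[lo:hi + 1] = [merged]  # replace the two bins with their union
    return bins

initial = [(40, 2), (50, 10), (30, 12), (5, 1)]
adjusted = enforce_minimums(initial, min_n=30, min_bad=5)
```

Here the first bin has too few bad cases and the last bin has too few cases overall, so each is absorbed by a neighbor and two bins remain.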
Categorical Variables
For categorical (discrete) predictors, the default (original) grouping is further refined using the modified CHAID approach.
Two types of unconstrained WoE recoding solutions are provided:
- Custom coding is based on the default binning of the group.
- The no-restrictions coding is based on the default categorization provided by the modified CHAID algorithm.
Note that the initial bins might be adjusted before the algorithm runs to ensure that each bin satisfies the user-specified minimum N and minimum Bad N parameters.
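The idea of refining categorical groups with change in WoE as the merge criterion can be illustrated with a simple greedy procedure: repeatedly combine the two groups whose WoE values are closest, and stop once every remaining pair differs by at least a threshold. This is a simplified stand-in, not the modified CHAID algorithm itself; the `min_gap` threshold, the function names, and the counts are assumptions.

```python
import math

def woe(good, bad, total_good, total_bad):
    """WoE under the ln(share of goods / share of bads) convention."""
    return math.log((good / total_good) / (bad / total_bad))

def merge_similar(categories, min_gap):
    """categories: dict label -> (n_good, n_bad), all counts nonzero.
    Returns dict mapping tuples of merged labels -> combined counts."""
    groups = {(label,): counts for label, counts in categories.items()}
    total_good = sum(good for good, _ in groups.values())
    total_bad = sum(bad for _, bad in groups.values())

    def group_woe(key):
        return woe(*groups[key], total_good, total_bad)

    while len(groups) > 1:
        # After sorting by WoE, the closest pair is always adjacent.
        ranked = sorted(groups, key=group_woe)
        gaps = [(group_woe(b) - group_woe(a), a, b) for a, b in zip(ranked, ranked[1:])]
        gap, a, b = min(gaps, key=lambda item: item[0])
        if gap >= min_gap:
            break
        merged = (groups[a][0] + groups[b][0], groups[a][1] + groups[b][1])
        del groups[a], groups[b]
        groups[a + b] = merged
    return groups

categories = {"A": (80, 20), "B": (78, 22), "C": (40, 60), "D": (42, 58)}
result = merge_similar(categories, min_gap=0.3)
```

In this example A and B have similar WoE, as do C and D, so four categories reduce to two groups.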
Interactions
For pairs of coded predictors, the modified CHAID approach is implemented using either interaction coding of the two-way interaction table or user-defined coding.
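To make the two-way interaction table concrete, the sketch below crosses two coded predictors, treats each cell of the table as a group, and computes its WoE. The cells could then be combined by similarity of WoE. The record layout, predictor codes, and counts are hypothetical, and the WoE convention (log of share of goods over share of bads) is an assumption.

```python
import math
from collections import defaultdict

def interaction_woe(rows):
    """rows: iterable of (code_a, code_b, is_bad).
    Returns WoE per (code_a, code_b) cell of the two-way table.
    Assumes every cell contains at least one good and one bad case."""
    cells = defaultdict(lambda: [0, 0])  # cell -> [n_good, n_bad]
    for code_a, code_b, is_bad in rows:
        cells[(code_a, code_b)][1 if is_bad else 0] += 1
    total_good = sum(counts[0] for counts in cells.values())
    total_bad = sum(counts[1] for counts in cells.values())
    return {
        cell: math.log((good / total_good) / (bad / total_bad))
        for cell, (good, bad) in cells.items()
    }

def make_rows(code_a, code_b, n_good, n_bad):
    """Helper that expands cell counts into individual records."""
    return [(code_a, code_b, False)] * n_good + [(code_a, code_b, True)] * n_bad

rows = (
    make_rows("young", "renter", 2, 6)
    + make_rows("young", "owner", 4, 4)
    + make_rows("old", "renter", 6, 2)
    + make_rows("old", "owner", 8, 8)
)
cell_woe = interaction_woe(rows)
```

In this toy table the ("young", "owner") and ("old", "owner") cells have the same WoE, so they would be natural candidates for a combined group.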