Weight of Evidence (WoE) Technical Notes
The Automated WoE Coding module will generate the best default groupings for continuous and categorical (discrete) predictor variables of a binary dependent (outcome, Y) variable (such as credit default).
For continuous predictors, the search for the best partitioning can enforce linear (monotone), quadratic, or cubic constraints on the WoE function over the coded intervals.
The algorithm will identify the best partitions based on the initial default partitions. The final solution is not necessarily a globally optimal solution, but rather a locally optimal solution given the default partitions. User options are provided to explore the partitions and to apply custom splits.
Weight of Evidence (WoE) and Information Value (IV)
- Formulas for the Weight of Evidence (WoE) values and Information Value (IV) are provided in the Glossary.
- Details and example computations are also given in the Introductory Overview.
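As a quick orientation, the sketch below computes WoE and IV for a set of intervals. It assumes the common convention WoE = ln(share of goods / share of bads) and IV = sum of (share of goods - share of bads) * WoE. The Glossary formulas remain authoritative, since sign and scaling conventions differ between sources. The function name `woe_iv` and the (good, bad) count layout are illustrative choices, not part of the module.

```python
import math

def woe_iv(groups):
    """Return (list of WoE values, IV) for a list of (n_good, n_bad) counts.

    Assumed convention: WoE = ln(share of goods / share of bads).
    """
    total_good = sum(good for good, _ in groups)
    total_bad = sum(bad for _, bad in groups)
    woes = []
    iv = 0.0
    for good, bad in groups:
        if good == 0 or bad == 0:
            # The logarithm is undefined for empty cells; this sketch does not smooth them.
            raise ValueError("each group needs at least one good and one bad case")
        p_good = good / total_good
        p_bad = bad / total_bad
        woe = math.log(p_good / p_bad)
        woes.append(woe)
        iv += (p_good - p_bad) * woe
    return woes, iv
```

For example, `woe_iv([(80, 20), (20, 80)])` gives WoE values of ln 4 and -ln 4, so the two intervals separate goods from bads in opposite directions.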
Continuous Predictors
- By default, each predictor is submitted to the Classification and Regression Trees (C&RT) algorithm for a classification analysis against the binary dependent variable; the resulting intervals serve as the default partitions.
- The default partitions are then examined further: adjacent intervals are merged, and previously merged groups are split again, according to user-defined criteria on the difference in WoE between adjacent intervals.
- If the number of default groups is fewer than 20, an explicit exhaustive search is performed to evaluate all possible combinations of default and merged-from-default groups.
- Otherwise, a modified CHAID algorithm is used to perform the merging of adjacent intervals. This modification pertains to the goal function of the CHAID algorithm. Instead of the standard Chi-square statistic, a delta-WoE (difference in WoE values) between groups is used to drive the merging of groups and the splitting of merged groups.
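The two search strategies above can be sketched as follows, starting from default groups given as (good, bad) counts. This is a simplified illustration under stated assumptions, not the module's implementation: the selection rule in `exhaustive` (finest partition whose adjacent WoE gaps all reach a threshold) is assumed, `greedy_merge` only merges and omits the re-splitting step of the modified CHAID procedure, and all function names and the `threshold` parameter are hypothetical.

```python
import math
from itertools import combinations

def woes(groups):
    """WoE per group, assuming WoE = ln(share of goods / share of bads)."""
    total_good = sum(g for g, _ in groups)
    total_bad = sum(b for _, b in groups)
    # Groups with a zero count would fail here; this sketch does not handle them.
    return [math.log((g / total_good) / (b / total_bad)) for g, b in groups]

def min_delta(groups):
    """Smallest absolute WoE difference between adjacent groups."""
    w = woes(groups)
    return min((abs(x - y) for x, y in zip(w, w[1:])), default=math.inf)

def partition(groups, cuts):
    """Merge adjacent default groups lying between the retained cut positions."""
    bounds = [0, *cuts, len(groups)]
    return [
        (sum(g for g, _ in groups[lo:hi]), sum(b for _, b in groups[lo:hi]))
        for lo, hi in zip(bounds, bounds[1:])
    ]

def exhaustive(groups, threshold):
    """Try every subset of cut points, finest partitions first."""
    k = len(groups)
    for n_cuts in range(k - 1, -1, -1):
        candidates = [partition(groups, c) for c in combinations(range(1, k), n_cuts)]
        admissible = [c for c in candidates if min_delta(c) >= threshold]
        if admissible:
            return max(admissible, key=min_delta)

def greedy_merge(groups, threshold):
    """Repeatedly merge the adjacent pair with the smallest delta-WoE."""
    groups = list(groups)
    while len(groups) > 1:
        w = woes(groups)
        gaps = [abs(x - y) for x, y in zip(w, w[1:])]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] >= threshold:
            break
        merged = (groups[i][0] + groups[i + 1][0], groups[i][1] + groups[i + 1][1])
        groups[i:i + 2] = [merged]
    return groups

def recode(groups, threshold, exhaustive_limit=20):
    """Exhaustive search below the limit, delta-WoE merging otherwise."""
    if len(groups) < exhaustive_limit:
        return exhaustive(groups, threshold)
    return greedy_merge(groups, threshold)
```

With k default groups there are 2^(k-1) ways to merge adjacent groups, which is why enumerating them all is only practical for a small number of default groups.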
Constraints
The final merging and splitting of merged groups is performed so that certain constraints on the WoE function of the resulting intervals are observed. Specifically, the explicit exhaustive or CHAID-based search will only retain:
- Monotone solutions. The WoE values of adjacent recoded intervals either always increase (positive monotone relationship of predictor values to WoE) or always decrease (negative monotone relationship of predictor values to WoE).
- Quadratic solutions. The WoE function across the coded intervals is allowed to have a single reversal, so that the resulting function is either U-shaped or inverse-U-shaped.
- Cubic solutions. The WoE function across the coded intervals is allowed two reversals, so that the resulting function is S-shaped.
Categorical (Discrete) Predictors, Interactions
- The modified CHAID algorithm is used by default to identify final recoding solutions.
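The shape constraints listed under Constraints (monotone, quadratic, cubic) amount to limits on how many times the WoE sequence of a continuous predictor changes direction. The sketch below counts those reversals and checks a candidate solution against a constraint. The names `count_reversals` and `satisfies` are hypothetical, and two details are assumptions rather than documented behavior: ties between adjacent WoE values are ignored, and each constraint is read as an upper bound on reversals (so a monotone sequence also passes the quadratic check).

```python
def count_reversals(woes):
    """Number of times the WoE sequence switches between rising and falling."""
    directions = []
    for prev, curr in zip(woes, woes[1:]):
        if curr != prev:  # assumption: equal neighbours carry no direction
            directions.append(curr > prev)
    return sum(a != b for a, b in zip(directions, directions[1:]))

# Maximum number of reversals permitted under each constraint.
MAX_REVERSALS = {"monotone": 0, "quadratic": 1, "cubic": 2}

def satisfies(woes, constraint):
    """True if the WoE sequence respects the named shape constraint."""
    return count_reversals(woes) <= MAX_REVERSALS[constraint]
```

For instance, the sequence `[0.0, 1.0, -1.0, 2.0]` rises, falls, then rises again, so it has two reversals and passes only the cubic check.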