WoE Settings

  1. Click the Settings button in the Weight of Evidence (WoE) Startup Panel to display the Settings dialog box. Its options control the algorithms used to identify the best grouping for the selected predictor candidates. The Introductory Overview also describes the approaches used in the automated WoE coding module.
  2. By default, Statistica submits each continuous predictor to the Classification and Regression Trees (C&RT) algorithm to determine an initial best partitioning (grouping) of the values into homogeneous subgroups (terminal tree nodes or bins). The minimum number of bins that will be created can be controlled with the Minimum number of C&RT bins parameter.
    • Given the initial grouping, the variables may be submitted to a CHAID algorithm, which is modified to use the Maximum and Minimum WoE values, as well as the Minimum N and Bad N criteria, to identify groups that can be combined.
    • Alternatively, if the number of default bins (from the C&RT analysis) does not exceed 20, an exhaustive search can be performed on all partitions or adjacent partitions for continuous predictors.
  3. In all cases, the final merging of default groups derives groupings consistent with the different constraints for continuous predictors (see also the WoE Introductory Overview).

C&RT parameters

Use C&RT preprocessor

Select this check box to use the Classification and Regression Trees (C&RT) algorithm as a preprocessor to determine the best initial grouping of value ranges (default partitioning). If this check box is not selected, the program instead creates 20 bins with approximately equal numbers of observations and uses them as the default partitioning.

NOTE: The C&RT algorithm implemented in Statistica is multithreaded and very efficient. We recommend keeping this check box selected (the default).
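
The fallback used when the C&RT preprocessor is turned off (20 bins with approximately equal numbers of observations) can be illustrated with a short sketch. This is not Statistica's implementation; the function name equal_frequency_bins and its handling of ties are assumptions made for illustration only.

```python
def equal_frequency_bins(values, n_bins=20):
    """Split values into n_bins ordered groups of approximately equal size.

    Hypothetical helper: tied values are not kept together, which a
    production binning routine would probably handle differently.
    """
    ordered = sorted(values)
    n = len(ordered)
    n_bins = min(n_bins, n)  # never create more bins than observations
    bins = []
    for i in range(n_bins):
        start = i * n // n_bins        # integer arithmetic keeps bin sizes
        end = (i + 1) * n // n_bins    # within one observation of each other
        bins.append(ordered[start:end])
    return bins


bins = equal_frequency_bins(range(103), n_bins=20)
sizes = [len(b) for b in bins]
print(len(bins), min(sizes), max(sizes))
```

With 103 observations, the 20 bins differ in size by at most one observation.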

Minimum number of C&RT bins

Enter a value to set the minimum number of bins to be created.

OK

Click this button to apply the settings and exit the Settings dialog box.

Cancel

Click this button to exit the Settings dialog box without applying any changes to settings.

Reset selections

Click this button to reset the selections on the Settings dialog box back to the original defaults.

Set as default

Click this button to set the options on the Settings dialog box as defaults for future analyses.

CHAID parameters

The CHAID algorithm will:

  • Combine or merge (default) partitions or groups if two groups show a difference in WoE values less than the Maximum WoE merge value.
  • Split groups if two or more groups can be created with a difference in WoE values greater than the Minimum WoE split value, provided each resulting group contains at least WoE minimum N cases and at least WoE minimum Bad N Bad cases.

    The CHAID algorithm will not be used if the Use exhaustive search for continuous variables check box (see description below) is selected, and the default number of partitions (groups) is less than or equal to 20.
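
The merge rule above can be sketched as follows. This is a minimal illustration, not the modified CHAID algorithm itself. It assumes the common convention WoE = ln(share of Goods / share of Bads), which this page does not define, and it assumes every group contains at least one Good and one Bad case. The names woe and merge_similar_groups are hypothetical.

```python
import math


def woe(good, bad, total_good, total_bad):
    # Assumed convention: ln(share of Goods / share of Bads).
    return math.log((good / total_good) / (bad / total_bad))


def merge_similar_groups(groups, max_woe_merge):
    """Merge adjacent groups whose WoE values differ by less than max_woe_merge.

    groups: ordered list of (good_count, bad_count) tuples, one per bin.
    The closest adjacent pair is merged first and WoE is then recomputed.
    """
    groups = [tuple(g) for g in groups]
    total_good = sum(g for g, _ in groups)
    total_bad = sum(b for _, b in groups)
    while len(groups) > 1:
        woes = [woe(g, b, total_good, total_bad) for g, b in groups]
        diffs = [abs(woes[i + 1] - woes[i]) for i in range(len(woes) - 1)]
        i = min(range(len(diffs)), key=diffs.__getitem__)
        if diffs[i] >= max_woe_merge:
            break  # no adjacent pair is similar enough to merge
        (g1, b1), (g2, b2) = groups[i], groups[i + 1]
        groups[i:i + 2] = [(g1 + g2, b1 + b2)]
    return groups


print(merge_similar_groups([(90, 10), (88, 12), (50, 50)], max_woe_merge=0.5))
```

In this example the first two groups have similar WoE values and are combined, while the third group stays separate.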

Maximum WoE merge value

Set the maximum difference in WoE values between two groups that will be merged; groups whose WoE values differ by less than this value are merged.

Minimum WoE split value

Set the minimum difference in WoE values required between two or more groups that will be created by splitting a previously merged group.

WoE minimum Bad N

Set the minimum number of Bad cases (observations) in a group resulting from splitting; if any resulting group would contain fewer Bad cases than this number, the respective split is not applied.

WoE minimum N

Set the minimum number of cases (observations) in a group resulting from splitting; if any resulting group would contain fewer cases than this number, the respective split is not applied.
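
Taken together, the three split criteria (Minimum WoE split value, WoE minimum Bad N, and WoE minimum N) amount to a check like the one sketched below. The function split_is_allowed is hypothetical, and the WoE convention ln(share of Goods / share of Bads) is an assumption, since this page does not give the formula. Groups with no Goods or no Bads are rejected here because their WoE is undefined; the program may treat them differently.

```python
import math


def split_is_allowed(children, total_good, total_bad,
                     min_woe_split, min_n, min_bad_n):
    """Return True if a proposed split satisfies all three criteria.

    children: ordered list of (good_count, bad_count), one per proposed group.
    total_good, total_bad: counts over the whole sample (used for WoE).
    """
    for good, bad in children:
        if good + bad < min_n or bad < min_bad_n:
            return False  # group too small, or too few Bad cases
        if good == 0 or bad == 0:
            return False  # WoE undefined (assumed handling)
    woes = [math.log((g / total_good) / (b / total_bad)) for g, b in children]
    # Every pair of neighbouring groups must differ by more than the threshold.
    return all(abs(woes[i + 1] - woes[i]) > min_woe_split
               for i in range(len(woes) - 1))


children = [(90, 10), (50, 50)]
print(split_is_allowed(children, 140, 60, min_woe_split=0.5, min_n=50, min_bad_n=5))
print(split_is_allowed(children, 140, 60, min_woe_split=0.5, min_n=50, min_bad_n=20))
```

The first call accepts the split; the second rejects it because one group has only 10 Bad cases.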

Use exhaustive search for continuous variables

Select this check box to perform an exhaustive search over all default partitions in order to identify partitions that are consistent with the constraints (see also the Introductory Overview). This option will be ignored if the number of default partitions exceeds 20.
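
To make the idea of an exhaustive search concrete, the sketch below enumerates every way of grouping adjacent default bins and keeps the groupings that satisfy the size constraints. It is an illustration under stated assumptions, not the program's search: the function consistent_partitions is hypothetical, and only the minimum N and minimum Bad N constraints are checked. With n default bins there are 2^(n-1) such groupings, so 20 bins give 524,288 candidates, which suggests why a cap on the number of bins is practical.

```python
from itertools import combinations


def consistent_partitions(bins, min_n, min_bad_n):
    """Yield each grouping of adjacent bins whose groups meet both minimums.

    bins: ordered list of (good_count, bad_count), one per default bin.
    Each yielded grouping is a list of merged (good_count, bad_count) tuples.
    """
    n = len(bins)
    for k in range(n):  # k = number of cut points between bins
        for cuts in combinations(range(1, n), k):
            edges = (0, *cuts, n)
            groups = []
            for lo, hi in zip(edges, edges[1:]):
                good = sum(g for g, _ in bins[lo:hi])
                bad = sum(b for _, b in bins[lo:hi])
                groups.append((good, bad))
            if all(g + b >= min_n and b >= min_bad_n for g, b in groups):
                yield groups


default_bins = [(40, 5), (30, 10), (20, 20)]
for grouping in consistent_partitions(default_bins, min_n=40, min_bad_n=10):
    print(grouping)
```

A full search would then rank the surviving groupings by some criterion, such as the WoE constraints described above.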

Log odds plot

Use the options within the Log Odds plot group box to specify how to generate log-odds plots for a given predictor variable.

Logodds plot of raw variable

Select this check box to generate the log-odds plot for continuous predictor variables only. For each bin, the mean of the predictor variable is plotted against the natural log of the ratio of the number of Bads to the number of Goods.
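
The plotted quantities can be computed as in the following sketch, which returns one (mean, log odds) point per bin. The function log_odds_points and the input layout are assumptions for illustration. Bins without any Goods or without any Bads are skipped here because the log of the ratio is undefined; how the program handles such bins is not stated on this page.

```python
import math


def log_odds_points(bins):
    """Return (mean of predictor, ln(Bads / Goods)) for each bin.

    bins: list of bins, each a list of (predictor_value, is_bad) pairs.
    """
    points = []
    for rows in bins:
        bads = sum(1 for _, is_bad in rows if is_bad)
        goods = len(rows) - bads
        if bads == 0 or goods == 0:
            continue  # log odds undefined for this bin (assumed handling)
        mean_x = sum(x for x, _ in rows) / len(rows)
        points.append((mean_x, math.log(bads / goods)))
    return points


example = [
    [(1.0, False), (2.0, False), (3.0, True)],
    [(10.0, True), (20.0, True), (30.0, False)],
]
for mean_x, log_odds in log_odds_points(example):
    print(mean_x, round(log_odds, 4))
```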

Construct bins based on above predictor settings

Select this option to create the log-odds plot based on the default custom group generated by the analysis.

Use custom C&RT preprocessor to construct bins

Select this option to generate the bins for the log-odds plot with the C&RT algorithm, using the custom minimum and maximum number of C&RT bins specified below. The log-odds plot is based on the bin configuration that yields the largest information value while satisfying the user-specified minimum and maximum bin constraints.

Minimum number of C&RT bins:  Specify the minimum number of C&RT bins for the custom C&RT preprocessor.

Maximum number of C&RT bins:  Specify the maximum number of C&RT bins for the custom C&RT preprocessor.
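
The selection rule for the custom C&RT preprocessor (largest information value within the minimum and maximum bin counts) can be sketched as below. The information value formula used here, the sum over bins of (share of Goods - share of Bads) * ln(share of Goods / share of Bads), is the usual textbook definition and is an assumption, as this page does not state it. The names information_value and best_configuration are hypothetical, and every bin is assumed to contain at least one Good and one Bad case.

```python
import math


def information_value(groups):
    """Information value of a binning given (good_count, bad_count) per bin."""
    total_good = sum(g for g, _ in groups)
    total_bad = sum(b for _, b in groups)
    iv = 0.0
    for good, bad in groups:
        share_good = good / total_good
        share_bad = bad / total_bad
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv


def best_configuration(candidates, min_bins, max_bins):
    """Pick the candidate with the largest IV among those with an allowed bin count."""
    eligible = [c for c in candidates if min_bins <= len(c) <= max_bins]
    return max(eligible, key=information_value, default=None)


candidates = [
    [(90, 35)],
    [(40, 5), (50, 30)],
    [(40, 5), (30, 10), (20, 20)],
]
print(best_configuration(candidates, min_bins=1, max_bins=2))
print(best_configuration(candidates, min_bins=1, max_bins=3))
```

Because finer binnings tend to raise the information value, the maximum number of bins is usually the constraint that decides the result.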

Minimum Bad N per level

Specify the minimum number of Bads that should be contained within each bin for the Log-Odds plot.

Minimum N per level

Specify the minimum number of observations that should be contained within each bin for the Log-Odds plot.

Create custom groups by default

Select this check box to display a custom group by default. If this check box is not selected, the custom group is not created until you explicitly create one.