Weight of Evidence (WoE)

Navigation Summary

  1. Open a data file.
  2. Select the Data Mining tab on the ribbon.
  3. In the Tools group, click Weight of Evidence to display the Weight of Evidence (WoE) dialog box.

Using WoE

  1. Click the Variables button in the Specifications and Results panel, and use the options to select the predictors and the binary dependent (Y) variable (for example, default vs. non-default). The selected predictors are the variables that will be recoded by the WoE calculations.
  2. Select the Good and Bad codes in the Dependent variable group box.
  3. Click the Compute Groups button in the Control Panel to begin the automatic predictor coding that groups or bins these variables.
  4. Display and zoom active graphs in the dialog box by double-clicking the individual graphs. When a graph is enlarged, you can use the slider at the bottom to enlarge it even more.

Statistica will process each predictor sequentially, one by one, using the algorithms described in the Introductory Overview. The default computations for determining the best predictor coding are multithreaded and will complete faster on multicore computing platforms.

To review the coding for each variable

  1. Select a specific constraint (such as a linear constraint for a continuous predictor in order to achieve better parsimony).
  2. Explore custom partitions, define interactions, and so on.
  3. Deploy the final solution to Statistica Enterprise as a Rule or save elsewhere.

What happens as a result of applying the final coding rules:

  • New variables will be created following the convention WoE_OriginalName (each recoded predictor variable will be named with the prefix WoE, followed by an underscore and the original variable name).
  • The original value interval boundaries for each recoded group will be attached to the respective WoE value.

Note: You can add or delete predictor variables after running the analysis and producing binned predictor variables. For a complete description of this process, see Add/Delete Variables after Computing Groups.

Control Panel

The Control Panel is located on the left side of the Weight of Evidence (WoE) dialog box.

Open project

Click this button to open a previously saved project.

The file will not contain the active analyses with all the intermediate results. Therefore, after loading a previously saved project, if any changes are made to the solutions (such as when a custom split is introduced), all results need to be recomputed.

Save Project

Click this button to save all current selections as a project. The file will be saved as an .xml file, containing all current settings and results. However, the file will not contain the active analyses with all the intermediate results.

Settings

Click this button to display the Settings dialog box, which contains options to control the parameters of the algorithms used to determine the default coding. Specifically, you can change the default criteria to be used by the CHAID algorithm and general search algorithms, which are used to derive the best (constrained) default solutions.

Compute groups

Click the Compute groups button to begin the computations for determining the default best coding solutions for the predictors. Only selected variables will be computed. Skipped variables will be left as is, in their dirty state (with a warning and a red frame around the results).

Interaction terms

Click this button to display the Interaction Terms dialog box, which contains options to select interactions between coded predictor variables.

Customize groups

  1. Select a Categorical variable in the Predictor variables pane.
  2. Click the Customize groups button to display the Customize Groups for a Categorical Variable dialog box, which contains options to create customized groupings for the currently selected predictor variable and the currently selected group type.
  3. Select a continuous variable in the Predictor variables pane.
  4. Click the Customize groups button to display the Custom split groups dialog box, which contains options to apply custom splits to the selected continuous predictor.

Show summary

Click this button to create a summary containing all graphs and solutions for the currently selected (highlighted in the Predictor variables pane) predictor variable.

The results will be compiled consistent with the chosen preferences in the Options dialog box - Output Manager tab; by default the results will be placed as Statistica results spreadsheets and graphs into a workbook.

Show all summary

Click this button to create a summary containing all graphs and solutions for all predictor variables.

The results will be compiled consistent with the chosen preferences in the Options dialog box - Output Manager tab. By default the results will be placed as Statistica results spreadsheets and graphs into a workbook.

Deploy to Enterprise

As described in the Introductory Overview, all code generated by the automated WoE coding module can be expressed in sequences of if {condition} then {assignment} elseif...endif blocks.

These transformations can be saved via this option to the Statistica Enterprise metadata repository to be referenced by rules nodes in workflows, macros, and so on.

By default, the analysis template that will be saved along with the transformation and recoding logic will generate the recoded predictor variables.

As a result of applying the final coding rules, by default new variables will be created following the convention WoE_OriginalName (each recoded predictor variable will be named with the prefix WoE, followed by an underscore and the original variable name). The variable names can be edited in the Select the required variables for creating rules dialog box that is initially displayed when the Deploy to Enterprise button is clicked. Also, the original value interval boundaries for each recoded group will be attached to the respective WoE value.
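The if/elseif recoding logic and the WoE_OriginalName convention can be illustrated with a minimal Python sketch. The predictor name Age, the interval boundaries, and the WoE values below are hypothetical and are not output of the module; the code that Statistica actually generates may look different.

```python
import math


def woe_age(age):
    """Recode a hypothetical continuous predictor Age into a WoE value."""
    # Missing data gets its own assignment (0.0 = even odds; hypothetical value).
    if age is None or (isinstance(age, float) and math.isnan(age)):
        return 0.0
    # Each branch is one recoded group; the comment shows the original
    # value interval that is attached to the WoE value.
    if age <= 25:        # (-inf, 25]
        return -0.42
    elif age <= 40:      # (25, 40]
        return 0.05
    else:                # (40, inf)
        return 0.38


# Apply the rule and store the result under the WoE_OriginalName convention.
row = {"Age": 31}
row["WoE_Age"] = woe_age(row["Age"])
```

Because the first and last branches are open-ended, every non-missing value falls into exactly one group.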

Rules

Click this button to display the Select the required variables for creating rules dialog box, where you can review/edit the names of the coded predictors and review summary statistics (IV, Somer’s D, and so on) for each predictor variable.

Input variables to copy to output spreadsheet

Select this check box to display the selected variable(s) along with the Weight of Evidence coded variable(s) in a data file for further analysis.

Included variables

Click this button to display a standard variable selection dialog box, where you can select the variables to include with the Weight of Evidence coded variables.

Choose group type

For the currently selected predictor highlighted in the Predictor variables pane (description below), select the desired coding (either one of the default constrained solutions, the unconstrained solution, or a customized recoding solution) by either

  • clicking on the respective graph
  • selecting an option button in the Choose group type box.

    Only Custom and No restrictions grouping options are available for categorical predictors.

    The currently selected group type will be used for the respective variable as the basis for subsequently defined customized groups or interaction terms.

Specifications and Results Panel

This panel is on the right-hand side of the Weight of Evidence (WoE) dialog box.

Variables button

Click this button to select variables to begin a new project.

A standard variable selection dialog box is displayed, where you can select both continuous and categorical predictors. You must also select a binary dependent (Y) variable.

Note: The default characterization of variables as categorical or continuous can be modified via the options in the Options dialog box - Analyses/Graphs tab.

Dependent variable group box

The dependent or Y variable required for the WoE computations must be binary. The controls in this group box are used to select the specific codes that are to be treated as Bad or Good in the WoE computations as described in the Introductory Overview.

Bad Code:

Select the binary value which, based on history, can be predicted to have a bad outcome.

Good Code:

Select the binary value which, based on history, can be predicted to have a good outcome.

Predictor variables

In this group box, you can view details for each predictor and its diagnostic value for subsequent modeling. To review the detailed results for a specific predictor, select (highlight) the respective variable in this pane by clicking it. To create customizations for a predictor variable in project mode, regroup or apply groups for that predictor variable.

The three buttons located at the top right of the box allow you to maximize this group box or to select variables to add or delete.

Add Variables button

Select new variables and click OK. This will compute groups and generate results for the new variables and finally update the existing predictors list to include these new variables.

Click Cancel from Variable Selection to go back to the original WoE dialog box.

Delete Predictors button

  1. Click the delete button (with a trash can image) to show a single list variable selection dialog box displaying only the predictor variables that were selected in the module.
  2. Select a list of predictors to remove from the module.
  3. Click OK.

    Predictor variables and Interactions can also be individually deleted from the module by hovering the mouse pointer to the left of the Predictor variables list. A trash can button will display.

Details that display for each predictor:  

The variable number of the respective predictor in the input file or data source (1 is the first variable in the input file, 2 the second, and so on)

Regroup

This option indicates how the module will compute statistics and bins for the predictor variables when you click the Compute groups button.

This flag is available for all predictor variables (interactions are excluded). Only selected variables will be affected.

The Regroup check box includes a filter where current groups can be selected. If you select the Regroup check box beside one or more predictor variables, then when you click the Compute groups button, the analysis will recompute statistics and optimal binning from the data for the selected predictor variables. Any previously saved customization will be lost for those predictor variables.

Clearing the Regroup check box automatically clears the corresponding Apply groups check box and enables it.

A drop-down dialog appears when you click the filter-like button next to the Regroup column header name.

While computing groups, when you select a predictor to Regroup, the module will:

  • Determine the default optimal bins for the data, based on the CART or CHAID algorithm chosen in WoE Settings.
  • Internally compute cross tables and frequency tables of goods and bads based on the default bins.

Apply groups

This option indicates how the module will compute statistics and bins for the predictor variables when you click the Compute groups button. This flag is available for all predictor variables (interactions are excluded).

When you click the Compute groups button, the module will compute statistics and apply the previously computed bins to the data for all predictors that have the Apply groups option checked. The previous customization will be retained for those predictor variables. A drop-down dialog appears when you click the filter-like button next to the Apply groups column header name.

Note: When a Regroup checkbox is selected, the corresponding Apply groups checkbox will automatically be checked and disabled.

Name

The name of the respective predictor variable

Name, Coded

Type

Denotes if the respective predictor variable is categorical or continuous

IV

The Information Value for the respective predictor variable

Somer's D

KS

KS p value

Cramer's V

MD WoE

The currently specified WoE value to be used to recode missing data values for the respective predictor variable

MD Count

The number of missing data values for the specified predictor

ChiSq

ChiSq p value

F

F p value

Unbound Intervals filter

Unbounded intervals are tracked separately for each continuous variable. The check box is available only for continuous predictors. You can select or clear this option and see the first and last intervals change in the group details and crosstabs in the WoE dialog box. New interactions involving continuous variables will use this flag. Saved interactions retain the state that the unbounded interval flag had when they were created.

For continuous pred. make first/last intervals unbounded

Select this check box to generate rules such that the first and last intervals are not bounded by the respective minimum and maximum of the observed data, that is, unbounded.

If this check box is not selected, bounded first and last intervals will be generated.

For example, the data for a given variable are bounded between values xmin and xmax.

The unbounded first and last intervals are:

First interval: (-∞, Upper Boundary of First Interval]

Last interval: [Lower Boundary of Last Interval, ∞)

and the bounded first and last intervals are:

First interval: [xmin, Upper Boundary of First Interval]

Last interval: [Lower Boundary of Last Interval, xmax]
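The practical difference between the two settings shows up when new data fall outside the observed range. The following Python sketch is illustrative only: the function names and the numeric boundaries are hypothetical, and it assumes closed interval ends as written above.

```python
import math


def first_last_intervals(x_min, x_max, first_upper, last_lower, unbounded=True):
    """Return the (first, last) intervals as (low, high) pairs."""
    if unbounded:
        # First and last intervals extend to -infinity and +infinity.
        return (-math.inf, first_upper), (last_lower, math.inf)
    # First and last intervals stop at the observed minimum and maximum.
    return (x_min, first_upper), (last_lower, x_max)


def covers(interval, value):
    """True if value lies inside the interval (both ends treated as closed)."""
    low, high = interval
    return low <= value <= high


# Hypothetical predictor observed between 18 and 75.
_, last_unbounded = first_last_intervals(18, 75, 25, 60, unbounded=True)
_, last_bounded = first_last_intervals(18, 75, 25, 60, unbounded=False)

# A new value of 80 is covered only by the unbounded last interval.
new_value_covered_unbounded = covers(last_unbounded, 80)
new_value_covered_bounded = covers(last_bounded, 80)
```

With bounded intervals, a value beyond xmin or xmax matches no group, so how such a value is recoded depends on the generated rules.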

Selected Coding

Missing Data

The recoding generated by the automated WoE coding module will ensure that all observations for the respective predictor variables will be recoded. Use the controls in this box to specify what value to assign to the respective currently selected (highlighted in the Predictor variables pane) variable when it is recoded if the original input variable has missing data.

Default WoE

By default, you can assign the WoE value computed for missing data from the input data. If the input data (two-way interaction table) does not contain any missing data, then the default value will be WoE=0, that is, even odds (see also the Introductory Overview for details).
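To see why WoE=0 corresponds to even odds, consider the following Python sketch. It assumes the commonly used definition WoE = ln(share of Goods / share of Bads); the exact formula and any scaling used by the module (some implementations multiply by 100) are described in the Introductory Overview. The function name and the counts are hypothetical.

```python
import math


def missing_data_woe(goods_md, bads_md, goods_total, bads_total):
    """Default WoE for the missing-data group, or 0.0 if there is none."""
    if goods_md + bads_md == 0:
        # No missing data in the input: fall back to even odds.
        return 0.0
    if goods_md == 0 or bads_md == 0:
        # The logarithm is undefined when one of the counts is zero.
        raise ValueError("WoE is undefined when a group has no Goods or no Bads")
    share_good = goods_md / goods_total   # share of all Goods that are missing
    share_bad = bads_md / bads_total      # share of all Bads that are missing
    return math.log(share_good / share_bad)


# Missing group holds 10% of the Goods and 10% of the Bads: ln(1) = 0.
even_odds = missing_data_woe(90, 10, 900, 100)

# Missing group holds 5% of the Goods but 10% of the Bads: negative WoE.
more_bads = missing_data_woe(45, 10, 900, 100)
```

A value of 0 therefore means that the missing-data group carries no evidence in either direction.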

Manual WoE

Clear the Use default check box in order to specify a WoE value manually that will be used for the currently selected interaction table for cases with missing data.

Apply button

Click this button to apply your selections

Group details

After the best grouping solutions have been computed, the Group details pane will display detailed statistics for each predictor and each grouping for the current (default) solutions.

Results

As described in the Introductory Overview, the automated Weight of Evidence module will generate the best available recoding of continuous and categorical predictors that will maximize the WoE differences between groups, and between adjacent groups for continuous predictors.

The algorithms for deriving the default coding for continuous predictors lend themselves to generate default constrained solutions in addition to the best unconstrained grouping of values.

There are three types of constrained WoE recoding solutions that the program will compute, if those constrained solutions exist. If a constrained solution cannot be identified (that is, it does not exist, given the default partitions), then a No Solution message is shown for the respective graph.

Monotone

The best predictor coding subject to the LINEAR constraint on the WoE function for each continuous predictor, where the WoE values of all adjacent recoded groups (intervals) will always increase (positive monotone relationship of predictor values to WoE), or the WoE values of all adjacent recoded groups will always decrease (negative monotone relationship of predictor values to WoE).

One minimum or maximum

The best predictor coding subject to the QUADRATIC constraint on the WoE function for each continuous predictor. It displays the quadratic solutions, where the relationship between the coded value ranges (intervals) and the WoE values can have a single reversal, so that the resulting function is either U-shaped or inverse-U-shaped.

One minimum and one maximum

The best predictor coding subject to the CUBIC constraint on the WoE function for each continuous predictor. It displays the cubic solutions, where the relationship between the coded value ranges (intervals) and the WoE values can have two reversals, so that the resulting function is S-shaped.
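The three constraints differ only in how many reversals (changes of direction) are allowed in the WoE values of adjacent groups. The Python sketch below classifies a sequence of WoE values by counting reversals. It is an illustration of the shape definitions, not the search algorithm used by the module; the function names are hypothetical, and equal adjacent values are ignored as a simplifying assumption.

```python
def count_reversals(woe_values):
    """Count direction changes in the WoE values of adjacent groups."""
    diffs = [b - a for a, b in zip(woe_values, woe_values[1:])]
    # Keep only the direction of each step; ties (zero steps) are skipped.
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def tightest_constraint(woe_values):
    """Name the most restrictive shape that the WoE sequence satisfies."""
    names = {
        0: "Monotone",
        1: "One minimum or maximum",
        2: "One minimum and one maximum",
    }
    return names.get(count_reversals(woe_values), "No restrictions")


# Hypothetical WoE values for the ordered groups of one predictor.
monotone = tightest_constraint([-0.5, -0.1, 0.2, 0.6])
u_shaped = tightest_constraint([0.4, -0.2, -0.5, 0.1])
s_shaped = tightest_constraint([-0.3, 0.4, -0.2, 0.5])
```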

Logodds plot

For the currently selected group type, the Logodds plot shows the log-odds values for each (recoded) group in the current data set. The percentages of observations in each respective coded category are displayed next to the points in the graphs.
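As a rough illustration of the quantities in such a plot, the Python sketch below computes, for each group, the log odds as ln(Goods / Bads) and the percentage of observations in the group. This definition of log odds is an assumption, the group labels and counts are hypothetical, and the values plotted by the module may be scaled differently.

```python
import math


def logodds_points(groups):
    """groups: list of (label, goods, bads) with positive counts.

    Returns a list of (label, log_odds, percent_of_observations).
    """
    total = sum(goods + bads for _, goods, bads in groups)
    return [
        (label, math.log(goods / bads), 100.0 * (goods + bads) / total)
        for label, goods, bads in groups
    ]


# Hypothetical counts for three recoded groups.
points = logodds_points([
    ("(-inf, 25]", 100, 40),
    ("(25, 40]", 300, 40),
    ("(40, inf)", 400, 20),
])
```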

Weight of Evidence Graphs

Custom

One WoE graph is created for each calculation method. The labels on the x axis denote the boundaries of the groups that were calculated with this method. Each point is labeled with the percent of cases found in this new grouping. It displays a custom solution created by the user for continuous and categorical predictors.

No restrictions

This graph displays the best unconstrained solution for continuous and categorical predictors.

Crosstabs/Frequency table

The Crosstabs/frequency table summarizes detailed statistics for each group, for the currently selected predictor (highlighted in the Predictor variables pane) and group type. Specifically, for each recoded group, the table shows the number of Goods and Bads, the Gini value, the contribution of the group to the Information Value, the WoE (Weight of Evidence) value, and the category boundaries used for recoding.
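The WoE and Information Value columns of such a table can be sketched in a few lines of Python. The sketch assumes the commonly used definitions WoE = ln(share of Goods / share of Bads) and IV contribution = (share of Goods - share of Bads) × WoE; the module's exact formulas and scaling are given in the Introductory Overview. The Gini column is omitted, and the group boundaries and counts are hypothetical.

```python
import math


def woe_table(groups):
    """groups: list of (boundaries, goods, bads) with positive counts."""
    total_goods = sum(goods for _, goods, _ in groups)
    total_bads = sum(bads for _, _, bads in groups)
    rows = []
    for boundaries, goods, bads in groups:
        share_good = goods / total_goods   # share of all Goods in this group
        share_bad = bads / total_bads      # share of all Bads in this group
        woe = math.log(share_good / share_bad)
        rows.append({
            "boundaries": boundaries,
            "goods": goods,
            "bads": bads,
            "woe": woe,
            # Each group's contribution to the Information Value.
            "iv": (share_good - share_bad) * woe,
        })
    return rows


rows = woe_table([
    ("(-inf, 25]", 100, 40),
    ("(25, 40]", 300, 40),
    ("(40, inf)", 400, 20),
])
# The Information Value of the predictor is the sum of the contributions.
information_value = sum(row["iv"] for row in rows)
```

Under these definitions each contribution is non-negative, so a group adds to the Information Value whenever its shares of Goods and Bads differ.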

Note

Prior to 13.3, Open project had the following three options in a drop-down list, but they have been removed in this release and replaced with new options:

Edit (equivalent to current Rebin)

Deploy (equivalent to current Apply bin)

View

The new options can be applied to both saved and new datasets. Although the module now works in a single state, you can use flags to indicate how the module will compute statistics and bins for the predictor variables when you click the Compute groups button.