Weight of Evidence (WoE) example
This example illustrates how the Weight of Evidence (WoE) module can be used in an analysis project for risk assessment. Input a set of predictor variables into the analysis to find optimal coding for both continuous and categorical variables. Their resulting weight of evidence can be used as continuous inputs for Logistic Regression, improving that model’s performance.
- Data file: CreditRisk.sta
- Variable of interest: Credit Standing: Good or Bad
- Goal of the analysis project: To classify credit applicants in terms of their Credit Standing
To distinguish between Good and Bad Credit Standing, you will use several independent (or predictor) variables, including the following:
A combination of the independent predictor variables may help to explain Credit Standing. They can be used to build a predictive model to classify new customers.
Before building such a model, use the Weight of Evidence tool to
You can then use the WoE values as continuous predictors for the logistic regression model.
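To make the quantity concrete, the following is a minimal Python sketch of how a weight of evidence value can be computed for each group of a predictor. It assumes the common convention WoE = ln(share of Good cases in the group / share of Bad cases in the group); the convention used inside the module may differ in sign or scaling. The function name and the counts are hypothetical and are not taken from CreditRisk.sta.

```python
import math

def weight_of_evidence(good_counts, bad_counts):
    """Return {group: WoE}, assuming WoE = ln(share of Good / share of Bad).

    Every group must contain at least one Good and one Bad case,
    otherwise the logarithm is undefined.
    """
    total_good = sum(good_counts.values())
    total_bad = sum(bad_counts.values())
    woe = {}
    for group in good_counts:
        good_share = good_counts[group] / total_good
        bad_share = bad_counts[group] / total_bad
        woe[group] = math.log(good_share / bad_share)
    return woe

# Made-up counts for illustration only; not taken from CreditRisk.sta.
good = {"A": 60, "B": 30, "C": 10}
bad = {"A": 20, "B": 30, "C": 50}
woe_by_group = weight_of_evidence(good, bad)

# Replace each raw category with its WoE value to obtain a continuous input.
raw_column = ["A", "C", "B", "A"]
encoded_column = [woe_by_group[value] for value in raw_column]
```

Under this convention, a group with a positive WoE holds proportionally more Good than Bad cases, and a group with a negative WoE holds proportionally more Bad cases.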
Open the CreditRisk.sta data set
1 | Select the File tab. | The File screen displays with a menu down the left side. |
2 | Click Open Examples in the left-hand menu. | The Open a Statistica Data File dialog box displays. |
3 | Open the Datasets folder. | The datasets located in the Datasets folder display. |
4 | Double-click the CreditRisk.sta spreadsheet. | The spreadsheet displays. |
Start the Weight of Evidence module
Select the Home tab. | The Home tab ribbon opens, displaying the File, Output, Tools, SharePoint and Windows groups. | |
1 | Select the Data Mining tab on the ribbon. | The Data Mining options display on the ribbon. |
2 | In the Tools group, click Weight of Evidence. | The Weight of Evidence (WoE) dialog box displays. |
3 | In the Specifications and Results Panel (top), click the Variables button. | The Select the variables for the analysis dialog box displays. |
4 | Select the Show appropriate variables only check box. | The selection of variables displayed changes, as this option filters the variable lists according to their Measurement Type. For more information see Select Variables. |
5 | Select the following variables: | |
6 | Click the OK button. | The Variables dialog box will close and two areas of the WoE dialog box will populate: |
7 | Double-click in the Bad Code field. | The Values/Stats dialog box displays. |
8 | Select Bad and click OK. | The Bad Code and Good Code fields will update. |
Compute groups
Weight of Evidence Graphs
The following screen shot shows the WoE graph for the Custom method for the variable Age.
This plot shows:
- Age across the x axis. The labels on the x axis denote the boundaries of the groups that were calculated with this method.
- WoE on the y axis
Each point is labeled with the percent of cases found in this grouping.
This plot shows four groups of average ages:
A small category, from Age 23.5 to 24.5, represents only 4% of cases and has a much lower WoE than any of the other groups.
Why is the Weight of Evidence so different for this group?
Two possible scenarios could cause these results:
- Customers who are 24 years old might really have a very different Credit Standing than customers in the other age categories.
- By random chance, a greater number of Bad Credit Standing customers might just happen to be present in this sample.
Look at the WoE from a different perspective to see which scenario is most likely true.
Use the Monotone method to find a different solution.
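One rough way to gauge the random-chance scenario outside the module is to attach an approximate standard error to a group's WoE. The sketch below is not part of the Weight of Evidence module. It assumes WoE = ln(share of Good / share of Bad) and uses a first-order (delta-method) approximation that treats the Good and Bad samples as independent. The function name and all counts are hypothetical.

```python
import math

def woe_with_standard_error(good_in_group, bad_in_group, total_good, total_bad):
    """Return (WoE, approximate standard error) for a single group.

    Assumes WoE = ln(share of Good / share of Bad). The standard error is a
    delta-method approximation and requires nonzero counts in the group.
    """
    woe = math.log((good_in_group / total_good) / (bad_in_group / total_bad))
    variance = (1 / good_in_group - 1 / total_good
                + 1 / bad_in_group - 1 / total_bad)
    return woe, math.sqrt(variance)

# Hypothetical counts: a small group (40 of 1000 cases) and a large group.
woe_small, se_small = woe_with_standard_error(10, 30, 700, 300)
woe_large, se_large = woe_with_standard_error(200, 100, 700, 300)
```

With these made-up counts the small group has a noticeably larger standard error than the large group, which is why an extreme WoE in a group holding only a few percent of the cases deserves a second look before it is trusted.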
Creating custom groups
Next, explore the variable Checking Acct.
1 | Click the icon to the left of the Predictor variables header to enlarge that pane. | The Predictor variables pane will expand to display more of the list. In this example, all variables will display. |
2 | Select Checking Acct, and click the icon by the header again to display the pane at its smaller size. | The graphs and output in the Weight of Evidence (WoE) dialog box will update for the selected variable. Since Checking Acct is a categorical variable, some methods are not valid and their panes display No Solution. The Custom solution and the No restrictions solution are both shown. In the Group details pane (top right), you can see that the Custom solution does not combine any of the Checking Acct groups; they are all listed separately. The No restrictions method combines only Low and High; the others remain in individual groups. For ease of use, two groups would work better: No Acct, and Any Acct (encompassing 0Balance, Low, and High). |
3 | In the Control Panel, in the Choose group type box, select the Custom option. | The Custom graph is highlighted. |
4 | Click the Customize groups button in the main Control Panel. | The Customize Groups for a Categorical Variable dialog box displays. |
5 | Select 0Balance, Low, and High. | |
6 | Click the Group button. | Notice how the Custom WoE graph changes. |
7 | Click OK. | The Customize Groups dialog box closes. The graphs and the Group details pane update. |
8 | In the Control Panel, click the Show Summary button. | A list will display. |
9 | Select All coding. | The workbook updates. Under Summary for Checking Account, the Custom Crosstabulation for Checking Acct output shows information about the new grouping of this variable. The overall Information Value is 0.596, which means that this variable is a strong predictor of Credit Standing. Even with the customized grouping, this variable can still contribute significantly to the final logistic regression model. |
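For readers who want to see how an Information Value relates to the per-group WoE values, here is a minimal Python sketch. It assumes the widely used definition IV = sum over groups of (share of Good - share of Bad) * WoE, with WoE = ln(share of Good / share of Bad); whether the module uses exactly this definition is an assumption. The function name and the counts are made up for illustration and will not reproduce the 0.596 reported for CreditRisk.sta.

```python
import math

def information_value(good_counts, bad_counts):
    """Return the Information Value of a grouped predictor.

    Assumes IV = sum((good_share - bad_share) * ln(good_share / bad_share)).
    Every group must contain at least one Good and one Bad case.
    """
    total_good = sum(good_counts.values())
    total_bad = sum(bad_counts.values())
    iv = 0.0
    for group in good_counts:
        good_share = good_counts[group] / total_good
        bad_share = bad_counts[group] / total_bad
        iv += (good_share - bad_share) * math.log(good_share / bad_share)
    return iv

# Hypothetical counts for a two-group coding; not taken from CreditRisk.sta.
good = {"No Acct": 300, "Any Acct": 400}
bad = {"No Acct": 50, "Any Acct": 250}
iv = information_value(good, bad)
```

Each term of the sum is non-negative under this definition, so merging groups can only keep the Information Value the same or lower it. Comparing the value before and after a custom grouping shows how much predictive information the simpler coding gives up.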
Deployment via Enterprise
Note: The remainder of this example can be followed only by those who have the Statistica Decisioning Platform software.
If you are happy with the remaining default groupings, the solution is ready for deployment.
Deployment via the Workspace
Additionally, Weight of Evidence Rules can be deployed in the Statistica Workspace. The rules must first be saved as an *.srx file.