Information Value
Calculates both the information value (IV) and weight of evidence (WOE) of attributes. These are measures of the overall "relevance" of a data variable in predicting the dependent column's desired value or outcome.
Information at a Glance
Parameter | Description |
---|---|
Category | Explore |
Data source type | DB |
Send output to other operators | No |
Data processing tool | n/a |
For more information about IV and WOE, see Information Value and Weight of Evidence Analysis.
Algorithm
The Information Value operator uses the following formulas for calculating IV and WOE:
Weight of Evidence = Ln(Distribution Good / Distribution Bad) * 100
Information Value = sum((Distribution Good - Distribution Bad) * Ln(Distribution Good / Distribution Bad))
where Distribution Good is the percentage of all "good" rows (rows in which the dependent variable equals the "Value to Predict") that fall within a given grouping of the independent variable, and Distribution Bad is the percentage of all "bad" rows (rows in which the dependent variable is not the "Value to Predict") that fall within that grouping. The sum runs over all groupings of the variable.
The following table provides an example.
Attribute | Count Goods | Distribution Good | Count Bads | Distribution Bad | WOE |
---|---|---|---|---|---|
Missing | 1 | 10% | 3 | 30% | -109.9 |
true | 3 | 30% | 2 | 20% | 40.55 |
false | 6 | 60% | 5 | 50% | 18.23 |
Information Value = (10% - 30%)*Ln(10% / 30%) + (30% - 20%)*Ln(30% / 20%) + (60% - 50%)*Ln(60% / 50%) = 0.2785
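
The arithmetic in the example can be checked with a short script. This is a minimal sketch that applies the formulas above to the counts in the table. It is not the operator's own implementation, and the variable names are illustrative.

```python
import math

# Counts copied from the example table: attribute -> (count of goods, count of bads).
counts = {"Missing": (1, 3), "true": (3, 2), "false": (6, 5)}

total_good = sum(good for good, _ in counts.values())
total_bad = sum(bad for _, bad in counts.values())

woe = {}
information_value = 0.0
for attribute, (good, bad) in counts.items():
    dist_good = good / total_good   # share of all goods in this grouping
    dist_bad = bad / total_bad      # share of all bads in this grouping
    log_ratio = math.log(dist_good / dist_bad)
    woe[attribute] = log_ratio * 100
    information_value += (dist_good - dist_bad) * log_ratio

for attribute, value in woe.items():
    print(f"{attribute}: WOE = {value:.2f}")
print(f"Information Value = {information_value:.4f}")
```

Note that the WOE values are scaled by 100, while the Information Value sum uses the unscaled logarithm.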
Input
A data set from the preceding operator.
Configuration
Parameter | Description |
---|---|
Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
Dependent Column | The column to use as the class variable. Note: The Dependent Column must be a categorical (not continuous) variable. |
Value to Predict | The value stored in the Dependent Column that represents the event to analyze (for example, Active vs. Inactive). The Value to Predict must be a value that exists in the Dependent Column. It is considered the "good" event. |
Columns | The columns to analyze for their relevance to, or effect on, the Dependent Column value equaling the Value to Predict. Click Select Columns to open the dialog for selecting the available columns from the input data set. See Select Columns dialog for more information. The selected columns must contain categorical values. |
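
To show how these three settings relate to the calculation, the following sketch computes an Information Value per selected column from in-memory rows. The function `information_values`, the column names `flag` and `status`, and the decision to skip groupings that contain no goods or no bads are all assumptions made for illustration. The source does not say how the operator handles such groupings.

```python
import math
from collections import Counter

def information_values(rows, dependent_column, value_to_predict, columns):
    """Return {column: IV} for rows given as dicts (hypothetical helper)."""
    results = {}
    for column in columns:
        goods, bads = Counter(), Counter()
        for row in rows:
            # A row is "good" when the dependent column equals the value to predict.
            target = goods if row[dependent_column] == value_to_predict else bads
            target[row[column]] += 1
        total_good, total_bad = sum(goods.values()), sum(bads.values())
        iv = 0.0
        for group in set(goods) | set(bads):
            # Assumption: skip groupings with zero goods or zero bads,
            # because Ln is undefined for them.
            if goods[group] == 0 or bads[group] == 0:
                continue
            dist_good = goods[group] / total_good
            dist_bad = bads[group] / total_bad
            iv += (dist_good - dist_bad) * math.log(dist_good / dist_bad)
        results[column] = iv
    return results

# Rebuild the example table as rows; "flag" and "status" are made-up column names.
rows = []
for attribute, (good, bad) in {"Missing": (1, 3), "true": (3, 2), "false": (6, 5)}.items():
    rows += [{"flag": attribute, "status": "Active"}] * good
    rows += [{"flag": attribute, "status": "Inactive"}] * bad

result = information_values(rows, "status", "Active", ["flag"])
print(result)
```

Here `dependent_column` plays the role of Dependent Column, `value_to_predict` the role of Value to Predict, and `columns` the role of Columns. For the rows built from the example table, the result for `flag` is approximately 0.2785.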