Information Value

Calculates both the information value (IV) and weight of evidence (WOE) of attributes. These are measures of the overall "relevance" of a data variable in predicting the dependent column's desired value or outcome.

Information at a Glance

Category | Explore
Data source type | DB
Sends output to other operators | No
Data processing tool | n/a

For more information about IV and WOE, see Information Value and Weight of Evidence Analysis.

Algorithm

The Information Value operator uses the following formulas for calculating IV and WOE:

Weight of Evidence = Ln(Distribution Good/Distribution Bad)*100

Information Value = sum((Distribution Good - Distribution Bad)*Ln(Distribution Good/Distribution Bad))

where the sum runs over the groupings (distinct values) of an independent variable. Distribution Good is the percentage of all records matching the desired "Value to Predict" for the dependent variable that fall into a given grouping, and Distribution Bad is the percentage of all records not matching the "Value to Predict" that fall into that same grouping.

The following table provides an example.

Attribute | Count Goods | Distribution Good | Count Bads | Distribution Bad | WOE
Missing | 1 | 10% | 3 | 30% | -109.86
true | 3 | 30% | 2 | 20% | 40.55
false | 6 | 60% | 5 | 50% | 18.23

Information Value = (10% - 30%)*Ln(10% / 30%) + (30% - 20%)*Ln(30% / 20%) + (60% - 50%)*Ln(60% / 50%) = 0.2785
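
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the two formulas above applied to the counts in the table; it is not the operator's own code, and the variable names are chosen for illustration.

```python
import math

# Counts from the example table: attribute value -> (count goods, count bads)
counts = {"Missing": (1, 3), "true": (3, 2), "false": (6, 5)}

total_good = sum(good for good, _ in counts.values())
total_bad = sum(bad for _, bad in counts.values())

woe = {}
iv = 0.0
for value, (good, bad) in counts.items():
    dist_good = good / total_good  # Distribution Good for this grouping
    dist_bad = bad / total_bad     # Distribution Bad for this grouping
    log_ratio = math.log(dist_good / dist_bad)
    woe[value] = log_ratio * 100   # Weight of Evidence
    iv += (dist_good - dist_bad) * log_ratio  # this grouping's share of IV

for value, w in woe.items():
    print(f"{value}: WOE = {w:.2f}")
print(f"Information Value = {iv:.4f}")
```

Run as written, this reproduces the WOE column of the table to two decimal places and an Information Value of 0.2785.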

Input

A data set from the preceding operator.

Configuration

Parameter | Description
Notes | Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Dependent Column | The column to use as the class variable. Note: The Dependent Column must be a categorical (not continuous) variable.
Value to Predict | The value stored in the Dependent Column that represents the event to analyze (for example, Active vs. Inactive). It must be a value that exists in the Dependent Column, and it is treated as the "good" event.
Columns | The columns to analyze for their relevance to, or effect on, the Dependent Column equaling the Value to Predict. Click Select Columns to open the dialog box for selecting columns from the input data set. See Select Columns Dialog Box for more information. The selected columns must be categorical.
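
To show how these parameters relate to the formulas in the Algorithm section, the following Python sketch computes WOE and IV for one categorical column from raw rows. It is an illustration under stated assumptions, not the operator's implementation: the function `information_value`, the column names `flag` and `status`, and the choice to skip groupings with no goods or no bads (where the logarithm is undefined) are all hypothetical.

```python
import math
from collections import Counter

def information_value(rows, dependent_column, value_to_predict, column):
    """Return (woe_by_value, iv) for one categorical column.

    Hypothetical helper: rows is a list of dicts keyed by column name.
    """
    goods = Counter()  # rows where the dependent column equals the Value to Predict
    bads = Counter()   # all other rows
    for row in rows:
        if row[dependent_column] == value_to_predict:
            goods[row[column]] += 1
        else:
            bads[row[column]] += 1

    total_good = sum(goods.values())
    total_bad = sum(bads.values())

    woe = {}
    iv = 0.0
    for value in set(goods) | set(bads):
        if goods[value] == 0 or bads[value] == 0:
            # Assumption: skip groupings where the log ratio is undefined.
            continue
        dist_good = goods[value] / total_good
        dist_bad = bads[value] / total_bad
        log_ratio = math.log(dist_good / dist_bad)
        woe[value] = log_ratio * 100
        iv += (dist_good - dist_bad) * log_ratio
    return woe, iv

# Rows built to match the counts in the Algorithm example table.
rows = (
    [{"flag": "Missing", "status": "Active"}] * 1
    + [{"flag": "Missing", "status": "Inactive"}] * 3
    + [{"flag": "true", "status": "Active"}] * 3
    + [{"flag": "true", "status": "Inactive"}] * 2
    + [{"flag": "false", "status": "Active"}] * 6
    + [{"flag": "false", "status": "Inactive"}] * 5
)

woe, iv = information_value(rows, "status", "Active", "flag")
print(f"IV for 'flag' = {iv:.4f}")
```

Here `status` plays the role of the Dependent Column, `Active` is the Value to Predict, and `flag` is one of the selected Columns; calling the function once per selected column gives one IV per independent variable.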

Output

Visual Output
The results display shows the IV and WOE for each selected independent variable, indicating how effective each variable is at predicting the desired dependent variable value.

Data Output
None. This is a terminal operator.