Information Gain

In decision tree analyses, an important concept is the information gain from a parent node to its children, referred to here as the gain ratio.

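The gain ratio, $\Delta$, measures the gain in purity from parent to children, with each child weighted by its relative size, as follows:

$$\Delta = I(\text{parent}) - \sum_{j} \frac{N_j}{N} \, I(\text{child}_j)$$

where $I$ is the impurity of a node (for example, entropy or the Gini index), so that a drop in impurity is a gain in purity;

$N_j$ is the number of elements assigned to child node $j$;

$N$ is the total number of elements at the parent node.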
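
As a rough illustration of the weighted gain described above, the following Python sketch computes the parent impurity minus the size-weighted impurities of the children. The text does not fix a particular measure for I, so Shannon entropy is assumed here; the function names `entropy` and `gain` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels (an assumed choice for I)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain(parent, children, impurity=entropy):
    """Parent impurity minus the children's impurities, each weighted by N_j / N."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


parent = ["a", "a", "b", "b"]

# A split that separates the two classes completely.
perfect = gain(parent, [["a", "a"], ["b", "b"]])

# A split whose children have the same class mix as the parent.
useless = gain(parent, [["a", "b"], ["a", "b"]])
```

With entropy as the measure, the first split recovers the full one bit of impurity in the parent, while the second recovers none, since each child is as mixed as the parent.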

At each node, the decision tree algorithm selects the split that maximizes this gain ratio.
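
One way to picture this maximization is an exhaustive search over candidate thresholds on a single numeric feature. The sketch below is a minimal illustration under stated assumptions, not the procedure of any particular library: it assumes the Gini index as the impurity measure and binary splits of the form `value <= threshold`, and the names `gini` and `best_threshold` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (an assumed choice for I)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the split `value <= threshold` with the largest gain."""
    n = len(labels)
    parent_impurity = gini(labels)
    best = (None, 0.0)
    # Skip the largest value: splitting there would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        delta = parent_impurity - weighted
        if delta > best[1]:
            best = (t, delta)
    return best


values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
threshold, delta = best_threshold(values, labels)
```

For this toy data the search settles on the threshold 2.0, which separates the two classes completely and removes all of the parent's Gini impurity of 0.5. A full tree builder would repeat the same search over every feature and then recurse into each child.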