Pruning or Pre-Pruning

A key concept related to decision trees is pruning (or pre-pruning), whereby branches that do not add enough predictive value to the model are eliminated from the tree.

Pruning and pre-pruning help avoid over-fitting of the decision tree and make the tree more compact and easier to read. For a typical dataset, both should be used, unless the algorithm takes too long to run.

The process of pruning (post-pruning) involves visiting each non-leaf node and deciding, based on a confidence value, whether to turn that node into a leaf; in other words, whether the sub-tree rooted at that node adds enough extra value to the model. For pruning, the whole tree is built out first, and sub-branches are then cut away if they turn out to be poor predictors.
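As a minimal sketch of the post-pruning idea, the following uses scikit-learn's DecisionTreeClassifier. Note that scikit-learn implements cost-complexity pruning (the ccp_alpha parameter) rather than the confidence-based pruning described above, but the workflow is the same: grow the full tree, then collapse sub-branches that add little predictive value. The dataset and the choice of alpha are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: grow the full, unpruned tree.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 2: compute candidate pruning strengths from the full tree.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Step 3: refit with a non-zero ccp_alpha; weak sub-trees are collapsed into leaves.
pruned_tree = DecisionTreeClassifier(
    random_state=0, ccp_alpha=path.ccp_alphas[-2]
).fit(X_train, y_train)

print("unpruned leaves:", full_tree.get_n_leaves(),
      "| pruned leaves:", pruned_tree.get_n_leaves())
print("unpruned test accuracy:", full_tree.score(X_test, y_test))
print("pruned test accuracy:  ", pruned_tree.score(X_test, y_test))
```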

The process of pre-pruning limits the decision tree as it is being built, based on increases in purity (that is, the increase in information gain for the Decision Tree operator, or the reduction in the Gini index for the CART operator): a split is only made if it improves purity by at least a threshold amount. This is a faster process than post-pruning, but it can sometimes result in a decision tree that is too small.
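A minimal sketch of pre-pruning, again using scikit-learn rather than the operators discussed above: here splits are rejected while the tree is being built if they do not improve purity enough (min_impurity_decrease) or if the tree would exceed a size limit (max_depth, min_samples_leaf). The threshold values are illustrative, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

pre_pruned = DecisionTreeClassifier(
    criterion="entropy",         # purity measured as information gain
    max_depth=4,                 # stop growing beyond this depth
    min_samples_leaf=5,          # require at least 5 examples per leaf
    min_impurity_decrease=0.01,  # reject splits whose purity gain is too small
    random_state=0,
).fit(X, y)

print("leaves:", pre_pruned.get_n_leaves(), "| depth:", pre_pruned.get_depth())
```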

Note: Pre-pruning and post-pruning can also be interleaved for a combined approach. Post-pruning requires more computation than pre-pruning, yet generally leads to a more reliable tree. No single pruning method has been found to be superior to all others (Han, Kamber, and Pei, Data Mining: Concepts and Techniques, p. 346).
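The combined approach can be sketched in the same hypothetical scikit-learn setup by setting both kinds of controls at once: the growth limits act as pre-pruning while the tree is built, and cost-complexity pruning then trims the result afterwards. Parameter values are again illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

combined = DecisionTreeClassifier(
    max_depth=6,          # pre-pruning: cap growth during construction
    min_samples_leaf=3,   # pre-pruning: avoid tiny leaves
    ccp_alpha=0.01,       # post-pruning: collapse weak sub-trees afterwards
    random_state=0,
).fit(X, y)

print("leaves in combined-pruned tree:", combined.get_n_leaves())
```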

Pre-pruning is the cheapest way to keep the model small (and to help prevent over-fitting).