Decision Tree Output Troubleshooting

When you analyze the results of a Decision Tree Model, keep the most important assessment factors in mind.

Assessing a Decision Tree

When analyzing the results of a Decision Tree Model, the most important assessment factors are as follows:
  • It scores well outside of the training data (indicated by the ROC, LIFT, and Goodness of Fit Scoring Operators; see the sketch after this list).
  • It avoids the single-node, repetition, and unbalanced tree scenarios.
  • It is easily interpretable.
  • It makes sense intuitively.
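
For example, the first factor can be checked with a minimal sketch like the one below, which assumes scikit-learn and synthetic data rather than the ROC, LIFT, and Goodness of Fit Scoring Operators themselves; the held-out ROC AUC stands in for the Scoring Operators' output.

  # Minimal sketch (assumes scikit-learn and synthetic data): compare how the
  # tree scores on the training data versus data it has never seen.
  from sklearn.datasets import make_classification
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier
  from sklearn.metrics import roc_auc_score

  # Synthetic binary-classification data stands in for a real training set.
  X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.3, random_state=0
  )

  tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

  # A large gap between training and held-out ROC AUC suggests the tree does
  # not score well outside of the training data.
  auc_train = roc_auc_score(y_train, tree.predict_proba(X_train)[:, 1])
  auc_test = roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1])
  print(f"train ROC AUC: {auc_train:.3f}   held-out ROC AUC: {auc_test:.3f}")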

There are a few Decision Tree Operator output results that would make a modeler question the validity of the Decision Tree model, including single-node trees, repetition, and replication.

A single-node tree, repetition, or replication might indicate to the modeler that a different modeling technique should be used instead of a Decision Tree.

Single-Node Trees
In the extreme case of data that has no classification structure at all, the Decision Tree output is simply a single-node tree that assigns every element the label of the most frequently occurring class.
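
A minimal sketch of this scenario, assuming scikit-learn and a hypothetical data set whose only attribute is constant (so no split can separate the classes):

  # Minimal sketch (assumes scikit-learn; the data are hypothetical): with no
  # usable structure in the attributes, the fitted tree is a single root node
  # that predicts the majority class.
  import numpy as np
  from sklearn.tree import DecisionTreeClassifier

  X = np.zeros((100, 1))                  # one constant attribute: nothing to split on
  y = np.array([0] * 70 + [1] * 30)       # class 0 is the majority class

  tree = DecisionTreeClassifier(random_state=0).fit(X, y)

  print("depth:", tree.get_depth())       # 0 -> a single-node tree
  print("nodes:", tree.tree_.node_count)  # 1
  print("label:", tree.predict([[0.0]]))  # [0], the most frequently occurring class
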
Repetition
Another anomaly that might be seen in Decision Tree output is a tree with repetition, whereby an attribute is repeatedly tested along a given branch of the tree. For example, the attribute age might be compared to different threshold values at every level down a branch, as in the sketch below.
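
A minimal sketch of how repetition can arise, assuming scikit-learn and a hypothetical age attribute that is the only predictor of a banded outcome:

  # Minimal sketch (assumes scikit-learn; the age data are hypothetical): a tree
  # grown on a single numeric attribute must test that attribute again and again
  # at different thresholds down each branch.
  import numpy as np
  from sklearn.tree import DecisionTreeClassifier, export_text

  rng = np.random.default_rng(0)
  age = rng.integers(18, 80, size=500).reshape(-1, 1)
  # The outcome depends on age bands, so the only way to fit it is to keep
  # re-splitting on age.
  y = ((age[:, 0] > 30) & (age[:, 0] <= 45)) | (age[:, 0] > 65)

  tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(age, y)
  # The printed rules show "age <= ..." repeated at every level of the branches.
  print(export_text(tree, feature_names=["age"]))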

Replication
Another problem to watch for is replication, where duplicate sub-trees appear within the tree. This issue can often be addressed by creating combination variables, although a model that contains some replication may still be acceptable.
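
A minimal sketch of replication and of the combination-variable fix, assuming scikit-learn and hypothetical binary attributes A and B whose exclusive-or determines the class:

  # Minimal sketch (assumes scikit-learn; A and B are hypothetical binary
  # attributes): a class that depends on A XOR B forces the tree to duplicate
  # one attribute's sub-tree under both branches of the other (replication).
  import numpy as np
  from sklearn.tree import DecisionTreeClassifier, export_text

  rng = np.random.default_rng(0)
  A = rng.integers(0, 2, size=400)
  B = rng.integers(0, 2, size=400)
  y = A ^ B                                # class depends on the combination of A and B

  # Original attributes only: the printed rules show the same sub-tree repeated
  # under both branches of the first split.
  tree = DecisionTreeClassifier(random_state=0).fit(np.column_stack([A, B]), y)
  print(export_text(tree, feature_names=["A", "B"]))

  # Adding the combination variable lets a single split capture the pattern,
  # removing the replicated sub-trees.
  combo = A ^ B
  tree2 = DecisionTreeClassifier(random_state=0).fit(np.column_stack([A, B, combo]), y)
  print(export_text(tree2, feature_names=["A", "B", "A_xor_B"]))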