Alpine Forest Evaluator
Provides model accuracy data, a confusion matrix heat map that illustrates the classification model's accuracy for each possible predicted value, and an error convergence rate graph.
Information at a Glance
The Alpine Forest Evaluator is a model evaluation operator for Alpine Forest models, similar to the Goodness of Fit evaluator but more graphical in nature.
Algorithm
|                 | Predicted Negative | Predicted Positive |
|-----------------|--------------------|--------------------|
| Actual Negative | a                  | b                  |
| Actual Positive | c                  | d                  |

where:
- a is the number of correct predictions that an instance is negative,
- b is the number of incorrect predictions that an instance is positive,
- c is the number of incorrect predictions that an instance is negative, and
- d is the number of correct predictions that an instance is positive.
Like the Confusion Matrix operator, the Alpine Forest Evaluator operator also calculates several standard accuracy terms (See Confusion Matrix for more details).
- The accuracy (AC) is the proportion of the total number of predictions that were correct.
- The recall or true positive rate (TP) is the proportion of positive cases that were correctly identified.
- The false positive rate (FP) is the proportion of negative cases that were incorrectly classified as positive.
- The true negative rate (TN) is defined as the proportion of negative cases that were classified correctly.
- The false negative rate (FN) is the proportion of positive cases that were incorrectly classified as negative.
- Finally, precision (P) is the proportion of the predicted positive cases that were correct.
The Alpine Forest Evaluator also calculates an error rate in order to generate the error convergence graph. This indicates the misclassification rate of the model, which is simply 1 - accuracy (AC). In terms of the confusion matrix cells above, this can also be computed as:

Error Rate = (b + c) / (a + b + c + d)
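As a concrete reference, the following minimal Python sketch shows how all of these terms follow from the four confusion matrix cells a, b, c, and d defined above. It mirrors the standard definitions; it is not the operator's internal code:

```python
# Minimal sketch: the standard accuracy terms computed from the four
# confusion matrix cells a, b, c, d defined above. Illustrative only,
# not the Alpine Forest Evaluator's actual implementation.
def confusion_metrics(a, b, c, d):
    total = a + b + c + d
    return {
        "AC": (a + d) / total,      # accuracy
        "TP": d / (c + d),          # recall / true positive rate
        "FP": b / (a + b),          # false positive rate
        "TN": a / (a + b),          # true negative rate
        "FN": c / (c + d),          # false negative rate
        "P":  d / (b + d),          # precision
        "error": (b + c) / total,   # misclassification rate, i.e. 1 - AC
    }
```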
Restrictions
The Alpine Forest Evaluator operator must be connected to the Alpine Forest Classification operator and a data set from a preceding operator.
Output
- Visual Output
- The Alpine Forest Evaluator operator produces classification accuracy data, a graphical heat map, and a graph that tracks error rate against the number of decision trees.
- Classification Accuracy Data Output
The data output provides the classification accuracy counts for every Observed/Predicted combination for each class. In the following example, the intersection of the Observed (0) row and Predicted (0) column indicates that 47,714 predictions of value 0 were correct, while the Observed (0)/Predicted (1) cell indicates that the model predicted 1 instead of 0 only 7 times. The class recall for predicting 0 was therefore 99.98%.
However, the Observed (1)/Predicted (0) cell indicates 1,028 instances of the model incorrectly predicting 0 for actual values of 1, and the Observed (1)/Predicted (1) cell indicates that the model predicted 1 correctly 1,251 times, giving a class recall of 54.89% for predicting 1.
Overall, the Alpine Forest Classification model's accuracy in this example is 97.93%.
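As a quick check, these figures can be reproduced from the raw counts. The counts below are taken from the example above; the code itself is only an illustrative sketch:

```python
# Observed/Predicted counts from the example above.
counts = {("0", "0"): 47714,   # observed 0, predicted 0 (correct)
          ("0", "1"): 7,       # observed 0, predicted 1 (incorrect)
          ("1", "0"): 1028,    # observed 1, predicted 0 (incorrect)
          ("1", "1"): 1251}    # observed 1, predicted 1 (correct)

recall_0 = counts[("0", "0")] / (counts[("0", "0")] + counts[("0", "1")])
recall_1 = counts[("1", "1")] / (counts[("1", "0")] + counts[("1", "1")])
accuracy = (counts[("0", "0")] + counts[("1", "1")]) / sum(counts.values())

print(recall_0)  # ~0.9999 (near-perfect class recall for 0)
print(recall_1)  # ~0.5489 (54.89% class recall for 1)
print(accuracy)  # 0.9793  (97.93% overall accuracy)
```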
- Heat Map
A Confusion Matrix heat map displays information about actual versus predicted counts of a classification model.
The following example shows a Confusion Matrix heat map for an Alpine Forest model. In this case, it is evident that the model performs best when predicting the value 0 with 100% accuracy. However, the accuracy drops for predicting the value 1, being correct only 49% of the time.
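A display of this kind can be approximated outside the tool. The matplotlib sketch below uses the example's rounded per-class rates (100% and 49%) as cell values; every rendering choice here is an illustrative assumption, not the operator's actual drawing code:

```python
# Illustrative confusion matrix heat map. Cell values are the example's
# rounded per-class rates; everything else is an assumption.
import matplotlib.pyplot as plt
import numpy as np

matrix = np.array([[1.00, 0.00],   # actual 0: predicted 0, predicted 1
                   [0.51, 0.49]])  # actual 1: predicted 0, predicted 1

fig, ax = plt.subplots()
im = ax.imshow(matrix, cmap="Blues", vmin=0.0, vmax=1.0)
ax.set_xticks([0, 1])
ax.set_xticklabels(["0", "1"])
ax.set_yticks([0, 1])
ax.set_yticklabels(["0", "1"])
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
for i in range(2):
    for j in range(2):
        ax.text(j, i, f"{matrix[i, j]:.0%}", ha="center", va="center")
fig.colorbar(im, ax=ax)
plt.show()
```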
- Number of Trees vs. Error Rate Graph
Displays the error rate as the number of trees increases. This illustrates the Alpine Forest model's error convergence rate.
The following example illustrates how such a graph is useful for determining the number of decision trees needed in the Alpine Forest model in order for the error rate to be at its lowest. In this case, once there are nine decision trees, the error rate falls to 0.00%.
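A similar curve can be sketched outside the tool. The example below uses scikit-learn's RandomForestClassifier as a stand-in for the Alpine Forest model, growing the forest one tree at a time via warm_start and recording the held-out error rate after each addition; the synthetic data set and all parameters are hypothetical:

```python
# Illustrative error-convergence curve: misclassification rate on held-out
# data as trees are added, using a generic random forest (not Alpine's code).
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=1, warm_start=True, random_state=0)
error_rates = []
for n_trees in range(1, 31):
    forest.set_params(n_estimators=n_trees)  # request one more tree
    forest.fit(X_train, y_train)             # warm_start keeps earlier trees
    error_rates.append(1.0 - forest.score(X_test, y_test))  # 1 - accuracy

plt.plot(range(1, 31), error_rates)
plt.xlabel("Number of trees")
plt.ylabel("Error rate")
plt.title("Error convergence")
plt.show()
```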