Goodness of Fit

Evaluates how well a trained classification model fits a data set.

Information at a Glance

Category                          Model Validation
Data source type                  DB, HD
Sends output to other operators   No
Data processing tool              MapReduce

This operator applies the trained model to the input data set, and then calculates the following statistics for the model:

  • Precision
  • Recall
  • F1
  • Sensitivity
  • Specificity
  • Accuracy

It applies to classification models in general.

Algorithm

When both columns being compared are categorical (that is, they take discrete levels), the natural summary is a contingency table: a simple matrix of counts of how often each combination of categories or levels occurs. For example, a result could look like the following illustration.
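
As a rough sketch of how such a table could be tallied, the following Python snippet counts each (actual, predicted) pair of categories. The function name `contingency_table` and the sample labels are illustrative assumptions, not part of the operator.

```python
from collections import Counter

def contingency_table(actual, predicted):
    """Count how often each (actual, predicted) pair of categories occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))

# Hypothetical sample data, for illustration only.
actual    = ["yes", "yes", "no", "no",  "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

table = contingency_table(actual, predicted)
for (a, p), count in sorted(table.items()):
    print(f"actual={a:<4} predicted={p:<4} count={count}")
```

Each key of the resulting counter corresponds to one cell of the matrix, so a two-level column pair yields at most four cells.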



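From this table, all of the traditional Goodness of Fit statistics (Precision, Recall, F1, Sensitivity, Specificity, and Accuracy) can be read off directly, as shown in the following table. Here TP, FP, TN, and FN are the counts of true positives, false positives, true negatives, and false negatives in the contingency table.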

Score         Formula
Precision     TP/(TP + FP)
Recall        TP/(TP + FN)
F1            2 x (Precision x Recall)/(Precision + Recall)
Sensitivity   TP/(TP + FN)
Specificity   TN/(FP + TN)
Accuracy      (TP + TN)/(TP + FP + TN + FN)
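
The formulas above can be sketched in Python as follows. This is a minimal illustration, not the operator's implementation. The function name `goodness_of_fit`, the choice to return NaN when a denominator is zero, and the example counts are all assumptions made for this sketch.

```python
def goodness_of_fit(tp, fp, tn, fn):
    """Compute the six scores from the four contingency-table counts."""
    def ratio(num, den):
        # Assumption: an undefined score (zero denominator) is reported as NaN.
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "Precision": precision,
        "Recall": recall,
        "F1": ratio(2 * precision * recall, precision + recall),
        "Sensitivity": recall,  # same formula as Recall
        "Specificity": ratio(tn, fp + tn),
        "Accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }

# Hypothetical counts, for illustration only.
scores = goodness_of_fit(tp=40, fp=10, tn=45, fn=5)
for name, value in scores.items():
    print(f"{name:<12} {value:.4f}")
```

Sensitivity and Recall share one formula, so the sketch computes the value once and reports it under both names.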

Input

  1. A data set from a preceding operator.
  2. Model(s) from preceding operators. If the operator receives more than one model, the results can be used to compare the models. On a database data source, this input is optional.

Configuration

Parameter Description
Notes Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Dependent Column The column to use as the dependent variable in the model.
Use Model (Not present on Hadoop) Specify whether the evaluation uses models from one or more preceding operators or the data in the prediction columns of the input data set.

If true (the default), at least one model operator must directly precede this operator. If false, the prediction columns must be present in the input data set from the preceding operator.

Prediction Columns (Not present on Hadoop) Choose the columns in the input data set to compare with the dependent column.
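
To illustrate the case where Use Model is false and predictions already present in the input data set are compared with the dependent column, the following Python sketch scores several prediction columns side by side. The row layout, the column names (`churn`, `pred_lr`, `pred_tree`), and the helper `accuracy_by_column` are hypothetical and only convey the idea. They do not describe how the operator is implemented.

```python
def accuracy_by_column(rows, dependent_column, prediction_columns):
    """Return, per prediction column, the fraction of rows matching the dependent column."""
    if not rows:
        raise ValueError("rows must not be empty")
    return {
        col: sum(1 for row in rows if row[col] == row[dependent_column]) / len(rows)
        for col in prediction_columns
    }

# Hypothetical input data set with one dependent column and two prediction columns.
rows = [
    {"churn": "yes", "pred_lr": "yes", "pred_tree": "no"},
    {"churn": "no",  "pred_lr": "no",  "pred_tree": "no"},
    {"churn": "yes", "pred_lr": "no",  "pred_tree": "yes"},
    {"churn": "no",  "pred_lr": "no",  "pred_tree": "yes"},
]

scores = accuracy_by_column(rows, "churn", ["pred_lr", "pred_tree"])
for column, accuracy in scores.items():
    print(f"{column}: {accuracy:.2f}")
```

Scoring every prediction column against the same dependent column is what makes a side-by-side comparison possible.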

Output

Visual Output
A Goodness of Fit result table that shows the primary Goodness of Fit statistics of each model.


Data Output
None. This is a terminal operator.