Goodness of Fit
Verifies a trained model.
Information at a Glance
| Parameter | Description |
|---|---|
| Category | Model Validation |
| Data source type | DB, HD |
| Send output to other operators | No |
| Data processing tool | MapReduce |
This operator applies the trained model to the input data set and then calculates the following statistics for the model.
- Precision
- Recall
- F1
- Sensitivity
- Specificity
- Accuracy
It is generally applicable to classification models.
Algorithm
When both columns being compared are categorical (or levels), the natural summary is a contingency table - a simple matrix of counts of how often each combination of categories or levels is seen. For example, a result could look like the following illustration.
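As a rough sketch of how such a table of counts could be built, the following Python snippet tallies each (actual, predicted) combination. The data values and the variable names `actual`, `predicted`, and `counts` are hypothetical and chosen only for illustration; this is not the operator's own implementation.

```python
from collections import Counter

# Hypothetical example data: the dependent column and one prediction column.
actual    = ["yes", "yes", "no", "no",  "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "no", "yes"]

# Count how often each (actual, predicted) combination is seen.
counts = Counter(zip(actual, predicted))

# Print the counts as a matrix: rows are actual levels, columns are predicted levels.
levels = sorted(set(actual) | set(predicted))
print("actual\\predicted", *levels, sep="\t")
for a in levels:
    print(a, *(counts[(a, p)] for p in levels), sep="\t")
```

Treating "yes" as the positive class, the cell ("yes", "yes") holds the true positives and the cell ("no", "no") holds the true negatives.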
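From this table, all of the traditional Goodness of Fit statistics (Precision, Recall, F1, Sensitivity, Specificity, and Accuracy) can be read off directly, as shown in the following table, where TP, FP, TN, and FN denote the counts of true positives, false positives, true negatives, and false negatives.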
| Score | Formula |
|---|---|
| Precision | TP/(TP + FP) |
| Recall | TP/(TP + FN) |
| F1 | 2 x (Precision x Recall)/(Precision + Recall) |
| Sensitivity | TP/(TP + FN) |
| Specificity | TN/(FP + TN) |
| Accuracy | (TP + TN)/(TP + FP + TN + FN) |
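The formulas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the operator's implementation: the function name `goodness_of_fit`, the helper `_ratio`, and the choice to return NaN when a denominator is zero are all assumptions made for this example.

```python
import math

def _ratio(numerator, denominator):
    # Assumption: report NaN rather than raise when a denominator is zero.
    return numerator / denominator if denominator else math.nan

def goodness_of_fit(tp, fp, tn, fn):
    """Compute the scores in the table above from the four confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "sensitivity": recall,  # same formula as recall
        "specificity": _ratio(tn, fp + tn),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }

# Hypothetical counts for illustration only.
scores = goodness_of_fit(tp=40, fp=10, tn=45, fn=5)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Note that Sensitivity and Recall share one formula, so they always report the same value.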
Input
- A data set from a preceding operator.
- Model(s) from preceding operators. If more than one model is received from its preceding operators, the result can be used for model comparison. This input is optional on a database data source.
Configuration
| Parameter | Description |
|---|---|
| Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
| Dependent Column | The column to use as the dependent variable in the model. |
| Use Model | (Not present on Hadoop) Specifies whether the evaluation uses models from one or more preceding operators or the data in the prediction columns of the input data set. If true (the default), at least one model operator must directly precede this operator. If false, the prediction columns must be present in the input data set from the preceding operator. |
| Prediction Columns | (Not present on Hadoop) Choose the list of columns in the input data set to compare to the dependent column. |
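To illustrate what comparing several prediction columns to the dependent column could look like, here is a small Python sketch. The column names (`churn`, `pred_tree`, `pred_logit`), the data, and the `accuracy` helper are hypothetical; the sketch only mirrors the idea of scoring each prediction column against the same dependent column.

```python
# Hypothetical input rows: one dependent column and two prediction columns.
rows = [
    {"churn": "yes", "pred_tree": "yes", "pred_logit": "no"},
    {"churn": "no",  "pred_tree": "no",  "pred_logit": "no"},
    {"churn": "yes", "pred_tree": "yes", "pred_logit": "yes"},
    {"churn": "no",  "pred_tree": "yes", "pred_logit": "yes"},
]

def accuracy(rows, dependent, prediction):
    # Fraction of rows where the prediction column matches the dependent column.
    matches = sum(1 for row in rows if row[dependent] == row[prediction])
    return matches / len(rows)

dependent_column = "churn"
prediction_columns = ["pred_tree", "pred_logit"]

# Score every prediction column against the same dependent column.
results = {col: accuracy(rows, dependent_column, col) for col in prediction_columns}
for col, acc in results.items():
    print(f"{col}: accuracy = {acc:.2f}")
```

Scoring each column side by side in this way is one simple basis for comparing models.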
Output