ROC

Generates a Receiver Operating Characteristic (ROC) curve.

Information at a Glance

Category                           Model Validation
Data source type                   HD, DB
Sends output to other operators    No
Data processing tool               MapReduce

The ROC operator verifies and compares the trained model or models passed from one or more preceding model operators by applying them to the data set passed from a preceding operator. Each point on the curve pairs the false positive rate (FPR, the fraction of actual negatives classified as positive) with the true positive rate (TPR, the fraction of actual positives classified as positive) at one classification threshold. Together, these points form the Receiver Operating Characteristic (ROC) curve.

The quality of a model's ROC curve can be summarized in a single number by calculating the Area Under the ROC Curve (AUC).

A random model typically has an ROC curve running along the diagonal, which corresponds to an AUC of about 0.5. A better model's curve bends toward the upper left-hand corner, so its AUC approaches one.
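
As an illustration of how the curve and its AUC relate, the following Python sketch sweeps a threshold over model scores to collect (FPR, TPR) points and then integrates them with the trapezoidal rule. It is a minimal sketch, not the operator's implementation; the function names roc_points and auc and the sample data are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels: list of 0/1 values (1 = the event being predicted)
    scores: list of model confidences for the positive class
    """
    # Highest score first, so each step lowers the threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative row")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Rows with the same score are handled together so that ties
        # produce a single point instead of an arbitrary staircase.
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under (FPR, TPR) points ordered by increasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: three positive rows and three negative rows.
labels = [1, 1, 0, 1, 0, 0]
perfect = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]   # every positive outranks every negative
uninformative = [0.5] * 6                  # same score for every row

auc_perfect = auc(roc_points(labels, perfect))
auc_flat = auc(roc_points(labels, uninformative))
```

With these toy inputs, the model that ranks every positive above every negative yields an AUC of 1.0, while the model that assigns the same score to every row collapses onto the diagonal and yields 0.5.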

This operator can be applied, in general, to any classification model (for example, CART, Decision Tree, Logistic Regression, Naive Bayes, Neural Network and Alpine Forest Classification).

Input

  • A data set from the preceding operator.
  • One or more models from the preceding model operator(s). This input is optional for database data sources.

Configuration

Parameter Description
Notes Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Dependent Column Defines the column used as the class variable. (Not present in Hadoop)
Value to Predict The value of the Dependent Column that represents the event to analyze.
Note: The value entered here must match the data as it is stored in the database, which is also how it is displayed in the data explorer. For example, consider a column that contains Boolean values.
  • If the Dependent Column represents the Boolean values as 1 and 0, then for the Value to Predict, the user must also use 1 or 0.
  • If the Dependent Column represents the Boolean values as True and False, then for the Value to Predict, the user must also use True or False.
Use Model Specifies whether the evaluation uses a model from its preceding operator(s) or the data in the prediction columns of the input data set. (Not present in Hadoop)
  • If true (the default), at least one model operator must directly precede this operator.
  • If false, the prediction columns must be present in the input data set from the preceding operator.

Confidence Columns Specifies the list of columns in the input data set to compare to the Dependent Column. (Not present in Hadoop)
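
When Use Model is false, comparing Confidence Columns to the Dependent Column amounts to scoring each confidence column against the class labels. The following Python sketch shows one way to do that, using the pairwise-ranking formulation of AUC: the fraction of positive/negative row pairs in which the positive row has the higher confidence, with ties counted as one half. It is an illustration under stated assumptions rather than the operator's actual logic; the column names (churned, conf_model_a, conf_model_b), the helper names, and the data are all hypothetical. Labels are derived by exact match against Value to Predict, which shows why that value must be written exactly as the data is stored.

```python
def binary_labels(column_values, value_to_predict):
    """Label a row 1 when its stored value equals value_to_predict, else 0."""
    labels = [1 if v == value_to_predict else 0 for v in column_values]
    if 1 not in labels:
        # Typical cause: entering "1" when the column stores "True".
        raise ValueError(f"no row matches Value to Predict {value_to_predict!r}")
    return labels


def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative row")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical input data set: one dependent column, two confidence columns.
rows = {
    "churned":      ["yes", "no", "yes", "no", "no"],
    "conf_model_a": [0.9, 0.2, 0.7, 0.4, 0.1],
    "conf_model_b": [0.6, 0.7, 0.5, 0.3, 0.2],
}

labels = binary_labels(rows["churned"], "yes")
results = {
    name: pairwise_auc(labels, rows[name])
    for name in ("conf_model_a", "conf_model_b")
}
```

In this toy data set, conf_model_a ranks both positive rows above all three negative rows, so it scores higher than conf_model_b, which ranks one negative row above both positives.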

Output

Visual Output
An ROC-AUC diagram.

Data Output
None. This is a terminal operator.