Confusion Matrix

This operator displays information about the actual versus predicted counts of a classification model and helps assess the model's accuracy for each of the possible class values.

Information at a Glance

Note: This operator can only be used with TIBCO® Data Virtualization and Apache Spark 3.2 or later.

Parameter Description
Category Model Validation
Data source type TIBCO® Data Virtualization
Send output to other operators No
Data processing tool TIBCO® DV, Apache Spark 3.2 or later

Algorithm

The Confusion Matrix operator evaluates the accuracy of the classifications predicted by any classification modeling algorithm. It is an evaluation operator, not a classification model itself. For more information, see Confusion Matrix.

This operator takes one or more classification model objects and an input data set from upstream. It applies each model object to the input data and computes a confusion matrix for each model. Model performance is evaluated from the counts of true positives, true negatives, false positives, and false negatives in the matrix.
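The counting described above can be sketched in Python as follows. This is a minimal illustration, not the operator's implementation. It assumes that each model is represented as a hypothetical Python callable that maps a row (a dict) to a predicted label, and that "accuracy for each class value" means the fraction of rows of each actual class that were predicted correctly. The helper names (`confusion_matrix`, `binary_counts`, `per_class_accuracy`, `evaluate_models`) are illustrative only.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=None):
    """Return (labels, matrix) where matrix[i][j] counts rows whose actual
    class is labels[i] and whose predicted class is labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if labels is None:
        labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def binary_counts(labels, matrix, positive):
    """Collapse the matrix into (TP, TN, FP, FN), treating one class as positive."""
    k = labels.index(positive)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # actually positive, predicted another class
    fp = sum(row[k] for row in matrix) - tp   # predicted positive, actually another class
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def per_class_accuracy(labels, matrix):
    """Fraction of rows of each actual class that were predicted correctly."""
    return {
        label: (matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0)
        for i, label in enumerate(labels)
    }


def evaluate_models(models, rows, target):
    """Apply each model (hypothetical: a callable row -> label) to the rows
    and compute one confusion matrix per model."""
    actual = [row[target] for row in rows]
    return {
        name: confusion_matrix(actual, [predict(row) for row in rows])
        for name, predict in models.items()
    }
```

For example, with the hypothetical labels `actual = ["yes", "yes", "no", "no", "yes"]` and `predicted = ["yes", "no", "no", "yes", "yes"]`, `confusion_matrix` returns the labels `["no", "yes"]` and the matrix `[[1, 1], [1, 2]]`; treating `"yes"` as the positive class gives TP = 2, TN = 1, FP = 1, FN = 1.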

Input

The input consists of a single tabular data set and one or more TIBCO Data Virtualization model operators.

Bad or Missing Values
  • The operator accepts only classification models.
  • The operator does not accept more than one data set.
  • Null values are not allowed and result in an error.
  • Both types of input (a tabular data set and at least one model object) must be connected to this operator; otherwise, an error occurs.
  • The dependent variable must be present in the input data set; otherwise, the operator produces an error.
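The input rules above can be expressed as a simple pre-check. The sketch below is hypothetical: the function name `validate_inputs`, the list-of-dict row representation, and the `"type"` and `"name"` keys on each model are assumptions for illustration and are not part of the TIBCO product.

```python
def validate_inputs(datasets, models, target):
    """Hypothetical pre-check mirroring the documented input rules.

    datasets: list of connected tabular inputs, each a list of dict rows (assumed)
    models:   list of dicts with assumed "name" and "type" keys
    target:   name of the dependent variable column
    Returns a list of error messages; an empty list means the inputs pass.
    """
    errors = []
    if len(datasets) != 1:
        errors.append("exactly one tabular data set must be connected")
    if not models:
        errors.append("at least one model object must be connected")
    for model in models:
        if model.get("type") != "classification":
            errors.append(f"model {model.get('name', '?')} is not a classification model")
    if len(datasets) == 1:
        rows = datasets[0]
        # The dependent variable must appear in every row of the data set.
        if rows and any(target not in row for row in rows):
            errors.append(f"dependent variable '{target}' is missing from the data set")
        # Null values (None in this sketch) are rejected.
        if any(value is None for row in rows for value in row.values()):
            errors.append("null values are not allowed")
    return errors
```

Collecting all errors instead of stopping at the first one is a design choice for this sketch; the operator itself may report errors differently.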

Configuration

The following table provides the configuration details for the Confusion Matrix operator.

Parameter Description
Notes Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Output Schema Specify the schema for the output table or view.
Output Table Specify the path and name of the table where the results are written. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Store Results When set to Yes, the operator saves the results. If set to No, the operator does not save the results.

Output

Visual Output

A table that displays the confusion matrix for each upstream model operator.

Data Output

None. This is a terminal operator.

Example

The following example uses the golf train data set to build a Naive Bayes model, and then evaluates the model on the golf test data set with the Confusion Matrix operator.

Confusion Matrix operator workflow

Data

golf train: This data set contains the following information:

  • Five columns: outlook, temperature, wind, humidity, and play.
  • 14 rows.

golf test: This data set contains the following information:

  • Five columns: outlook, temperature, wind, humidity, and play.
  • 14 rows.

Parameter Setting

The parameter settings for the given data set are as follows:

  • Store Results: Yes

Results

The following figure displays the results for the given data set and parameter settings.

Confusion Matrix operator - Output tab