Regression Evaluator

This operator calculates several metrics for evaluating regression models.

Information at a Glance

Note: This operator can only be used with TIBCO® Data Virtualization and Apache Spark 3.2 or later.

Parameter Description
Category Model Validation
Data source type TIBCO® Data Virtualization
Send output to other operators No
Data processing tool TIBCO® DV, Apache Spark 3.2 or later

Algorithm

This operator computes several commonly used statistical metrics to measure the accuracy of columns that contain predicted values. It implements the regression evaluator in Spark MLlib.

You can evaluate a model with the following metrics.

Metric Description Equation
Mean Squared Error (MSE) Sum of the squared differences between the actual and predicted columns, divided by the number of observations in the data set.

A value of 0 indicates that the predicted and actual values are exactly the same for each observation. A very high value indicates that the difference between actual and predicted values is very large.

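Equation: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$, where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.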
Root Mean Squared Error (RMSE) Square root of the Mean Squared Error (MSE) metric. Equation: $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$
Mean Absolute Error (MAE) Average of the absolute difference between the predicted and actual columns for each observation.

A value of 0 indicates that the predicted and actual values are exactly the same for each observation. A very high value indicates that the difference between actual and predicted values is very large.

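Equation: $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$, where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.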
Coefficient of Determination (R2) The proportion of the variance in the dependent variable that is predictable from the independent variables.

A value of 1 indicates that the regression model fits the data perfectly, while a value of 0 indicates that the model explains none of the variance and performs no better than a horizontal line at the mean of the actual values. A negative value indicates that this horizontal line fits the data better than the regression model, which does not capture the trend of the data.

Equation: $R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$

Where

  • RSS = sum of squares of residuals

  • TSS = total sum of squares

Explained Variance Returns the variance explained by the regression. For more information, see the Spark documentation. Equation - explained variance equation
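
The MSE, RMSE, MAE, and R2 definitions in the table above can be sketched in plain Python. This is an illustrative sketch only, not the operator's Spark implementation; the function name `regression_metrics` and the sample values are assumptions made for this example. Explained Variance is left out because its exact formula should be taken from the Spark documentation.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE, and R2 for two equal-length numeric sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")

    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]

    rss = sum(r * r for r in residuals)  # sum of squares of residuals
    mean_actual = sum(actual) / n
    tss = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares

    mse = rss / n
    mae = sum(abs(r) for r in residuals) / n
    # Note: R2 is undefined (division by zero) when all actual values are equal.
    r2 = 1.0 - rss / tss

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]

perfect = regression_metrics(actual, actual)                     # predictions equal the actual values
mean_only = regression_metrics(actual, [2.5, 2.5, 2.5, 2.5])     # horizontal line at the mean
reversed_fit = regression_metrics(actual, [4.0, 3.0, 2.0, 1.0])  # trend is inverted

for name, result in [("perfect", perfect), ("mean_only", mean_only), ("reversed_fit", reversed_fit)]:
    print(name, result)
```

The three cases correspond to the interpretation of R2 in the table: a perfect fit gives 1, a horizontal line at the mean gives 0, and a model that misses the trend gives a negative value.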

Input

The input is a single tabular data set and one or more TIBCO Data Virtualization model operators.

Bad or Missing Values
  • The operator accepts only regression models.
  • The operator does not accept more than one data set.
  • Null values are not allowed and result in an error.
  • Two types of input (tabular data and at least one model object) must be connected to this operator to prevent errors.
  • The dependent variable must be in the input data set; otherwise, the operator produces an error.
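
The data-related checks in the list above (no null values, dependent variable present) could be mirrored in a pre-check before the data set reaches the operator. The following is a minimal sketch under stated assumptions: `validate_input` is a hypothetical helper, not part of the product, and the rows are made-up values that only borrow column names from the crabs example.

```python
def validate_input(rows, actual_col):
    """Return a list of problems found in `rows` (a list of dicts, one per record)."""
    problems = []

    # The dependent variable must be present in every record.
    if any(actual_col not in row for row in rows):
        problems.append(f"dependent variable column '{actual_col}' is missing")
        return problems

    # Null values are not allowed in any column.
    for i, row in enumerate(rows):
        for col, value in row.items():
            if value is None:
                problems.append(f"row {i}: null value in column '{col}'")

    return problems


# Made-up records, for illustration only.
rows_ok = [{"width": 25.0, "satellts": 3}, {"width": 26.0, "satellts": 1}]
rows_null = [{"width": 25.0, "satellts": 3}, {"width": None, "satellts": 1}]

print(validate_input(rows_ok, "satellts"))
print(validate_input(rows_ok, "weight"))
print(validate_input(rows_null, "satellts"))
```

A check like this only reports problems early; the operator itself still raises an error for the conditions listed above.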

Configuration

The following table provides the configuration details for the Regression Evaluator operator.

Parameter Description
Notes Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Actual Value Specify the column that holds the dependent variable on which the models were trained, or a column of known values for the dependent variable. The column must be numeric.
Output Schema Specify the schema for the output table or view.
Output Table Specify the path and name of the table where the results are stored. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Store Results When set to Yes, the operator saves the results. If set to No, the operator does not save the results.

Output

Visual Output
Displays the performance of regression models. The Model column displays the name of the upstream operator. This enables you to differentiate between multiple instances of the same model.
Output to successive operators
A data table that contains the performance metrics for each model.
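
To illustrate how a Model column lets you tell several upstream models apart, the sketch below builds one row of metrics per model. It is an assumption-laden illustration, not the operator's actual output format: the function `evaluate_models`, the model names, the column labels `RMSE` and `MAE`, and the values are all hypothetical.

```python
import math


def evaluate_models(actual, predictions_by_model):
    """Return one row of metrics per model, labelled with the model's name."""
    n = len(actual)
    table = []
    for model_name, predicted in predictions_by_model.items():
        errors = [a - p for a, p in zip(actual, predicted)]
        table.append({
            "Model": model_name,
            "RMSE": math.sqrt(sum(e * e for e in errors) / n),
            "MAE": sum(abs(e) for e in errors) / n,
        })
    return table


# Made-up values and hypothetical operator names, for illustration only.
actual = [2.0, 4.0, 6.0]
table = evaluate_models(actual, {
    "Linear Regression": [2.0, 4.0, 6.0],
    "Linear Regression-1": [3.0, 5.0, 7.0],
})

for row in table:
    print(row)
```

Only two metrics are computed here to keep the sketch short; the point is the one-row-per-model layout keyed by the model name.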

Example

The following example uses the crabs data set to build an Elastic-Net Linear Regression model and then evaluates the model against the crabs data set with the Regression Evaluator operator.

Example Regression Evaluator operator workflow
Data
crabs: This data set contains the following information:
  • Multiple columns such as color, spine, width, satellts, weight, and catwidth.
  • 173 rows
Parameter Setting
The parameter settings for the crabs data set are as follows:
  • Actual Value: satellts
  • Store Results: Yes

Output
The following figure displays the results for the parameter settings for the crabs data set.
Regression Evaluator operator - Output tab