Regression Evaluator (HD)

Computes several commonly used statistical metrics to assess the accuracy of one or more columns of predictions (Predicted Values). Each predicted column is compared against a single column (Actual Value), which is specified as the "ground truth."

Information at a Glance

Category Model Validation
Data source type HD
Sends output to other operators No
Data processing tool n/a
Note: The Regression Evaluator (HD) operator is for Hadoop data only. For database data, use the Regression Evaluator (DB) operator.

For information about the metrics used in this operator, see Computed Metrics and Use Case for the Regression Evaluator.

Input

A tabular data set from Hadoop that contains a numeric column of actual values (known truth) and one or more numeric columns of predicted values.

Configuration

Parameter Description
Notes Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Actual Value A numeric column that holds the dependent variable that the models were trained on, or a column of known values for the dependent variable.
Predicted Values (to Compare with Actual Value) One or more numeric columns that contain the models' predictions. For example, if you are using this operator to evaluate several different linear regressions, select the predicted-value column for each regression here.
Write Rows Removed Due To Null Data To File Rows with null values are removed from the analysis. This parameter specifies whether the removed rows are written to a file.

The file is written to the same directory as the rest of the output. The filename is suffixed with _baddata.

  • Do Not Write Null Rows to File - remove rows with null values and display the count of removed rows in the result UI, but do not write them to an external file.
  • Do Not Write or Count Null Rows (Fastest) - remove rows with null values, but do not count them or display a count in the result UI.
  • Write All Null Rows to File - remove rows with null values and write all removed rows to an external file.
Storage Format Select the format in which to store the results. The storage format is determined by your type of operator.

Typical formats are Avro, CSV, TSV, or Parquet.

Compression Select the type of compression for the output.
Available Parquet compression options:
  • GZIP
  • Deflate
  • Snappy
  • no compression

Available Avro compression options:

  • Deflate
  • Snappy
  • no compression
Output Directory The location to store the output files.
Output Name The name of the output that contains the results.
Overwrite Output Specifies whether to delete existing data at the output path.
  • Yes - if the path exists, delete the existing data and save the results.
  • No - fail if the path already exists.
Advanced Spark Settings Automatic Optimization
  • Yes - use the default Spark optimization settings.
  • No - provide customized Spark optimization settings. Click Edit Settings to customize Spark optimization. See Advanced Settings Dialog Box for more information.
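
The three choices for Write Rows Removed Due To Null Data To File can be pictured with a small sketch. This is not the operator's implementation (the operator runs on Spark against Hadoop data). The function split_null_rows, the mode constants, and the column names are hypothetical and exist only for illustration. Rows are modeled as Python dictionaries, and a null is modeled as None.

```python
# Hypothetical mode names mirroring the three documented options.
NO_WRITE = "do_not_write"            # count removed rows, write nothing
NO_WRITE_NO_COUNT = "do_not_count"   # drop rows and skip counting
WRITE_ALL = "write_all"              # count removed rows and keep them for writing


def split_null_rows(rows, columns, mode):
    """Separate rows that have a null (None) in any selected column.

    Returns (clean_rows, removed_count, removed_rows).
    removed_count is None when counting is skipped.
    removed_rows is None unless mode is WRITE_ALL.
    """
    clean = []
    removed = []
    removed_count = 0
    for row in rows:
        if any(row.get(col) is None for col in columns):
            if mode != NO_WRITE_NO_COUNT:
                removed_count += 1
            if mode == WRITE_ALL:
                removed.append(row)
        else:
            clean.append(row)
    if mode == NO_WRITE_NO_COUNT:
        return clean, None, None
    if mode == WRITE_ALL:
        return clean, removed_count, removed
    return clean, removed_count, None


# Illustrative data: two of the four rows contain a null.
rows = [
    {"actual": 3.0, "pred_a": 2.5},
    {"actual": None, "pred_a": 1.0},
    {"actual": 4.0, "pred_a": None},
    {"actual": 5.0, "pred_a": 5.5},
]
clean, count, bad = split_null_rows(rows, ["actual", "pred_a"], WRITE_ALL)
print(len(clean), count, len(bad))
```

In this sketch, the rows returned as removed_rows stand in for what the operator would write to the file suffixed with _baddata.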

Output

Visual Output
A table of metrics about each of the predicted columns.
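
As a rough illustration of what a per-column metrics table looks like, the following sketch compares each predicted column with the actual column. The choice of metrics here (MAE, RMSE, and R-squared) is an assumption made for illustration; the metrics the operator actually reports are listed in Computed Metrics and Use Case for the Regression Evaluator. The function names and the sample columns model_a and model_b are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, and R-squared for one predicted column."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("columns must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R-squared is undefined when the actual values are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


def evaluate(table, actual_col, predicted_cols):
    """Build one row of metrics per predicted column."""
    actual = table[actual_col]
    return {col: regression_metrics(actual, table[col]) for col in predicted_cols}


# Illustrative data: model_a is a perfect fit, model_b predicts a constant.
table = {
    "actual": [1.0, 2.0, 3.0, 4.0],
    "model_a": [1.0, 2.0, 3.0, 4.0],
    "model_b": [2.0, 2.0, 2.0, 2.0],
}
for name, metrics in evaluate(table, "actual", ["model_a", "model_b"]).items():
    print(name, metrics)
```

The sketch assumes that rows with null values have already been removed, as the operator does before the analysis.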



Data Output to Succeeding Operators
None. This is a terminal operator.

Example