Predictor (HD)

Applies an input regression, classification, or clustering model to an input dataset in order to predict a value (or the highest probability value).

Information at a Glance

Parameter	Description
Category	Predict
Data source type	HD
Send output to other operators	Yes
Data processing tool	MapReduce

The input column names must match the column names in the data set selected for model training, except for the dependent columns.

The prediction operation outputs its prediction columns with the columns of the input dataset into a user-specified prediction table.

The operator includes the following prediction columns in the user-specified output table.

PRED_<model_abbreviation> - the predicted value or value with highest probability
CONF_<model_abbreviation> - the confidence in the predicted value
INFO_<model_abbreviation> - a dictionary of information about the results

Model Type	Model	Column Abbreviation
Classification	Naive Bayes Logistic Regression SVM Alpine Forest Classification Decision Tree	NB LOR SVM AFC DT
Regression	Linear Regression Alpine Forest Regression	LIR AFR
Clustering	K-Means	KM PRED_KM - predicted cluster DIST_KM - distance to the center of the cluster INFO_KM - a dictionary of information about the results

Algorithm

The Predictor operator is used to predict the value of dependent variable based on the model(s) generated from the input model operator(s).

Input Model	What Predictor Calculates
Classification algorithms	Value with the highest probability
Numeric regression algorithms	Predicted value
Clustering algorithms	Predicted cluster

Input

An input regression, classification, or clustering model, and an input dataset against which the model is applied.

Configuration

Notes

Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.

Store Results?	Specifies whether to store the results. true - results are stored. false - the data set is passed to the next operator without storing.
Results Location	The HDFS directory where the results of the operator are stored. This is the main directory, the subdirectory of which is specified in Results Name. Click Choose File to open the Hadoop File Explorer dialog and browse to the storage location. Do not edit the text directly.
Results Name	The name of the file in which to store the results.
Overwrite	Specifies whether to delete existing data at that path and file name. Yes - if the path exists, delete that file and save the results. No - Fail if the path already exists.

Compression

Select the type of compression for the output.

Available Parquet compression options are the following.

GZIP
Deflate
Snappy
no compression

Available Avro compression options are the following.

Deflate
Snappy
no compression

Output

Visual Output

The data rows of the output table or view displayed (up to 2000 rows of the data).

Example

The following image shows an example input configuration.

Did you find this helpful?

Yes No