Predictor (HD)

Applies an input regression, classification, or clustering model to an input dataset to predict a value (for classification models, the value with the highest probability).

Information at a Glance

Category	Predict
Data source type	HD
Sends output to other operators	Yes
Data processing tool	MapReduce

The input column names must match the column names in the dataset selected for model training, except for the dependent columns.
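
The column-matching rule can be checked before a flow is run. The following is a minimal sketch, not part of the operator: the function name `check_input_columns` and the example column names are hypothetical, and it assumes the match is exact and case-sensitive. It only reports training columns that are absent from the input dataset; whether extra input columns are accepted is not stated here.

```python
def check_input_columns(training_columns, dependent_columns, input_columns):
    """Return the training columns that are missing from the input dataset.

    Dependent columns are excluded from the check, since the input
    dataset is not required to contain them.
    """
    dependent = set(dependent_columns)
    available = set(input_columns)
    required = [c for c in training_columns if c not in dependent]
    return [c for c in required if c not in available]


# Hypothetical example: "churned" is the dependent column, so it may be absent.
missing = check_input_columns(
    training_columns=["age", "income", "churned"],
    dependent_columns=["churned"],
    input_columns=["age", "income", "customer_id"],
)
print(missing)
```

An empty list means every non-dependent training column is present in the input dataset.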

The operator writes its prediction columns, together with the columns of the input dataset, to a user-specified prediction table.

The operator includes the following prediction columns in the user-specified output table.

PRED_<model_abbreviation> - the predicted value or value with highest probability
CONF_<model_abbreviation> - the confidence in the predicted value
INFO_<model_abbreviation> - a dictionary of information about the results

Model Type	Model	Column Abbreviation
Classification	Naive Bayes	NB
Classification	Logistic Regression	LOR
Classification	SVM	SVM
Classification	Alpine Forest Classification	AFC
Classification	Decision Tree	DT
Regression	Linear Regression	LIR
Regression	Alpine Forest Regression	AFR
Clustering	K-Means	KM

For K-Means models, the prediction columns are the following.

PRED_KM - predicted cluster
DIST_KM - distance to the center of the cluster
INFO_KM - a dictionary of information about the results
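
The naming scheme above can be sketched as a small lookup. This is an illustration only: the dictionary and the function `prediction_columns` are hypothetical helpers, not part of the product. It follows the text literally, so every non-clustering model gets a CONF_ column; whether regression models actually emit one is not confirmed here.

```python
# Model name -> column abbreviation, copied from the table above.
MODEL_ABBREVIATIONS = {
    "Naive Bayes": "NB",
    "Logistic Regression": "LOR",
    "SVM": "SVM",
    "Alpine Forest Classification": "AFC",
    "Decision Tree": "DT",
    "Linear Regression": "LIR",
    "Alpine Forest Regression": "AFR",
    "K-Means": "KM",
}


def prediction_columns(model_name):
    """Return the expected prediction column names for a model."""
    abbreviation = MODEL_ABBREVIATIONS[model_name]
    if model_name == "K-Means":
        # Clustering reports a distance instead of a confidence.
        prefixes = ("PRED", "DIST", "INFO")
    else:
        prefixes = ("PRED", "CONF", "INFO")
    return [f"{prefix}_{abbreviation}" for prefix in prefixes]


print(prediction_columns("Logistic Regression"))
print(prediction_columns("K-Means"))
```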

The Predictor operator predicts the value of the dependent variable based on the model(s) generated by the input model operator(s).

Input Model	What Predictor Calculates
Classification algorithms	Value with the highest probability
Numeric regression algorithms	Predicted value
Clustering algorithms	Predicted cluster
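
To make the table concrete, here is a rough sketch of the classification and clustering cases. It is not the operator's implementation. It assumes that the reported confidence is the probability of the winning class and that cluster distance is Euclidean; neither is stated in this document. The function names and sample values are hypothetical.

```python
import math


def predict_class(probabilities):
    """Return (label, probability) for the class with the highest probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


def predict_cluster(point, centroids):
    """Return (cluster index, distance) for the nearest centroid.

    Uses Euclidean distance, which is an assumption of this sketch.
    """
    distances = [math.dist(point, centroid) for centroid in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


# Hypothetical class probabilities for one input row.
pred, conf = predict_class({"yes": 0.72, "no": 0.28})

# Hypothetical two-dimensional point and cluster centers.
cluster, dist = predict_cluster((1.0, 1.0), [(0.0, 0.0), (1.0, 2.0)])
```

For a numeric regression model there is no selection step: the predicted value is the model's output for the row.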

The operator requires two inputs: a regression, classification, or clustering model, and an input dataset to which the model is applied.

Notes	Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Store Results?	Specifies whether to store the results. true - results are stored. false - the data set is passed to the next operator without storing.
Results Location	The HDFS directory where the results of the operator are stored. This is the main directory, the subdirectory of which is specified in Results Name. Click Choose File to open the Hadoop File Explorer Dialog Box and browse to the storage location. Do not edit the text directly.
Results Name	The name of the file in which to store the results.
Overwrite	Specifies whether to delete existing data at that path and file name. Yes - if the path exists, delete the existing file and save the results. No - fail if the path already exists.
Compression	Select the type of compression for the output. Available Parquet compression options: GZIP, Deflate, Snappy, no compression. Available Avro compression options: Deflate, Snappy, no compression.
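
The storage parameters interact as follows: Results Location and Results Name together give the output path, Overwrite decides what happens when that path already exists, and the valid Compression values depend on the output format. The sketch below models this in plain Python under stated assumptions. The function `resolve_output`, the `existing_paths` argument, and the example paths are hypothetical, and no real HDFS call is made.

```python
import posixpath

# Compression options per storage format, as listed in the table above.
COMPRESSION_OPTIONS = {
    "Parquet": {"GZIP", "Deflate", "Snappy", "no compression"},
    "Avro": {"Deflate", "Snappy", "no compression"},
}


def resolve_output(results_location, results_name, overwrite,
                   storage_format, compression, existing_paths):
    """Validate the settings and return the output path.

    existing_paths stands in for a lookup against the file system.
    """
    if compression not in COMPRESSION_OPTIONS[storage_format]:
        raise ValueError(f"{compression} is not available for {storage_format}")
    path = posixpath.join(results_location, results_name)
    if path in existing_paths and not overwrite:
        # Overwrite = No: fail if the path already exists.
        raise FileExistsError(path)
    # Overwrite = Yes: the caller would delete the existing file, then save.
    return path


path = resolve_output(
    "/user/alpine/results", "predictor_out",
    overwrite=True, storage_format="Avro", compression="Snappy",
    existing_paths={"/user/alpine/results/predictor_out"},
)
```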