Applies an input regression, classification, or clustering model to an input dataset in order to predict a value (or the highest probability value).
Information at a Glance
Category
|
Predict
|
Data source type
|
HD
|
Sends output to other operators
|
Yes
|
Data processing tool
|
MapReduce
|
The input column names must match the column names in the data set selected for model training, except for the dependent columns.
The prediction operation outputs its prediction columns with the columns of the input dataset into a user-specified prediction table.
The operator includes the following prediction columns in the user-specified output table.
- PRED_<model_abbreviation> - the predicted value or value with highest probability
- CONF_<model_abbreviation> - the confidence in the predicted value
- INFO_<model_abbreviation> - a dictionary of information about the results
Model Type
|
Model
|
Column Abbreviation
|
Classification
|
- Naive Bayes
- Logistic Regression
- SVM
- Alpine Forest Classification
- Decision Tree
|
|
Regression
|
- Linear Regression
- Alpine Forest Regression
|
|
Clustering
|
K-Means
|
KM
- PRED_KM - predicted cluster
- DIST_KM - distance to the center of the cluster
- INFO_KM - a dictionary of information about the results
|
Algorithm
The Predictor operator is used to predict the value of dependent variable based on the model(s) generated from the input model operator(s).
Input Model
|
What Predictor Calculates
|
Classification algorithms
|
Value with the highest probability
|
Numeric regression algorithms
|
Predicted value
|
Clustering algorithms
|
Predicted cluster
|
Input
An input regression, classification, or clustering model, and an input dataset against which the model is applied.
Configuration
Notes
|
Any notes or helpful information about this operator's parameter settings. When you enter content in the
Notes field, a yellow asterisk is displayed on the operator.
|
Store Results?
|
Specifies whether to store the results.
- true - results are stored.
- false - the data set is passed to the next operator without storing.
|
Results Location
|
The HDFS directory where the results of the operator are stored. This is the main directory, the subdirectory of which is specified in
Results Name. Click
Choose File to open the
Hadoop File Explorer Dialog Box and browse to the storage location. Do not edit the text directly.
|
Results Name
|
The name of the file in which to store the results.
|
Overwrite
|
Specifies whether to delete existing data at that path and file name.
- Yes - if the path exists, delete that file and save the results.
- No - Fail if the path already exists.
|
Compression
|
Select the type of compression for the output.
Available Parquet compression options are the following.
- GZIP
- Deflate
- Snappy
- no compression
Available Avro compression options are the following.
- Deflate
- Snappy
- no compression
|
Output
- Visual Output
- The data rows of the output table or view displayed (up to 2000 rows of the data).
Example
The following image shows an example input configuration.
Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.