Predictor
This operator applies an input model (for example, regression, classification, or clustering) to an input data set in order to predict a target value.
Information at a Glance
Parameter | Description |
---|---|
Category | Predict |
Data source type | TIBCO® Data Virtualization |
Send output to other operators | Yes |
Data processing tool | TIBCO® DV, Apache Spark 3.2 or later |
Algorithm
The Predictor operator generates predictions from the model or models produced by the upstream modeling operators.
Input Model | What the Predictor Calculates |
---|---|
Classification algorithms | Class with the highest probability |
Numeric regression algorithms | Predicted value |
Clustering algorithms | Predicted cluster |
Anomaly detection algorithms | Anomaly class |
PCA | Principal components |
This operator takes one or more model objects and an input data set from upstream. Then it applies each model object to the input data and returns the prediction. Depending on the model types, the Predictor operator generates different prediction columns. For each additional input model, an index number is added to generated column names to separate them.
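For classification models, the "class with the highest probability" rule listed in the table above amounts to an argmax over the per-class probabilities of each row. The following is a minimal sketch in plain Python, not the operator's actual Spark implementation; the function name `predict_class` and the probability values are hypothetical and used only for illustration.

```python
def predict_class(class_probabilities):
    """Return the label with the largest probability (argmax over classes)."""
    if not class_probabilities:
        raise ValueError("no class probabilities supplied")
    return max(class_probabilities, key=class_probabilities.get)


# Hypothetical per-row class probabilities for a binary target such as `play`.
rows = [
    {"yes": 0.75, "no": 0.25},
    {"yes": 0.25, "no": 0.75},
]

predictions = [predict_class(p) for p in rows]
print(predictions)  # ['yes', 'no']
```

How ties between equally probable classes are resolved is not stated here, so the sketch simply returns the first maximum it encounters.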
The operator includes the following models and prediction columns in the user-specified output table.
Model Type | Model | Model Abbreviation (key) | Prediction Columns |
---|---|---|---|
Classification | Naive Bayes | NB | |
Classification | Elastic-Net Logistic Regression | LOR | |
Classification | Random Forest Classification | RFC | |
Classification | Gradient-Boosted Classification | GBTC | |
Regression | Elastic-Net Linear Regression | LR | PRED_<key>: The value predicted by the regression model. |
Regression | Random Forest Regression | RFR | PRED_<key>: The value predicted by the regression model. |
Regression | Gradient-Boosted Regression | GBR | PRED_<key>: The value predicted by the regression model. |
Clustering | K-Means Clustering | KM | |
Principal Component Analysis | Principal Component Analysis | PCA | y_i_PCA: The ith principal component (i starts from zero). |
Anomaly Detection | Isolation Forest | ISF | |
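The two naming patterns documented in the table, `PRED_<key>` for regression models and `y_i_PCA` for principal components, can be illustrated with a short sketch. It assumes that `PRED_<key>` applies to all three regression abbreviations (LR, RFR, GBR); the helper names are hypothetical, and the index suffix added for additional input models is left out because its exact format is not given here.

```python
# Regression abbreviations (keys) taken from the table above.
REGRESSION_KEYS = ("LR", "RFR", "GBR")


def regression_column(key):
    """Build the prediction column name for a regression model key."""
    if key not in REGRESSION_KEYS:
        raise ValueError(f"not a regression key: {key!r}")
    return f"PRED_{key}"


def pca_columns(n_components):
    """Build PCA output column names; the component index starts from zero."""
    return [f"y_{i}_PCA" for i in range(n_components)]


print(regression_column("LR"))  # PRED_LR
print(pca_columns(3))           # ['y_0_PCA', 'y_1_PCA', 'y_2_PCA']
```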
Input
This operator requires one or more input TIBCO Data Virtualization modeling operators (for example, regression, classification, or clustering) and one input data set against which the models are applied.
This operator is limited by the cluster resources and Spark data frame size.
Bad or Missing Values
- Null values are not allowed and result in an error.
- If the input column names do not match the column names in the data set selected for model training, an error is reported.
- Tabular input data and at least one model object must be connected to this operator; otherwise, the operator reports an error.
- The dependent variable must be in the input data set; otherwise, the operator produces an error.
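The checks listed above can be approximated before running a workflow. The following is a rough pre-flight sketch in plain Python, not the operator's own validation logic: the function name `find_input_problems`, the message wording, and the sample rows are all hypothetical, and "column names match" is interpreted here as "every training column is present in the input", which is an assumption.

```python
def find_input_problems(columns, rows, training_columns, dependent_column):
    """Collect the problems that would make the input unusable for prediction."""
    problems = []

    # Every column used for model training must exist in the input data.
    missing = [c for c in training_columns if c not in columns]
    if missing:
        problems.append(f"columns missing from input: {missing}")

    # The dependent variable must be part of the input data set.
    if dependent_column not in columns:
        problems.append(f"dependent variable '{dependent_column}' not in input")

    # Null values are not allowed in any input column.
    for index, row in enumerate(rows):
        null_columns = [c for c in columns if row.get(c) is None]
        if null_columns:
            problems.append(f"row {index} has null values in: {null_columns}")

    return problems


# Made-up sample rows that reuse the golf column names from the example section.
columns = ["outlook", "temperature", "wind", "humidity", "play"]
rows = [
    {"outlook": "sunny", "temperature": 85, "wind": "false", "humidity": 85, "play": "no"},
    {"outlook": "rain", "temperature": 70, "wind": "true", "humidity": None, "play": "yes"},
]

problems = find_input_problems(
    columns,
    rows,
    training_columns=["outlook", "temperature", "wind", "humidity"],
    dependent_column="play",
)
print(problems)  # ["row 1 has null values in: ['humidity']"]
```

Returning a list of problems rather than raising on the first one makes it easier to fix all issues in a single pass.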
Configuration
The following table provides the configuration details for the Predictor operator.
Parameter | Description |
---|---|
Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
Output Schema | Specify the schema for the output table or view. |
Output Table | Specify the table path and name where the results are written. By default, this is a unique table name based on your user ID, workflow ID, and operator. |
Store Results | When set to Yes, the operator saves the results. If set to No, the operator does not save the results. |
Output
- Output: Displays a table of the predicted data set.
- Summary: Displays a list of the TIBCO DV modeling operators and their selected columns.
The operator also produces a table output that downstream operators can use.
Example
The following example builds a Naive Bayes model and a Gradient-Boosted Tree Classification model, then combines the models with the Predictor operator.
golf: This data set contains the following information:
- Multiple columns, namely outlook, temperature, wind, humidity, and play.
- 14 rows.
Parameter Setting
The parameter settings for the golf data set are as follows:
- Store Results: Yes
The following figures display the results of these parameter settings for the golf data set.
Output
Summary