Uses any input classification model to apply a classification prediction to the input data set.
Information at a Glance
Category | Predict
Data source type | HD
Sends output to other operators | Yes
Data processing tool | MapReduce
Note: The Classifier (HD) operator is for Hadoop data only. For database data, use the Classifier (DB) operator.
Algorithm
The Team Studio Classifier operator predicts the probability that an event occurs, based on a model produced by training one of the following operators: Alpine Forest, Decision Tree, K-Means (Hadoop), Logistic Regression, Naive Bayes, Neural Network, or SVM Classification.
Input
The input data set must contain columns with the same names as the columns in the data set used for model training, except for the dependent column.
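As a minimal sketch of the column requirement above, assuming column names are compared exactly (case-sensitive), the following Python lists which training columns, other than the dependent column, are missing from the data set to be scored. The function name `missing_scoring_columns` is hypothetical and is not part of Team Studio.

```python
def missing_scoring_columns(training_columns, dependent_column, scoring_columns):
    """Return training columns (except the dependent column) absent from the scoring data.

    Hypothetical helper: assumes exact, case-sensitive name matching.
    """
    # Every training column except the dependent column must be present.
    required = [c for c in training_columns if c != dependent_column]
    present = set(scoring_columns)
    # Preserve the training-column order in the report.
    return [c for c in required if c not in present]


training = ["age", "income", "srsdlqncy"]
print(missing_scoring_columns(training, "srsdlqncy", ["income", "age", "id"]))
print(missing_scoring_columns(training, "srsdlqncy", ["age"]))
```

Extra columns in the scoring data set, such as an ID column, pass this check, because the requirement is only that the training columns be present; an empty list means the data set can be scored.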
The Classifier operator requires both of the following inputs:
- an input classification model
- an input data set to which the model is applied
The Classifier operator can take one or more models from preceding operators. Each model can be any of the following:
- Alpine Forest
- Decision Tree
- K-Means
- Logistic Regression
- Naive Bayes
- SVM Classification
Configuration
Parameter | Description
Notes | Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Store Results? | Specifies whether to store the results.
- true: results are stored.
- false: the data set is passed to the next operator without being stored.
Results Location | The HDFS directory where the results of the operator are stored. This is the main directory; the subdirectory within it is specified in Results Name. Click Choose File to open the Hadoop File Explorer Dialog Box and browse to the storage location. Do not edit the text directly.
Results Name | The name of the file in which to store the results.
Overwrite | Specifies whether to delete existing data at the specified path and file name.
- Yes: if the path exists, delete that file and save the results.
- No: fail if the path already exists.
Compression | Select the type of compression for the output.
Available Parquet compression options are the following.
- GZIP
- Deflate
- Snappy
- no compression
Available Avro compression options are the following.
- Deflate
- Snappy
- no compression
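The storage settings above (Store Results?, Results Location, Results Name, Overwrite, and Compression) can be sketched as a single validation step. This is a hypothetical illustration, not Team Studio's implementation: a local directory stands in for HDFS, the function name `prepare_results_path` and its `output_format` parameter are assumptions, and `None` represents the "no compression" choice.

```python
import shutil
from pathlib import Path

# Compression choices listed for each output format; None means "no compression".
COMPRESSION_OPTIONS = {
    "parquet": {"gzip", "deflate", "snappy", None},
    "avro": {"deflate", "snappy", None},
}


def prepare_results_path(store_results, results_location, results_name,
                         overwrite, output_format, compression=None):
    """Return the path to write results to, or None when results are not stored.

    Hypothetical sketch: a local directory stands in for the HDFS Results Location.
    """
    if not store_results:
        # Store Results? = false: the data set is passed on without being written.
        return None
    fmt = output_format.lower()
    if fmt not in COMPRESSION_OPTIONS:
        raise ValueError(f"unsupported output format: {output_format}")
    codec = compression.lower() if compression else None
    if codec not in COMPRESSION_OPTIONS[fmt]:
        raise ValueError(f"{compression} compression is not available for {fmt}")
    # Results Name is the subdirectory under the main Results Location directory.
    path = Path(results_location) / results_name
    if path.exists():
        if not overwrite:
            # Overwrite = No: fail if the path already exists.
            raise FileExistsError(f"{path} already exists")
        # Overwrite = Yes: delete the existing data so the results can be saved.
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    return path
```

Validating the compression choice before touching the filesystem means an invalid combination, such as GZIP with Avro, fails without deleting any existing results.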
Output
- Visual Output
The Classifier outputs its prediction columns, together with the columns of the input data set, to the prediction table location specified by the user. The output table or view displays up to 2000 rows of data.
For example, the output for a dependent column, srsdlqncy, might look like the following.
- Data Output
The Classifier operator outputs the following three standardized prediction columns:
- P_dependent_column_name: The predicted value, which is one of the possible values of the dependent column.
- C_dependent_column: The confidence that the result is the P_dependent_column_name predicted value.
- C_dependent_column_details: The confidence values associated with each of the dependent column's possible values.
Note: If the Classifier operator has more than one input model, the output contains the three prediction columns for each input model, and each column name is prefixed with the name of the corresponding input model operator.
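A minimal sketch of how the three prediction columns relate to one model's class probabilities for a single row. It assumes the columns are named `P_<dependent>`, `C_<dependent>`, and `C_<dependent>_details`, and that the multi-model prefix is the model operator's name followed by an underscore; both naming details are assumptions, and the exact names and separator Team Studio uses may differ. The function `prediction_columns` is hypothetical.

```python
def prediction_columns(dependent_column, class_probabilities, model_name=None):
    """Build the three prediction columns for one row from one model's class probabilities.

    Hypothetical sketch: column names and the model-name prefix format are assumptions.
    """
    # Predicted value: the class with the highest probability (first one wins on ties).
    predicted = max(class_probabilities, key=class_probabilities.get)
    # With several input models, prefix each column with the model operator's name.
    prefix = f"{model_name}_" if model_name else ""
    return {
        f"{prefix}P_{dependent_column}": predicted,
        # Confidence that the result is the predicted value.
        f"{prefix}C_{dependent_column}": class_probabilities[predicted],
        # Confidence for every possible value of the dependent column.
        f"{prefix}C_{dependent_column}_details": dict(class_probabilities),
    }


# Two input models scoring the same row produce six prediction columns.
row = {}
row.update(prediction_columns("srsdlqncy", {"0": 0.8, "1": 0.2}, "LogisticRegression"))
row.update(prediction_columns("srsdlqncy", {"0": 0.4, "1": 0.6}, "NaiveBayes"))
print(sorted(row))
```

Prefixing by model name keeps the columns from different models distinct, so the predictions can be compared side by side in one output table.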
Connect this operator to succeeding operators.
Example
Copyright © Cloud Software Group, Inc. All rights reserved.