Collaborative Filter Predictor

Outputs predicted ratings for user/product pairs. The operator takes two inputs: the model trained by the Collaborative Filter Trainer operator and a data set of user IDs and product IDs.

Information at a Glance

Category: Predict
Data source type: HD
Sends output to other operators: Yes
Data processing tool: Spark

Input

Output from a Collaborative Filter Trainer operator, and a tabular data set with a column of user IDs and a column of product IDs.

Configuration

Parameter: Description
Notes: Notes or other helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Users Column: The column in the data set that contains user IDs.
Products Column: The column in the data set that contains product IDs.
Output Directory: The location in which to store the output files.
Output Name: The name under which to store the results.
Overwrite Output: Specifies whether to delete existing data at the output path.
  • Yes - if the path exists, delete the existing data and save the results.
  • No - fail if the path already exists.
Storage Format: The format in which to store the results. The storage format is determined by the type of operator. Typical formats are Avro, CSV, TSV, or Parquet.
Compression: The type of compression for the output.
  Available Parquet compression options:
  • GZIP
  • Deflate
  • Snappy
  • no compression
  Available Avro compression options:
  • Deflate
  • Snappy
  • no compression
Advanced Spark Settings Automatic Optimization:
  • Yes - use the default Spark optimization settings.
  • No - provide customized Spark optimization settings. Click Edit Settings to customize them. See Advanced Settings Dialog Box for more information.
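
The way Storage Format, Compression, and Overwrite Output interact can be sketched in a few lines of plain Python. This is only an illustration of the documented behavior, not the operator's implementation: the function names check_compression and prepare_output_path are hypothetical, the compression lists are copied from the Configuration table, and a local file system stands in for HDFS. Formats for which the table lists no compression options (CSV, TSV) are simply not checked here.

```python
from pathlib import Path
import shutil
import tempfile

# Compression options listed in the Configuration table, keyed by storage format.
# Formats without a listed set (CSV, TSV) are not checked in this sketch.
COMPRESSION_OPTIONS = {
    "Parquet": {"GZIP", "Deflate", "Snappy", "no compression"},
    "Avro": {"Deflate", "Snappy", "no compression"},
}


def check_compression(storage_format: str, compression: str) -> bool:
    """Return True if the compression is listed for the format,
    or if the format has no listed options to check against."""
    allowed = COMPRESSION_OPTIONS.get(storage_format)
    return allowed is None or compression in allowed


def prepare_output_path(directory: Path, name: str, overwrite: bool) -> Path:
    """Mimic Overwrite Output: Yes deletes existing data, No fails if the path exists."""
    target = directory / name
    if target.exists():
        if not overwrite:
            raise FileExistsError(f"{target} already exists and Overwrite Output is No")
        # Overwrite Output is Yes: remove whatever is at the path before saving.
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
    return target


if __name__ == "__main__":
    print(check_compression("Parquet", "GZIP"))  # listed for Parquet
    print(check_compression("Avro", "GZIP"))     # not listed for Avro
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "predictions").write_text("old results")
        target = prepare_output_path(base, "predictions", overwrite=True)
        print(target.exists())  # the old data was deleted
```

With Overwrite Output set to No, the same call raises an error instead of deleting anything, which matches the "fail if the path already exists" behavior described in the table.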

Output

Visual Output



Data Output
An HDFS data set with an extra column that contains the predicted rating for each user/product pair.
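
To make the shape of the data output concrete, the sketch below appends a predicted-rating column to rows of user/product pairs. It assumes a matrix-factorization model in which every user and product has a latent factor vector and the predicted rating is their dot product. This page does not describe the model's internals, so that is an illustrative assumption. The names predict_ratings, user_factors, product_factors, and the "prediction" column are hypothetical, as is the choice to leave the rating empty for IDs the model has not seen.

```python
from typing import Dict, List, Optional, Sequence


def predict_ratings(
    rows: List[dict],
    users_column: str,
    products_column: str,
    user_factors: Dict[str, Sequence[float]],
    product_factors: Dict[str, Sequence[float]],
) -> List[dict]:
    """Return a copy of each row with an extra 'prediction' column."""
    output = []
    for row in rows:
        u = user_factors.get(row[users_column])
        p = product_factors.get(row[products_column])
        rating: Optional[float]
        if u is None or p is None:
            # Assumption: IDs unknown to the model get no prediction.
            rating = None
        else:
            # Dot product of the user and product factor vectors.
            rating = sum(a * b for a, b in zip(u, p))
        output.append({**row, "prediction": rating})
    return output


if __name__ == "__main__":
    # Toy factors standing in for the trainer's output.
    user_factors = {"u1": [1.0, 0.5], "u2": [0.0, 2.0]}
    product_factors = {"p1": [2.0, 2.0], "p2": [4.0, 1.0]}
    rows = [
        {"user_id": "u1", "product_id": "p1"},
        {"user_id": "u2", "product_id": "p2"},
    ]
    for scored in predict_ratings(rows, "user_id", "product_id",
                                  user_factors, product_factors):
        print(scored)
```

The users_column and products_column arguments play the role of the Users Column and Products Column parameters: they tell the function which input columns hold the IDs, and all other columns pass through unchanged.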
Example