Collaborative Filter Recommender

Uses the model trained by the Collaborative Filter Trainer to output recommendations for the selected users or products.

Information at a Glance

Category Predict
Data source type HD
Sends output to other operators Yes
Data processing tool Spark

Input

The output from a Collaborative Filter Trainer, as well as an HDFS data set.

Restrictions

The column selected must represent either user IDs or product IDs.

Configuration

Parameter Description
Notes Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Generate Recommendations for Select the column in your data set that represents either user IDs or product IDs.

To see a list of products that a user might like, choose Users (the default) for This Column Represents.

To see a list of users who might like a product, choose Products.

This Column Represents Specifies whether the column selected in Generate Recommendations for contains Users (the default) or Products.
Number to Recommend Specify the number of recommendations to generate.

Range: 1-100.

Default value: 5.

Output Directory The location to store the output files.
Output Name The name of the output file that contains the results.
Overwrite Output Specifies whether to delete existing data at that path.
  • Yes - if the path exists, delete that file and save the results.
  • No - fail if the path already exists.
Storage Format Select the format in which to store the results. The storage format is determined by your type of operator.

Typical formats are Avro, CSV, TSV, or Parquet.

Compression Select the type of compression for the output.

Available Parquet compression options:
  • GZIP
  • Deflate
  • Snappy
  • no compression

Available Avro compression options:
  • Deflate
  • Snappy
  • no compression
Advanced Spark Settings Automatic Optimization
  • Yes - use the default Spark optimization settings.
  • No - provide customized Spark optimization settings. Click Edit Settings to customize them. See Advanced Settings Dialog Box for more information.
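
How Generate Recommendations for, This Column Represents, and Number to Recommend work together can be pictured with a small sketch. It is an illustration only, not the operator's implementation. It assumes that the trained model provides one latent-factor vector per user and per product, and that a predicted rating is the dot product of a user vector and a product vector. The function recommend, the dictionaries user_factors and product_factors, the toy values, and the choice to skip IDs the model has not seen are all hypothetical.

```python
from typing import Dict, List, Tuple

Factors = Dict[str, List[float]]


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recommend(
    ids: List[str],
    user_factors: Factors,
    product_factors: Factors,
    represents: str = "Users",
    number_to_recommend: int = 5,
) -> Dict[str, List[Tuple[str, float]]]:
    """Return the top-N (id, predicted rating) pairs for each input ID.

    ids                 -- values from the column chosen in Generate Recommendations for
    represents          -- "Users" or "Products" (This Column Represents)
    number_to_recommend -- Number to Recommend, limited to 1-100 as in the table
    """
    if not 1 <= number_to_recommend <= 100:
        raise ValueError("number_to_recommend must be in the range 1-100")

    if represents == "Users":
        # Input IDs are users, so the candidates to rank are products.
        source, candidates = user_factors, product_factors
    elif represents == "Products":
        # Input IDs are products, so the candidates to rank are users.
        source, candidates = product_factors, user_factors
    else:
        raise ValueError("represents must be 'Users' or 'Products'")

    results: Dict[str, List[Tuple[str, float]]] = {}
    for key in ids:
        if key not in source:
            continue  # assumption: IDs unknown to the model are skipped
        scored = [(cid, dot(source[key], vec)) for cid, vec in candidates.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # highest rating first
        results[key] = scored[:number_to_recommend]
    return results


# Toy factors, made up for this sketch.
user_factors = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
product_factors = {"p1": [0.9, 0.1], "p2": [0.2, 0.8], "p3": [0.5, 0.5]}

# Two products for user u1, then one user for product p2.
for_users = recommend(["u1"], user_factors, product_factors, "Users", 2)
for_products = recommend(["p2"], user_factors, product_factors, "Products", 1)
```

In this sketch, switching represents from "Users" to "Products" only swaps which set of factors is ranked, which mirrors the choice between listing products for a user and listing users for a product.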

Output

Visual Output

When recommendations are generated for users, this output shows each user ID in the selected column, followed by the products that user might enjoy (five by default) and their predicted ratings.



Data Output
This data can be further manipulated by other operators in your workflow. It is passed on as a tabular HDFS data set. You can find the storage location of the recommendation table by referring to the Summary section of the results pane, as shown in the following example.



Example