Alternating Least Squares Method

Use the Alternating Least Squares model (A, B and D matrices) for incomplete matrices to predict and recommend over a subset of cases:

You can supply a spreadsheet that lists a subset of cases in a single column. Predictions for every entry of those cases, including entries that are already observed, are then written to a spreadsheet.

You can generate top or bottom 'k' column variable recommendations for the subset of cases into a spreadsheet.

For each case, the number of recommendations is equal to min(k, the number of column variables).

You can specify whether to generate the top or bottom k recommendations. The default is top k.

You can specify the value of k. The default is k = 5.  
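The recommendation rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the node's implementation: the function name `recommend`, the use of NumPy, and the idea that predictions are available as a cases-by-columns array are all hypothetical.

```python
import numpy as np

def recommend(pred, case_rows, k=5, top=True):
    """For each requested case (row), return the column indices of its
    k highest (top=True) or k lowest (top=False) predicted values."""
    pred = np.asarray(pred, dtype=float)
    k_eff = min(k, pred.shape[1])      # min(k, number of column variables)
    out = {}
    for r in case_rows:
        order = np.argsort(pred[r], kind="stable")  # ascending by prediction
        if top:
            order = order[::-1]        # descending order for top-k
        out[r] = order[:k_eff].tolist()
    return out

# Hypothetical predictions for two cases and three column variables.
pred = np.array([[4.0, 1.0, 3.0],
                 [2.0, 5.0, 0.5]])
top2 = recommend(pred, case_rows=[0, 1], k=2)
bottom5 = recommend(pred, case_rows=[1], k=5, top=False)
```

With k = 5 and only three column variables, `bottom5` holds three recommendations for the case, which is the min(k, number of column variables) rule in action.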

How the node works:

When you input a data set, the node first validates it and then generates a compressed matrix according to your specifications. It then identifies whether a warm start has been selected. Next, it computes the singular values and either the model matrices (A and B) or the singular vectors, depending on your specifications.

  • Fixed lambda: The singular values and the model matrices or singular vectors are generated from a fixed lambda that you specify.
  • Grid of lambda: The singular values and the model matrices or singular vectors are generated from the lambda optimized over a grid of lambda values that you specify.
  • No Bisection: The optimized lambda is obtained strictly from the points of the specified lambda grid.
  • Bisection: The optimized lambda is obtained from the specified lambda grid and by interpolating between grid points with the bisection method.
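For the fixed-lambda case on an incomplete matrix, the alternating fit can be sketched as follows. This is a minimal sketch under assumptions, not the node's actual code: it assumes missing cells are marked as NaN, imputes them with the current fit before each ridge-regression update, and stops on a relative Frobenius-norm change. The function name `als_fit` and all parameter names are hypothetical.

```python
import numpy as np

def als_fit(X, rank, lam, max_iter=500, tol=1e-8, seed=0):
    """Fit X ~ A @ B.T using only the observed (non-NaN) entries of X."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    rng = np.random.default_rng(seed)
    m, n = X.shape
    A = rng.normal(size=(m, rank))
    B = rng.normal(size=(n, rank))
    ridge = lam * np.eye(rank)
    for _ in range(max_iter):
        old = A @ B.T
        # Impute missing cells with the current fit, then update A
        # by ridge regression of the filled matrix on B.
        filled = np.where(observed, X, old)
        A = np.linalg.solve(B.T @ B + ridge, B.T @ filled.T).T
        # Re-impute with the updated A, then update B the same way.
        filled = np.where(observed, X, A @ B.T)
        B = np.linalg.solve(A.T @ A + ridge, A.T @ filled).T
        new = A @ B.T
        # Relative change of the fitted matrix in squared Frobenius norm.
        change = np.sum((new - old) ** 2) / max(np.sum(old ** 2), 1e-12)
        if change < tol:
            break
    return A, B

# Synthetic rank-2 data with roughly 20% of the cells hidden.
rng = np.random.default_rng(1)
truth = rng.normal(size=(12, 2)) @ rng.normal(size=(2, 8))
X = truth.copy()
X[rng.random(truth.shape) < 0.2] = np.nan
A, B = als_fit(X, rank=2, lam=0.1)
obs = ~np.isnan(X)
rmse_fit = np.sqrt(np.mean((A @ B.T - X)[obs] ** 2))
rmse_zero = np.sqrt(np.mean(X[obs] ** 2))   # baseline: predict 0 everywhere
```

The product `A @ B.T` then gives predictions for every cell, observed or not.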

Output

  • Model output displayed on a single spreadsheet or on multiple spreadsheets, as specified
  • A plot of the singular values
  • An optional spreadsheet with predictions for the full matrix

Quick Tab

Matrix Type

Specify whether the matrix is sparse or incomplete.

Incomplete matrix

The algorithm will consider the missing entries of your input matrix as missing and will fit the model by iteratively imputing the missing entries.

Sparse matrix

The algorithm will consider the missing entries of your input matrix as zeros and will try to fit the model without imputing the missing entries.

Row indices

Specify the column in the input spreadsheet storing the row indices.

NOTE: Only one column from the input spreadsheet should be selected for the row indices.

Column indices

Specify the column in the input spreadsheet storing column indices.

NOTE: Only one column from the input spreadsheet should be selected for the column indices.

Values

Specify the column in the input spreadsheet storing the matrix values.

NOTE: Only one column from the input spreadsheet should be selected for the matrix values.

Row Sorted

Specify if the input matrix is row sorted in ascending order.

NOTE: Duplicate cell entries will be ignored only in case of row sorted data.

Rank

Specify the rank of the model to fit.

Lambda

Specify the penalizing parameter lambda. Enter any value >= 0.

NOT APPLICABLE IF LAMBDA GRID IS SELECTED.

Lambda Grid

Select whether or not to consider a grid of lambda values to fit the model.

APPLICABLE ONLY IF THE MATRIX TYPE IS INCOMPLETE.

NOTE: The node will produce the output model for the optimal lambda.

Grid Start

Specify the starting point of the grid. Enter any value > 0.

APPLICABLE ONLY IF YES IS SELECTED FOR THE LAMBDA GRID AND THE MATRIX TYPE IS INCOMPLETE.

The node automatically generates the maximum possible value for lambda as the end point of the grid.

Number of Grid Points

Specify the number of lambda points to be considered on the grid. Enter any integer >= 2.

APPLICABLE ONLY IF YES IS SELECTED FOR THE LAMBDA GRID AND THE MATRIX TYPE IS INCOMPLETE.
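How a lambda grid might be built from a start value and a number of points can be sketched as below. Two things here are assumptions, since this page does not state them: the end point is taken to be the largest singular value of the zero-filled matrix (a common choice, because a soft-thresholded fit vanishes at that penalty), and the points are spaced geometrically. The function name `lambda_grid` is hypothetical.

```python
import numpy as np

def lambda_grid(X, grid_start, n_points):
    """Return n_points lambda values from grid_start up to an assumed maximum."""
    if grid_start <= 0:
        raise ValueError("grid start must be > 0")
    if n_points < 2:
        raise ValueError("number of grid points must be >= 2")
    zero_filled = np.nan_to_num(np.asarray(X, dtype=float), nan=0.0)
    # Assumed end point: largest singular value of the zero-filled matrix.
    lam_max = np.linalg.svd(zero_filled, compute_uv=False)[0]
    # Assumed spacing: geometric (equal steps on a log scale).
    return np.geomspace(grid_start, lam_max, n_points)

X = np.array([[3.0, np.nan],
              [np.nan, 4.0]])
grid = lambda_grid(X, grid_start=0.5, n_points=4)
```

The sketch expects the start value to lie below the computed end point; otherwise the grid would run downward.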

Grid Search with Bisection

Select whether to run the node over the grid and switch to the bisection method if necessary.

APPLICABLE ONLY IF YES IS SELECTED FOR THE LAMBDA GRID AND THE MATRIX TYPE IS INCOMPLETE.

The specified grid can skip the optimal lambda for the specified rank. Selecting bisection will make sure the node finds the right model with the rank you specified.

If bisection is set to NO, the node can end up with a model whose rank is lower than the one you specified.
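The idea of a grid skipping the lambda for the requested rank, and bisection recovering it, can be illustrated with a small sketch. It assumes that the fitted rank does not increase as lambda grows, and it uses a hypothetical callable `rank_at(lam)` in place of a real model fit; in the demo, that callable simply counts made-up singular values above lambda. None of this is taken from the node's implementation.

```python
def bisect_lambda(rank_at, lam_lo, lam_hi, target_rank, max_steps=60):
    """Search between two grid points for a lambda whose rank equals target_rank.

    Assumes rank_at is non-increasing in lambda and that
    rank_at(lam_lo) > target_rank > rank_at(lam_hi).
    Returns None if no such lambda is found within max_steps.
    """
    for _ in range(max_steps):
        mid = 0.5 * (lam_lo + lam_hi)
        r = rank_at(mid)
        if r == target_rank:
            return mid
        if r > target_rank:
            lam_lo = mid   # penalty too weak: search larger lambdas
        else:
            lam_hi = mid   # penalty too strong: search smaller lambdas
    return None

# Stand-in for a model fit: rank = number of singular values above lambda.
singular_values = [5.0, 3.0, 2.5, 1.0]
rank_at = lambda lam: sum(s > lam for s in singular_values)

# Neighboring grid points 2.0 and 4.0 give ranks 3 and 1, skipping rank 2.
lam = bisect_lambda(rank_at, 2.0, 4.0, target_rank=2)
```

Without the bisection step, a search restricted to the grid points 2.0 and 4.0 could only offer rank 3 or rank 1.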

Advanced Tab

Rule of Convergence

Choose one of the following to use as a stopping rule for the computation:

  • Default Frobenius norm convergence
  • Relative MSE to check

Number of Data Chunks for multithreading

  • If the matrix is row-sorted, enter a perfect square integer >= 1 and <= 1024.
  • If the matrix is not row-sorted, enter any integer >= 1 and <= 1024.

    Specify the number of chunks into which the data is split for multithreaded operations. For a row-sorted matrix, the number must be a perfect square, that is, a number whose square root is an integer.

    NOTE: The node uses the input matrix geometry for splitting the matrix for multithreading. Switching options for Compressed R-Like Matrix reshuffles the rows and columns of the matrix and creates a different matrix geometry without changing any of the matrix properties. Also, too many chunks can slow down the computation.
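The constraints on the number of chunks can be written as a small check. This is an illustrative sketch only; the function name `valid_chunk_count` is hypothetical and is not part of the node.

```python
import math

def valid_chunk_count(n_chunks, row_sorted):
    """Check a chunk count against the rules stated above."""
    if not isinstance(n_chunks, int) or not 1 <= n_chunks <= 1024:
        return False
    if row_sorted:
        # Row-sorted input needs a perfect square: 1, 4, 9, ..., 1024.
        root = math.isqrt(n_chunks)
        return root * root == n_chunks
    return True

ok_sorted = valid_chunk_count(16, row_sorted=True)
bad_sorted = valid_chunk_count(12, row_sorted=True)
ok_unsorted = valid_chunk_count(12, row_sorted=False)
```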

Tolerance

Specify the tolerance level as a stopping rule for convergence.  Enter any value >0.

Maximum Number of Iterations

Specify the maximum number of internal iterations to run. Enter an integer >= 1 and <= 1000.

NOTE: Entering a very low value can stop the computation before convergence. Entering a very high value can cause the computation to take a long time to finish if the specified lambda is low relative to the rank.

Compressed R-Like Matrix

Standard compressed R-like form (the sparse or incomplete matrix form in R) is a three-column format: the first column stores the observed numeric row indices, the second column stores the observed numeric column indices, and the third column stores the corresponding observed values.

  • Select 0 based if the input matrix is in standard compressed R-like form with numeric indices starting from 0.
  • Select 1 based if the input matrix is in standard compressed R-like form with numeric indices starting from 1.
  • Select False for any other input format.
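To make the three-column format and the index base concrete, the sketch below expands triplets into a dense array. It is an illustration under assumptions: the function name `triplets_to_dense` is hypothetical, the matrix shape is inferred from the largest indices, and the `incomplete` flag mimics the Matrix Type option by filling unobserved cells with NaN (incomplete) or 0 (sparse).

```python
import numpy as np

def triplets_to_dense(rows, cols, vals, index_base=1, incomplete=True):
    """Expand (row index, column index, value) triplets into a dense matrix."""
    rows = np.asarray(rows, dtype=int) - index_base   # shift to 0-based
    cols = np.asarray(cols, dtype=int) - index_base
    if rows.min() < 0 or cols.min() < 0:
        raise ValueError("found an index below the declared base")
    fill = np.nan if incomplete else 0.0
    # Assumed shape: just large enough to hold the largest indices.
    M = np.full((rows.max() + 1, cols.max() + 1), fill)
    M[rows, cols] = vals
    return M

# 1-based triplets: three observed cells of a 2 x 3 matrix.
rows, cols, vals = [1, 1, 2], [1, 3, 2], [5.0, 2.0, 4.0]
as_incomplete = triplets_to_dense(rows, cols, vals, index_base=1, incomplete=True)
as_sparse = triplets_to_dense(rows, cols, vals, index_base=1, incomplete=False)
```

The same triplets read with the wrong base would either shift every cell by one or fail the index check, which is why the base has to be declared.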

Warm Start Tab

Warm Start

Select whether or not to use the previously fitted model to initiate the model parameters for the current run of the computation.

APPLICABLE ONLY IF A MODEL NODE FROM A PREVIOUS RUN IS CONNECTED TO THIS NODE. Warm Start is not applicable if the Lambda Grid option has been selected.

Result Tab

Model Matrix type

Set this option to Singular Vectors to display the singular vectors (U, V) instead of the model matrices A = UD and B = VD, where D is the diagonal matrix of the square roots of the singular values.

Set the option to A and B matrices for model deployment in the Alternating Least Squares Deployment node.
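The relationship between the two output forms can be checked numerically. The sketch below uses the SVD of a small random complete matrix as a stand-in for a fitted model, which is an assumption made for illustration; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 4))

# Keep the leading r singular triplets as a stand-in for a rank-r model.
r = 2
U_full, s_full, Vt_full = np.linalg.svd(M, full_matrices=False)
U, s, V = U_full[:, :r], s_full[:r], Vt_full[:r].T

D = np.diag(np.sqrt(s))        # square roots of the singular values
A = U @ D                      # A = UD
B = V @ D                      # B = VD

low_rank = A @ B.T             # equals U diag(s) V^T
recovered_s = np.sum(A ** 2, axis=0)   # squared column norms of A give s
```

Because the square root of each singular value is split between A and B, their product reproduces the rank-r fit, and the singular values can be read back from the squared column norms of either matrix.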

Complete Missing Entries

Set this option to True if a complete matrix should be displayed with missing entries replaced by predictions from the model.

Applicable only for incomplete matrices.

Multiple Model Spreadsheets

Select whether to output the model in a single spreadsheet or in multiple spreadsheets.

Options Tab

Input settings

Requires input

Select this check box to require input.

Max inputs

Use the mini scroll to select the maximum number of inputs to allow.

Allow downstream connections

Select this check box to allow downstream connections.