Partial Least Squares (PLS) Overview - Computational Approach

Basic Model
As in multiple linear regression, the main purpose of partial least squares regression is to build a linear model, Y=XB+E, where Y is an n cases by m variables response matrix, X is an n cases by p variables predictor (design) matrix, B is a p by m regression coefficient matrix, and E is a noise term with the same dimensions as Y. Usually, the variables in X and Y are centered by subtracting their means and scaled by dividing by their standard deviations. For more information about centering and scaling in partial least squares regression, see Geladi and Kowalski (1986).
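
As an illustration, the following sketch performs this centering and scaling using sample standard deviations; it assumes numpy arrays X (n by p) and Y (n by m), and the function name is ours.

    import numpy as np

    def center_and_scale(X, Y):
        # Center each column to mean zero and scale to unit standard deviation.
        X0 = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        Y0 = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
        return X0, Y0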

Both principal components regression and partial least squares regression produce factor scores as linear combinations of the original predictor variables, so that there is no correlation between the factor score variables used in the predictive regression model. For example, suppose we have a data set with response variables Y (in matrix form) and a large number of predictor variables X (in matrix form), some of which are highly correlated. A regression using factor extraction for this type of data computes the factor score matrix T=XW for an appropriate weight matrix W, and then considers the linear regression model Y=TQ+E, where Q is a matrix of regression coefficients (loadings) for T, and E is an error (noise) term. Once the loadings Q are computed, the above regression model is equivalent to Y=XB+E, where B=WQ, which can be used as a predictive regression model.
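
In code, this factor-extraction regression can be sketched as follows, assuming centered numpy arrays X and Y and some weight matrix W already in hand (the function name is illustrative): the loadings Q come from an ordinary least squares fit of Y on T, and B=WQ recovers coefficients for the original predictors.

    import numpy as np

    def factor_regression(X, Y, W):
        T = X @ W                                  # n by c factor scores
        Q, *_ = np.linalg.lstsq(T, Y, rcond=None)  # c by m loadings of Y on T
        B = W @ Q                                  # p by m coefficients for Y = XB + E
        return T, Q, B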

Principal components regression and partial least squares regression differ in the methods used in extracting factor scores. In short, principal components regression produces the weight matrix W reflecting the covariance structure between the predictor variables, while partial least squares regression produces the weight matrix W reflecting the covariance structure between the predictor and response variables.
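
For instance, the principal components regression choice of W can be sketched as the leading eigenvectors of X'X, computed from the predictors alone; the partial least squares choice, built from X'Y, is given by the algorithms below.

    import numpy as np

    def pcr_weights(X, c):
        # W from the covariance structure of the predictors only:
        # the c leading eigenvectors of X'X (numpy returns ascending eigenvalues).
        _, V = np.linalg.eigh(X.T @ X)
        return V[:, ::-1][:, :c]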

To establish the model, partial least squares regression produces a p by c weight matrix W for X such that T=XW, i.e., the columns of W are weight vectors for the X columns producing the corresponding n by c factor score matrix T. These weights are computed so that each of them maximizes the covariance between responses and the corresponding factor scores. Ordinary least squares procedures for the regression of Y on T are then performed to produce Q, the loadings for Y (or weights for Y) such that Y=TQ+E. Once Q is computed, we have Y=XB+E, where B=WQ, and the prediction model is complete.

One additional matrix which is necessary for a complete description of partial least squares regression procedures is the p by c factor loading matrix P, which gives the factor model X=TP'+F, where F is the unexplained part of X. We can now describe the algorithms for computing partial least squares regression.
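
Once W, P, and Q have been produced by either algorithm below (with the ph and qh vectors stored as columns of P and Q), both decompositions can be checked directly. The following sketch, with illustrative names and X0, Y0 denoting the centered data, computes the residual matrices F and E.

    import numpy as np

    def residual_matrices(X0, Y0, W, P, Q):
        T = X0 @ W          # factor scores
        F = X0 - T @ P.T    # unexplained part of X in the factor model X = TP' + F
        E = Y0 - T @ Q.T    # regression residuals of Y, with B = WQ'
        return F, E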

NIPALS Algorithm
The standard algorithm for computing partial least squares regression components (i.e., factors) is nonlinear iterative partial least squares (NIPALS). There are many variants of the NIPALS algorithm, which may or may not normalize certain vectors. The following algorithm, which assumes that the X and Y variables have been transformed to have means of zero, is considered one of the most efficient NIPALS algorithms.

For each h = 1, ..., c, where A1=X'Y, M1=X'X, C1=I, and c given,

  1. compute qh, the dominant eigenvector of Ah'Ah
  2. wh = ChAhqh, wh = wh/||wh||, and store wh into W as a column
  3. ph = Mhwh, ch = wh'Mhwh, ph = ph/ch, and store ph into P as a column
  4. qh = Ah' wh/ch, and store qh into Q as a column
  5. Ah+1 = Ah - chphqh', Mh+1 = Mh - chphph', and Ch+1 = Ch - whph'

The factor score matrix T is then computed as T=XW, and the partial least squares regression coefficients B of Y on X are computed as B=WQ'.
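
The following sketch implements these NIPALS steps directly with numpy; it assumes X and Y have already been centered as above, and it is written from the description here rather than from any particular library's implementation.

    import numpy as np

    def pls_nipals(X, Y, c):
        p, m = X.shape[1], Y.shape[1]
        A, M, C = X.T @ Y, X.T @ X, np.eye(p)
        W, P, Q = np.zeros((p, c)), np.zeros((p, c)), np.zeros((m, c))
        for h in range(c):
            # 1. dominant eigenvector of A'A = leading right singular vector of A
            q = np.linalg.svd(A)[2][0]
            # 2. weight vector for X, normalized to unit length
            w = C @ A @ q
            w = w / np.linalg.norm(w)
            # 3. loading for X and the scalar ch
            ch = w @ M @ w
            ph = (M @ w) / ch
            # 4. loading for Y
            qh = (A.T @ w) / ch
            # 5. deflation of A, M, and C
            A = A - ch * np.outer(ph, qh)
            M = M - ch * np.outer(ph, ph)
            C = C - np.outer(w, ph)
            W[:, h], P[:, h], Q[:, h] = w, ph, qh
        T = X @ W      # factor scores
        B = W @ Q.T    # regression coefficients of Y on X
        return B, W, P, Q, T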

SIMPLS Algorithm
An alternative estimation method for partial least squares regression components is the SIMPLS algorithm (de Jong, 1993), which can be described as follows.

For each h = 1, ..., c, where A1=X'Y, M1=X'X, C1=I, and c given,

  1. compute qh, the dominant eigenvector of Ah'Ah
  2. wh = Ahqh, ch = wh'Mhwh, wh = wh/√(ch), and store wh into W as a column
  3. ph = Mhwh, and store ph into P as a column
  4. qh = Ah' wh, and store qh into Q as a column
  5. vh = Chph, and vh = vh/||vh||
  6. Ch+1 = Ch - vhvh', Mh+1 = Mh - phph', and Ah+1 = Ch+1Ah

As in NIPALS, the factor score matrix T of SIMPLS is computed as T=XW, and the coefficient matrix B for the regression of Y on X is computed as B=WQ'.
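
A corresponding sketch of the SIMPLS steps, under the same assumptions as the NIPALS sketch above (centered X and Y, numpy, illustrative names), is given below.

    import numpy as np

    def pls_simpls(X, Y, c):
        p, m = X.shape[1], Y.shape[1]
        A, M, C = X.T @ Y, X.T @ X, np.eye(p)
        W, P, Q = np.zeros((p, c)), np.zeros((p, c)), np.zeros((m, c))
        for h in range(c):
            # 1. dominant eigenvector of A'A = leading right singular vector of A
            q = np.linalg.svd(A)[2][0]
            # 2. weight vector, scaled so that the score vector Xw has unit length
            w = A @ q
            ch = w @ M @ w
            w = w / np.sqrt(ch)
            # 3.-4. loadings for X and Y
            ph = M @ w
            qh = A.T @ w
            # 5. orthonormalized direction used for deflation
            v = C @ ph
            v = v / np.linalg.norm(v)
            # 6. deflation of C, M, and A
            C = C - np.outer(v, v)
            M = M - np.outer(ph, ph)
            A = C @ A
            W[:, h], P[:, h], Q[:, h] = w, ph, qh
        T = X @ W      # factor scores
        B = W @ Q.T    # regression coefficients of Y on X
        return B, W, P, Q, T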