Partial Least Squares (PLS) Overview - Computational Approach
Both principal components regression and partial least squares regression produce factor scores as linear combinations of the original predictor variables, so that there is no correlation between the factor score variables used in the predictive regression model. For example, suppose we have a data set with response variables Y (in matrix form) and a large number of predictor variables X (in matrix form), some of which are highly correlated. A regression using factor extraction for this type of data computes the factor score matrix T=XW for an appropriate weight matrix W, and then considers the linear regression model Y=TQ+E, where Q is a matrix of regression coefficients (loadings) for T, and E is an error (noise) term. Once the loadings Q are computed, the above regression model is equivalent to Y=XB+E, where B=WQ, which can be used as a predictive regression model.
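To make this generic factor-extraction template concrete, here is a minimal NumPy sketch. The toy data, the number of factors, and the choice of W (taken from the leading principal components purely so the example runs) are all assumptions for illustration; the point is only the sequence T=XW, OLS of Y on T to get Q, and B=WQ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                        # n x p predictors (toy data)
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(100, 3))  # n x m responses

# Any factor-extraction method supplies a p x c weight matrix W; here it is
# taken from the leading right singular vectors of X purely for illustration.
c = 2
W = np.linalg.svd(X, full_matrices=False)[2][:c].T   # p x c weights

T = X @ W                                            # n x c factor scores
Q = np.linalg.lstsq(T, Y, rcond=None)[0]             # c x m loadings from OLS of Y on T
B = W @ Q                                            # p x m coefficients, so Y ~ XB + E
print(np.allclose(X @ B, T @ Q))                     # the two forms of the model agree
```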
Principal components regression and partial least squares regression differ in the methods used in extracting factor scores. In short, principal components regression produces the weight matrix W reflecting the covariance structure among the predictor variables, while partial least squares regression produces the weight matrix W reflecting the covariance structure between the predictors and the responses.
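A small sketch of this difference for the first extracted direction, following the algorithms described later in this section (up to scaling): the PCR weight vector depends on X alone, while the PLS weight vector depends on the cross-product X'Y. The toy data and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))          # predictors
Y = rng.normal(size=(100, 2))          # responses

# First PCR weight direction: dominant eigenvector of X'X, so it ignores Y.
w_pcr = np.linalg.svd(X, full_matrices=False)[2][0]

# First PLS weight direction: dominant left singular vector of X'Y,
# so the responses help decide which combination of predictors is extracted.
w_pls = np.linalg.svd(X.T @ Y, full_matrices=False)[0][:, 0]
print(w_pcr.shape, w_pls.shape)        # both are length-p weight vectors
```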
For establishing the model, partial least squares regression produces a p by c weight matrix W for X such that T=XW, i.e., the columns of W are weight vectors for the X columns producing the corresponding n by c factor score matrix T. These weights are computed so that each of them maximizes the covariance between responses and the corresponding factor scores. Ordinary least squares procedures for the regression of Y on T are then performed to produce Q, the loadings for Y (or weights for Y) such that Y=TQ+E. Once Q is computed, we have Y=XB+E, where B=WQ, and the prediction model is complete.
One additional matrix necessary for a complete description of partial least squares regression procedures is the p by c factor loading matrix P, which gives a factor model X=TP'+F, where F is the part of X left unexplained by the factors. We can now describe two algorithms for computing partial least squares regression: NIPALS and SIMPLS.
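The role of P can be made concrete with a short, hypothetical NumPy sketch: given any factor scores T, one way to obtain loadings is to regress the columns of X on T, leaving F as the residual. This is only meant to illustrate the factor model X=TP'+F; it is not claimed to reproduce the exact P computed by the algorithms below.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))                 # n x p
T = X @ rng.normal(size=(5, 2))              # n x c scores (any factor scores serve here)

# P (p x c) holds the loadings of the X columns on the factors; F is the part
# of X that the c factors leave unexplained.
P = np.linalg.lstsq(T, X, rcond=None)[0].T   # p x c
F = X - T @ P.T
print(np.allclose(X, T @ P.T + F))           # X = TP' + F by construction
```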
The NIPALS algorithm: for each h = 1, ..., c, where A1=X'Y, M1=X'X, C1=I, and c given,
- compute qh, the dominant eigenvector of Ah'Ah
- wh = ChAhqh, wh = wh/||wh||, and store wh into W as a column
- ph = Mhwh, ch = wh'Mhwh, ph = ph/ch, and store ph into P as a column
- qh = Ah' wh/ch, and store qh into Q as a column
- Ah+1 = Ah - chphqh' and Mh+1 = Mh - chphph'
- Ch+1 = Ch - whph'
The factor score matrix T is then computed as T=XW, and the partial least squares regression coefficients B of Y on X are computed as B=WQ' (the transpose appears because each qh is stored as a column of Q).
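Below is a minimal NumPy sketch of the steps listed above. The function name pls_nipals, the use of an SVD to obtain the dominant eigenvector, and the assumption that X and Y are already column-centered are choices made for the example rather than part of the description above.

```python
import numpy as np

def pls_nipals(X, Y, c):
    """PLS factors and coefficients following the steps above (a sketch)."""
    p, m = X.shape[1], Y.shape[1]
    A = X.T @ Y                        # A1 = X'Y
    M = X.T @ X                        # M1 = X'X
    C = np.eye(p)                      # C1 = I
    W, P, Q = np.zeros((p, c)), np.zeros((p, c)), np.zeros((m, c))
    for h in range(c):
        q = np.linalg.svd(A)[2][0]     # dominant eigenvector of A'A
        w = C @ A @ q
        w = w / np.linalg.norm(w)
        ch = w @ M @ w                 # scalar c_h
        ph = (M @ w) / ch
        qh = (A.T @ w) / ch
        W[:, h], P[:, h], Q[:, h] = w, ph, qh
        A = A - ch * np.outer(ph, qh)  # deflate A and M, update C
        M = M - ch * np.outer(ph, ph)
        C = C - np.outer(w, ph)
    T = X @ W                          # factor scores
    B = W @ Q.T                        # coefficients of Y on X
    return T, W, P, Q, B
```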
The SIMPLS algorithm: for each h = 1, ..., c, where A1=X'Y, M1=X'X, C1=I, and c given,
- compute qh, the dominant eigenvector of Ah'Ah
- wh = Ahqh, ch = wh'Mhwh, wh = wh/√(ch), and store wh into W as a column
- ph = Mhwh, and store ph into P as a column
- qh = Ah' wh, and store qh into Q as a column
- vh = Chph, and vh = vh/||vh||
- Ch+1 = Ch - vhvh' and Mh+1 = Mh - phph'
- Ah+1 = Ch+1Ah
As in NIPALS, the factor score matrix T of SIMPLS is computed as T=XW, and the coefficients B for the regression of Y on X are computed as B=WQ'.
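A corresponding sketch of the SIMPLS steps, under the same assumptions as before (column-centered X and Y, a hypothetical function name, and an SVD used for the dominant eigenvector):

```python
import numpy as np

def pls_simpls(X, Y, c):
    """PLS factors and coefficients following the SIMPLS steps above (a sketch)."""
    p, m = X.shape[1], Y.shape[1]
    A = X.T @ Y                        # A1 = X'Y
    M = X.T @ X                        # M1 = X'X
    C = np.eye(p)                      # C1 = I
    W, P, Q = np.zeros((p, c)), np.zeros((p, c)), np.zeros((m, c))
    for h in range(c):
        q = np.linalg.svd(A)[2][0]     # dominant eigenvector of A'A
        w = A @ q
        w = w / np.sqrt(w @ M @ w)     # scale so that w'Mw = 1
        ph = M @ w
        qh = A.T @ w
        W[:, h], P[:, h], Q[:, h] = w, ph, qh
        v = C @ ph
        v = v / np.linalg.norm(v)
        C = C - np.outer(v, v)         # update the projector, then deflate M and A
        M = M - np.outer(ph, ph)
        A = C @ A
    T = X @ W                          # factor scores
    B = W @ Q.T                        # coefficients of Y on X
    return T, W, P, Q, B
```

With either sketch, fitted or predicted responses for data centered in the same way as X and Y are obtained as X @ B.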