Canonical Correlation Analysis

cancor

Description

Finds linear relationships between two groups of multivariate data. By default, the columns of each data matrix are centered by subtracting their means.

Usage

cancor(x, y, xcenter = TRUE, ycenter = TRUE)

Arguments

x, y	two matrices of data. The number of rows (which represent observations) must be the same in each. Missing values (NAs) are not accepted.
xcenter	controls the centering applied to the columns of x before computing the canonical analysis. If TRUE or if the argument is missing, column means are subtracted. If FALSE, no centering is done. If the argument is numeric, its values are subtracted from the corresponding columns.
ycenter	controls the centering of the columns of y analogously to xcenter for the columns of x.
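
The three centering modes described above can be sketched as follows. The data are simulated purely for illustration (they are not part of this help page), and the sketch assumes cancor behaves as documented here.

```r
# Simulated data: an assumption for illustration only.
set.seed(1)
x <- matrix(rnorm(60), nrow = 20, ncol = 3)
y <- matrix(rnorm(40), nrow = 20, ncol = 2)

# Default: column means are subtracted from both x and y.
cc.mean <- cancor(x, y)

# No centering of either matrix.
cc.none <- cancor(x, y, xcenter = FALSE, ycenter = FALSE)

# Numeric xcenter: subtract the column medians of x; y keeps the default.
cc.med <- cancor(x, y, xcenter = apply(x, 2, median))

# The values actually subtracted are returned in the result.
cc.mean$xcenter
cc.med$xcenter
```

Comparing cc.mean$cor, cc.none$cor and cc.med$cor shows how the choice of centering changes the reported correlations.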

Value

A list representing the canonical correlation analysis, with the following components:

cor	vector of the canonical correlations, that is, the correlations between corresponding pairs of canonical variables.
xcoef	matrix of coefficients defining the linear combinations of the columns of x. The first column of xcoef is the linear combination of columns of x corresponding to the first canonical correlation, etc. The row names are the selected column names of x, that is: colnames(x)[qr(x)$pivot][1:qr(x)$rank]
ycoef	matrix like xcoef, but originating from y, i.e., the first column of ycoef is the linear combination of columns of y corresponding to the first canonical correlation, etc. The row names are the selected column names of y, that is: colnames(y)[qr(y)$pivot][1:qr(y)$rank]
xcenter	vector of values subtracted from the columns of x.
ycenter	vector of values subtracted from the columns of y.
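
As a rough check of how these components fit together, the sketch below rebuilds the canonical variables from the returned centers and coefficients and compares their pairwise correlations with cor. The data are simulated for illustration only; the variable names u, v and pair.cor are hypothetical.

```r
# Simulated data: an assumption for illustration only.
set.seed(2)
x <- matrix(rnorm(90), nrow = 30, ncol = 3)
y <- matrix(rnorm(60), nrow = 30, ncol = 2)
cc <- cancor(x, y)

# Subtract the returned centers, then apply the coefficient matrices.
u <- sweep(x, 2, cc$xcenter) %*% cc$xcoef
v <- sweep(y, 2, cc$ycenter) %*% cc$ycoef

# Correlation between each matching pair of canonical variables.
k <- length(cc$cor)
pair.cor <- sapply(seq_len(k), function(i) cor(u[, i], v[, i]))

# Should agree with the cor component up to rounding error.
all.equal(pair.cor, cc$cor)
```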

Background

Canonical correlation seeks a linear combination of one set of variables and a linear combination of a second set of variables such that the correlation is maximized. It is similar to regression, which seeks a linear combination of a set of variables that maximizes the correlation with a single (response) variable.
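
The link to regression can be illustrated with a small sketch: when the second set contains a single variable, the one canonical correlation should equal the multiple correlation coefficient (the square root of R-squared) from a least-squares fit with an intercept. The data and the coefficient vector below are made up for illustration.

```r
# Simulated data: an assumption for illustration only.
set.seed(3)
x <- matrix(rnorm(150), nrow = 50, ncol = 3)
y <- as.vector(x %*% c(1, -0.5, 0.25) + rnorm(50))  # single response

cc  <- cancor(x, y)   # y is treated as a one-column matrix
fit <- lm(y ~ x)      # ordinary regression of y on the columns of x

# The two quantities should agree up to rounding error.
c(canonical = cc$cor[1], regression = sqrt(summary(fit)$r.squared))
```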

The second and higher canonical correlations find linear combinations that maximize the correlation subject to being uncorrelated with previous canonical variables. The number of canonical correlations is the minimum of the number of variables in the two sets.
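
Both properties can be checked numerically. The sketch below uses simulated data (an assumption for illustration) with five variables in one set and two in the other, so two canonical correlations are expected, and the canonical variables within a set should be mutually uncorrelated.

```r
# Simulated data: an assumption for illustration only.
set.seed(4)
x <- matrix(rnorm(200), nrow = 40, ncol = 5)
y <- matrix(rnorm(80),  nrow = 40, ncol = 2)
cc <- cancor(x, y)

# Number of canonical correlations versus the smaller set size.
k <- length(cc$cor)
k == min(ncol(x), ncol(y))

# Canonical variables for x, restricted to the first k columns.
u <- (sweep(x, 2, cc$xcenter) %*% cc$xcoef)[, seq_len(k), drop = FALSE]

# Off-diagonal correlations should be zero up to rounding error.
round(cor(u), 8)
```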

References

Many multivariate statistics books have discussions of canonical correlation. Examples include:

Becker, R. A., Chambers, J. M., and Wilks, A. R. 1988. The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books and Software.

Dillon, W. R. and Goldstein, M. 1984. Multivariate Analysis, Methods and Applications. New York, NY: Wiley.

Hotelling, H. 1936. Relations between two sets of variates. Biometrika. Volume 28. 321-377.

Mardia, K. V., Kent, J. T., and Bibby, J. M. 1979. Multivariate Analysis. London, UK: Academic Press.

Seber, G. A. F. 1984. Multivariate Observations. New York, NY: John Wiley & Sons. 506f.

See Also

lm, princomp, svd, qr.

Examples

# canonical decomposition with column means swept out
x <- Sdatasets::longley[, 1:3]
y <- Sdatasets::longley[, 4:5]
cancor(x, y)
# canonical decomposition with column medians of x subtracted out,
# y as is:
cancor(x, y, apply(x, 2, median), FALSE)
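# canonical analysis of the air variables against the soil variables
soil <- Sdatasets::evap.x[,1:3]
air <- Sdatasets::evap.x[,-1:-3]
cc.airsoil <- cancor(air, soil)
# canonical variables: the data projected onto the canonical coefficients
can.air <- air %*% cc.airsoil$xcoef
can.soil <- soil %*% cc.airsoil$ycoef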
# plot(can.air[,1], can.soil[,1], xlab="first air canonical variable",
#      ylab="first soil canonical variable")
# par(mfrow=c(2, 1))
# barplot(cc.airsoil$xcoef[,1], ylab="first air loadings",
#      names=dimnames(air)[[2]], density=20)
# barplot(cc.airsoil$ycoef[,1], ylab="first soil loadings",
#      names=dimnames(soil)[[2]], density=20, space=1.4)

Package stats version 6.1.9-33