cancor
Canonical Correlation Analysis

Description

Finds linear relationships between two groups of multivariate data. By default, the columns of each data matrix are centered by subtracting their means.

Usage

cancor(x, y, xcenter = TRUE, ycenter = TRUE)

Arguments

x,y two matrices of data. The number of rows (which represent observations) must be the same in each. Missing values (NAs) are not accepted.
xcenter controls the centering applied to the columns of x before computing the canonical analysis. If TRUE or if the argument is missing, column means are removed. If FALSE, no centering is done. If the argument is numeric, the numeric values are removed from the corresponding columns.
ycenter controls the centering of the columns of y analogously to xcenter for the columns of x.
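The centering options described above can be compared with a short sketch. The data are simulated and the object names (cc_default, cc_means, cc_med) are hypothetical, chosen only for illustration. Supplying the column means explicitly should give the same correlations as the default, while a numeric xcenter such as the column medians is subtracted as given.

```r
# Simulated data: an assumption for illustration, not taken from this help page
set.seed(123)
x <- matrix(rnorm(40 * 3), nrow = 40)
y <- matrix(rnorm(40 * 2), nrow = 40) + x[, 1:2]

# Default: column means are removed from both x and y
cc_default <- cancor(x, y)

# Passing the column means explicitly should reproduce the default
cc_means <- cancor(x, y, xcenter = colMeans(x), ycenter = colMeans(y))
same_cor <- all.equal(cc_default$cor, cc_means$cor)
same_cor

# Numeric centering for x (column medians), no centering for y
cc_med <- cancor(x, y, xcenter = apply(x, 2, median), ycenter = FALSE)
cc_med$xcenter   # the values that were subtracted from the columns of x
```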
Value
list representing the canonical correlation analysis:
cor vector of the canonical correlations, that is, the correlations between successive pairs of canonical variables.
xcoef the matrix of linear combinations of the columns of x. The first column of xcoef is the linear combination of columns of x corresponding to the first canonical correlation, etc. The row names are the selected column names of x, that is: colnames(x)[qr(x)$pivot][1:qr(x)$rank]
ycoef matrix like xcoef, but originating from y; that is, the first column of ycoef is the linear combination of columns of y corresponding to the first canonical correlation, etc. The row names are the selected column names of y, that is: colnames(y)[qr(y)$pivot][1:qr(y)$rank]
xcenter vector of values subtracted from the columns of x.
ycenter vector of values subtracted from the columns of y.
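As a rough illustration of the components listed above, the following sketch inspects the returned list for simulated data. The data and the column names are assumptions made for this example only.

```r
# Simulated data: an assumption for illustration, not taken from this help page
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- matrix(rnorm(n * 2), n, 2, dimnames = list(NULL, c("y1", "y2")))
y[, 1] <- y[, 1] + x[, 1]   # induce some association between the two sets

cc <- cancor(x, y)

names(cc)        # components of the returned list
cc$cor           # canonical correlations
length(cc$cor)   # one correlation per pair of canonical variables
rownames(cc$xcoef)   # selected column names of x
rownames(cc$ycoef)   # selected column names of y

# With default centering, xcenter holds the column means of x
means_match <- all.equal(unname(cc$xcenter), unname(colMeans(x)))
means_match
```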
Background
Canonical correlation seeks a linear combination of one set of variables and a linear combination of a second set of variables such that the correlation between the two combinations is maximized. It is similar to regression, which seeks a linear combination of a set of variables that maximizes the correlation with a single (response) variable.
The second and higher canonical correlations find linear combinations that maximize the correlation subject to being uncorrelated with the previous canonical variables. When both sets have full column rank, the number of canonical correlations is the minimum of the number of variables in the two sets.
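The properties described above can be checked numerically. The sketch below uses simulated data (an assumption for illustration) to form the canonical variables from xcoef and ycoef, compare their correlations with cor, and confirm that the canonical variables within a set are uncorrelated. The last step uses a well-known equivalent computation, the singular values of the cross-product of the orthonormal QR bases of the centered matrices; it is shown only as a cross-check and is not claimed to be how cancor is implemented.

```r
# Simulated data: an assumption for illustration, not taken from this help page
set.seed(42)
n <- 100
x <- matrix(rnorm(n * 3), n, 3)
y <- cbind(x[, 1] + rnorm(n), x[, 2] - x[, 3] + rnorm(n))

cc <- cancor(x, y)
k <- length(cc$cor)   # number of canonical correlations

# Center with the values cancor reports, then form the canonical variables
xc <- sweep(x, 2, cc$xcenter)
yc <- sweep(y, 2, cc$ycenter)
u <- xc %*% cc$xcoef[, 1:k, drop = FALSE]
v <- yc %*% cc$ycoef[, 1:k, drop = FALSE]

# Correlation of each pair of canonical variables, to compare with cc$cor
pair_cor <- diag(cor(u, v))
pair_cor
cc$cor

# Canonical variables within a set should be mutually uncorrelated
round(cor(u), 8)

# Cross-check: singular values of Qx' Qy for the centered matrices
d <- svd(crossprod(qr.Q(qr(xc)), qr.Q(qr(yc))))$d
d
```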
References
Many multivariate statistics books have discussions of canonical correlation. Examples include:
Becker, R. A., Chambers, J. M., and Wilks, A. R. 1988. The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books and Software.
Dillon, W. R. and Goldstein, M. 1984. Multivariate Analysis, Methods and Applications. New York, NY: Wiley.
Hotelling, H. 1936. Relations between two sets of variates. Biometrika. Volume 28. 321-377.
Mardia, K. V., Kent, J. T., and Bibby, J. M. 1979. Multivariate Analysis. London, UK: Academic Press.
Seber, G. A. F. 1984. Multivariate Observations. New York, NY: John Wiley & Sons. 506f.
See Also
lm, princomp, svd, qr.
Examples
# canonical decomposition with column means swept out
x <- Sdatasets::longley[, 1:3]
y <- Sdatasets::longley[, 4:5]
cancor(x, y)

# canonical decomposition with column medians of x subtracted out,
# y as is:
cancor(x, y, apply(x, 2, median), FALSE)

soil <- Sdatasets::evap.x[,1:3]
air <- Sdatasets::evap.x[,-1:-3]
cc.airsoil <- cancor(air, soil)
can.air <- air %*% cc.airsoil$xcoef
can.soil <- soil %*% cc.airsoil$ycoef
# plot(can.air[,1], can.soil[,1], xlab="first air canonical variable",
#   ylab="first soil canonical variable")
# par(mfrow=c(2, 1))
# barplot(cc.airsoil$xcoef[,1], ylab="first air loadings",
#   names=dimnames(air)[[2]], density=20)
# barplot(cc.airsoil$ycoef[,1], ylab="first soil loadings",
#   names=dimnames(soil)[[2]], density=20, space=1.4)

Package stats version 6.0.0-69