prcomp
Principal Components Analysis

Description

Finds a new coordinate system for multivariate data such that the first coordinate has maximal variance, the second coordinate has maximal variance subject to being orthogonal to the first, and so on.

Usage

prcomp(x, ...)
prcomp.formula(formula, data = NULL, subset, na.action, ...)
prcomp.default(x, retx = TRUE, center = TRUE, scale. = FALSE, tol = NULL, ...)

Arguments

x, formula a matrix, data frame, or formula. If you specify a matrix, the columns should correspond to variables and the rows to observations. If you specify a formula, do not place any variables on the left (response) side.
data a data frame or matrix. This argument is usually used only when you provide a formula, although it may be used instead of x.
subset the subset of the observations to use.
na.action a character string that specifies how to handle missing values (NAs). By default, an error is returned if missing values (NAs) are present.
retx a logical value. If TRUE, the function returns a rotated version of the data matrix. Specifying retx = FALSE saves space in the returned data structure.
center a logical value or vector that enables control over the value subtracted from each column.
  • If TRUE (the default), the mean of each column, excluding any missing data, is subtracted from the column.
  • If a vector, its length must equal the number of columns in x (as given by ncol(x)). In this case, center[j] is subtracted from column j.
  • If FALSE, centering is not performed.
scale. a logical value or vector that controls the value by which each column is divided to scale it.
  • If TRUE, each column (after centering) is divided by the square root of its sum of squares (after centering) divided by n - 1, where n is the number of non-missing values.
  • If a vector, its length must equal the number of columns in x (as given by ncol(x)). In this case, column j is divided by scale.[j].
  • If FALSE, scaling is not performed.
tol a value that specifies whether components with small standard deviations should be dropped. If this value is not NULL, the principal components whose standard deviation is less than the largest standard deviation multiplied by tol are dropped, and the corresponding columns of the rotation are also dropped.
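The logical and vector forms of center and scale., and the effect of tol, can be sketched as follows. This is a minimal illustration that assumes the built-in cars data set; the object names p1, p2, and p3 and the tol value of 0.5 are chosen only for the example.

```r
dat <- as.matrix(cars)   # two columns: speed, dist

# Logical form: subtract column means, then divide by column standard deviations.
p1 <- prcomp(dat, center = TRUE, scale. = TRUE)

# Vector form: supply the same centering and scaling values explicitly,
# one value per column of dat.
p2 <- prcomp(dat, center = colMeans(dat), scale. = apply(dat, 2, sd))

# Both calls should give the same standard deviations.
all.equal(p1$sdev, p2$sdev)

# tol: drop components whose standard deviation is below
# tol times the largest standard deviation.
p3 <- prcomp(dat, scale. = TRUE, tol = 0.5)
ncol(p3$rotation)        # fewer columns than ncol(dat) if a component was dropped
```

Supplying vectors is useful when the centering and scaling values come from a different data set, for example when applying a training-set transformation to new observations.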

Details

The analysis is performed even if there are fewer rows than columns in the input (nrow(x) < ncol(x)), but in this case the number of derived variables is equal to nrow(x), and therefore the returned x contains only nrow(x) columns. In general, if any of the derived variables has zero standard deviation, that variable is dropped from the returned result.
The estimates are computed through the singular value decomposition of the (centered and possibly scaled) input x. The standard deviations are the singular values divided by the square root of one less than the number of observations.
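If ret <- prcomp(dat), then ret$x equals the centered (and, if requested, scaled) version of dat multiplied by ret$rotation, up to numerical precision. With the default settings, this is scale(dat, center = ret$center, scale = FALSE) %*% ret$rotation.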
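The relationship between prcomp and the singular value decomposition can be checked directly. The following is a sketch under the assumption that the default arguments are used (center = TRUE, scale. = FALSE) and that the built-in cars data set is available; the names centered, s, and sdev_svd are illustrative only.

```r
dat <- as.matrix(cars)
ret <- prcomp(dat)                          # defaults: center = TRUE, scale. = FALSE

# Center the columns the same way prcomp does by default.
centered <- scale(dat, center = TRUE, scale = FALSE)
s <- svd(centered)

# Standard deviations: singular values divided by sqrt(n - 1).
sdev_svd <- s$d / sqrt(nrow(dat) - 1)
all.equal(ret$sdev, sdev_svd)

# Rotated data: centered input multiplied by the rotation matrix.
all.equal(ret$x, centered %*% ret$rotation, check.attributes = FALSE)
```

Both comparisons use all.equal rather than == because the results agree only up to floating-point precision.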

Value

Returns a list object of class "prcomp" with the following components:
sdev a vector of standard deviations of the derived variables.
rotation an orthogonal matrix that describes the rotation. The first column is the linear combination of the columns of x that defines the first principal component, and so on. This matrix may have fewer columns than x. It is commonly called the loadings; it is not a rotation in the sense often used in factor analysis.
center center used for centering.
scale scale used for scaling.
x rotated version of x. That is, the first column is the nrow(x) values for the first derived variable, and so on. This may have fewer columns than x. Returned only when retx = TRUE.
terms terms object of the formula. Not present if a formula was not used.
call an image of the call to prcomp. This is not present if a formula was not used.

Background

Principal component analysis defines a rotation of the variables (columns) of x. The first derived direction is chosen to maximize the standard deviation of the derived variable, the second to maximize the standard deviation among directions uncorrelated with the first, and so on.
Principal component analysis is often used as a data reduction technique, sometimes in conjunction with regression. We recommend that you scale the columns of the input (scale. = TRUE) before performing the principal component analysis, because a variable with large variance relative to the others will otherwise dominate the first principal component.
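The effect of scaling on the first principal component can be seen with a small comparison. This sketch assumes the built-in cars data set, in which dist has a much larger variance than speed; the object names unscaled and scaled are illustrative only.

```r
# Column variances: dist varies far more than speed.
apply(cars, 2, var)

unscaled <- prcomp(cars)                 # scale. = FALSE (the default)
scaled   <- prcomp(cars, scale. = TRUE)  # columns scaled to unit variance

# Absolute loadings of the first principal component.
abs(unscaled$rotation[, 1])   # weight concentrated on dist
abs(scaled$rotation[, 1])     # weight shared between speed and dist
```

Absolute values are compared because the sign of each column of the rotation matrix is arbitrary.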

References

Many multivariate statistics books (and some regression texts) include a discussion of principal components. Below are a few examples:
Dillon, W. R. and Goldstein, M. (1984). Multivariate Analysis, Methods and Applications. Wiley, New York.
Johnson, R. A. and Wichern, D. W. (1982). Applied Multivariate Statistical Analysis. Prentice-Hall, Englewood Cliffs, New Jersey.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, London.

See Also

svd, lsfit, cancor, princomp.

Examples

data(cars)
prcomp(cars)
prcomp(cars, scale. = TRUE)
Package stats version 6.0.0-69