prcomp
Principal Components Analysis
Description
Finds a new coordinate system for multivariate data such that the first
coordinate has maximal variance, the second coordinate has maximal
variance subject to being orthogonal to the first, and so on.
Note: This function is deprecated; use princomp instead.
Usage
prcomp(x, ...)
prcomp.formula(formula, data = NULL, subset, na.action, ...)
prcomp.default(x, retx = TRUE, center = TRUE, scale. = FALSE, tol = NULL, ...)
Arguments
x, formula |
a matrix, data frame, or formula. If you specify a matrix, the columns
should correspond to variables and the rows to observations. If you
specify a formula, do not place any variables on the left (response)
side.
|
data |
a data frame or matrix. This argument is usually used only when you provide a formula, although it may be used instead of x.
|
subset |
the subset of the observations to use.
|
na.action |
a character string that specifies how to handle missing values (NAs). By default, an error is returned if missing values (NAs) are present.
|
retx |
a logical value. If TRUE, the function returns a rotated version of the data matrix. Specifying retx = FALSE saves space
in the returned data structure.
|
center |
a logical value or vector that enables control over the value
subtracted from each column.
- If TRUE (the default), the mean of
each column, excluding any missing data, is subtracted from the
column.
- If a vector, the length of the vector must equal the number of
columns in x (calculated by ncol(x)). In this case,
center[j] is subtracted from column j.
- If FALSE, centering is not performed.
|
scale. |
a logical value or vector that specifies the value by which
each column is divided to scale it.
- If TRUE, each column (after centering) is divided by the square root of its sum of squares (after centering) divided by n - 1, where n is the number of non-missing values.
- If a vector, the length of the vector must equal the number of
columns in x (calculated by ncol(x)). In this case, column j is divided by scale.[j].
- If FALSE, scaling is not performed.
|
tol |
a value that specifies whether components with small standard deviations should be dropped. If this value is not NULL, the principal components whose standard deviation is less than the largest standard deviation multiplied by tol are dropped, and the corresponding columns of the rotation are also dropped.
|
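The following is a minimal sketch of how the arguments above might be combined, using the built-in cars data set. The centering and scaling vectors (ctr, scl) and the tol value are arbitrary illustrative choices, not recommended settings.

```r
# Formula interface: variables appear on the right-hand side only.
pc_formula <- prcomp(~ speed + dist, data = cars)

# Matrix/data frame interface on the same two columns.
pc_default <- prcomp(cars)

# Both interfaces should produce the same standard deviations.
same_sdev <- isTRUE(all.equal(pc_formula$sdev, pc_default$sdev))

# User-supplied centering and scaling vectors, one value per column.
# These numbers are hypothetical and chosen only for illustration.
ctr <- c(speed = 15, dist = 40)
scl <- c(speed = 5, dist = 25)
pc_custom <- prcomp(cars, center = ctr, scale. = scl)

# tol drops components whose standard deviation is small relative to
# the largest one; the matching rotation columns are dropped as well.
pc_tol <- prcomp(cars, tol = 0.5)
ncol(pc_tol$rotation)
```

In this sketch the second component of the unscaled cars data has a much smaller standard deviation than the first, so tol = 0.5 leaves a single rotation column.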
Details
The analysis is performed even if there are fewer rows than columns in
the input (nrow(x) < ncol(x)), but in this case the number of
variables that are derived is equal to nrow(x), and therefore
the returned x will only contain nrow(x) columns. In
general, if any of the derived variables has zero standard deviation,
that variable is dropped from the returned result.
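As a rough illustration of the case with fewer observations than variables, the sketch below builds a small random matrix (the name wide and its dimensions are arbitrary) and checks the shape of the result. Whether components with zero standard deviation are also removed may depend on the implementation, so only the upper bound on the number of columns is checked here.

```r
set.seed(1)

# 3 observations (rows) and 5 variables (columns).
wide <- matrix(rnorm(3 * 5), nrow = 3, ncol = 5)

pc_wide <- prcomp(wide)

# The rotated data has at most nrow(wide) columns.
dim(pc_wide$x)

# The rotation has one row per original variable.
dim(pc_wide$rotation)
```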
The estimates are made through the singular value decomposition of the
(centered and possibly scaled) input x. The standard deviations are the singular values
divided by the square root of one less than the number of observations.
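The link to the singular value decomposition can be checked directly. This is a sketch under the usual sample-variance convention, in which the singular values of the centered data are divided by sqrt(n - 1) to obtain standard deviations; the variable names are illustrative.

```r
x <- as.matrix(cars)
n <- nrow(x)

# Center the columns, as prcomp does by default.
xc <- scale(x, center = TRUE, scale = FALSE)

# Singular value decomposition of the centered data.
sv <- svd(xc)

pc <- prcomp(x)

# Standard deviations are singular values over sqrt(n - 1).
sdev_from_svd <- sv$d / sqrt(n - 1)
isTRUE(all.equal(pc$sdev, sdev_from_svd))

# Rotation columns match the right singular vectors up to sign.
isTRUE(all.equal(abs(unname(pc$rotation)), abs(sv$v)))
```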
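If ret <- prcomp(dat), then ret$x is equal to scale(dat, ret$center, ret$scale) %*% ret$rotation up to numerical precision. With center = FALSE and scale. = FALSE, this reduces to dat %*% ret$rotation.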
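A minimal sketch of this identity, assuming the default center = TRUE so that the stored column means must be subtracted before multiplying by the rotation:

```r
dat <- as.matrix(cars)
ret <- prcomp(dat)

# Subtract the stored centering values from each column.
centered <- sweep(dat, 2, ret$center)

# Project the centered data onto the principal directions.
scores <- centered %*% ret$rotation

# The projection reproduces the returned rotated data.
isTRUE(all.equal(unname(scores), unname(ret$x)))

# Without centering or scaling, the plain product is enough.
ret_raw <- prcomp(dat, center = FALSE)
isTRUE(all.equal(unname(dat %*% ret_raw$rotation), unname(ret_raw$x)))
```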
Value
Returns a list object of class prcomp with the following components:
sdev |
a vector of standard deviations of the derived variables.
|
rotation |
an orthogonal matrix that describes the rotation. The first
column is the linear combination of columns of x that defines
the first principal component, and so on. This may have fewer columns
than x. This is commonly called the loadings; it is not
a rotation in the sense often used in factor analysis.
|
center |
center used for centering.
|
scale |
scale used for scaling.
|
x |
rotated version of x. That is, the first column is the nrow(x) values for the first derived variable, and so on. This may have fewer columns than x. Returned only when retx = TRUE.
|
terms |
terms object of the formula. Not present if a formula was not used.
|
call |
an image of the call to prcomp.
This is not present if a formula was not used.
|
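The components listed above can be inspected directly. The sketch below is illustrative only; prop_var is a hypothetical helper name for the share of total variance carried by each derived variable.

```r
pc <- prcomp(cars, scale. = TRUE)

class(pc)
names(pc)

# The rotation matrix is orthogonal: t(R) %*% R is the identity.
crossprod(pc$rotation)

# With scale. = TRUE, the stored scale is each column's standard deviation.
isTRUE(all.equal(unname(pc$scale), unname(sapply(cars, sd))))

# Share of total variance carried by each derived variable.
prop_var <- pc$sdev^2 / sum(pc$sdev^2)
prop_var
```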
Background
Principal component analysis defines a rotation of the variables (columns) of x. The first derived direction is chosen to maximize the standard deviation of the derived variable, the second to maximize the standard deviation among directions uncorrelated with the first, and so on.
Principal component analysis is often used as a data reduction technique, sometimes in conjunction with regression. We recommend that you scale the columns of the input before performing the principal component analysis since a variable with large variance relative to the others will dominate the first principal component.
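The effect of scaling can be seen on the cars data, where dist has a much larger variance than speed. This sketch compares the first principal direction with and without scaling; it illustrates the recommendation and is not a general rule for every data set.

```r
# Column variances differ by more than an order of magnitude.
sapply(cars, var)

pc_raw <- prcomp(cars)
pc_scaled <- prcomp(cars, scale. = TRUE)

# Without scaling, the first direction is dominated by dist.
abs(pc_raw$rotation[, 1])

# After scaling, both variables receive equal weight in magnitude.
abs(pc_scaled$rotation[, 1])
```

With only two positively correlated variables, the scaled analysis always weights them equally in the first component, which makes the contrast with the unscaled result easy to see.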
References
Many multivariate statistics books (and some regression texts) include a
discussion of principal components. Below are a few examples:
Dillon, W. R. and Goldstein, M. (1984).
Multivariate Analysis, Methods and Applications.
Wiley, New York.
Johnson, R. A. and Wichern, D. W. (1982).
Applied Multivariate Statistical Analysis.
Prentice-Hall, Englewood Cliffs, New Jersey.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979).
Multivariate Analysis.
Academic Press, London.
See Also
Examples
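data(cars)
# Principal components of the centered, unscaled columns
prcomp(cars)
# Principal components after scaling each column to unit variance
prcomp(cars, scale. = TRUE)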