cmdscale
Classical Metric Multi-Dimensional Scaling
Description
Represents data in a low-dimensional Euclidean space.
You can choose the dimension of the space. You can also estimate an additive
constant so that the dissimilarities better approximate Euclidean distances.
Usage
cmdscale(d, k = 2, eig = FALSE, add = FALSE, x.ret = FALSE)
Arguments
d |
a distance structure of the form returned by dist, or a full, symmetric
matrix. The data are assumed to be dissimilarities or relative distances.
|
k |
the desired dimensionality of the output space.
|
eig |
a logical value. If TRUE, returns the eigenvalues computed by
the algorithm. They can be used as an aid in determining the appropriate
dimensionality of the solution. (The default is FALSE.)
|
add |
a logical value. If TRUE, computes the additive constant
(see component ac below). (The default is FALSE.)
|
x.ret |
a logical value. If TRUE, the doubly-centered distance matrix is
included in the return value. Here, double centering means subtracting the row
and column means of the distance matrix (argument d above) from
its elements. (The default is FALSE.)
|
Details
The cmdscale function is an implementation of metric multidimensional
scaling. That is, the distances between points in the result are
as close as possible (in a certain sense) to the original distances, subject
to being Euclidean distances in a k-dimensional space.
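The computation described above can be sketched by hand. This is a minimal illustration of the textbook classical-scaling recipe (double-center the squared distances, then take the leading eigenvectors), not necessarily the exact internal code of cmdscale. The data matrix X is made up for the example.

```r
# Hypothetical data: 10 objects measured on 4 variables.
set.seed(42)
X <- matrix(rnorm(40), nrow = 10)
d <- dist(X)

# Double-center the squared distances: B = -1/2 * J %*% D^2 %*% J,
# where J is the centering matrix.
D2 <- as.matrix(d)^2
n <- nrow(D2)
J <- diag(n) - matrix(1 / n, n, n)
B <- -0.5 * J %*% D2 %*% J

# Coordinates are the leading eigenvectors scaled by sqrt(eigenvalue).
k <- 2
e <- eigen(B, symmetric = TRUE)
manual <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))

# Compare with cmdscale through inter-point distances, which do not
# depend on the (arbitrary) sign of each eigenvector.
fit <- cmdscale(d, k = k)
max_diff <- max(abs(as.vector(dist(manual)) - as.vector(dist(fit))))
max_diff
```

Comparing inter-point distances rather than raw coordinates avoids spurious mismatches caused by axis reflections.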
The solution for k+1 dimensions has the same first k columns in points
(up to numerical error) as the solution for dimension k.
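A quick check of this nesting property, using made-up data. Absolute values are compared so that a possible sign flip of a column (an assumption about what an implementation might do, not documented behavior) would not affect the result.

```r
# Hypothetical data: 12 objects measured on 5 variables.
set.seed(1)
X <- matrix(rnorm(60), nrow = 12)
d <- dist(X)

p2 <- cmdscale(d, k = 2)
p3 <- cmdscale(d, k = 3)

# The first two columns of the 3-D solution should match the 2-D solution.
nest_diff <- max(abs(abs(p3[, 1:2]) - abs(p2)))
nest_diff
```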
Typically, the additive constant is used when the "distances" in d are
subjective dissimilarities. The ac constant attempts to make the distances
conform to a Euclidean space of as small a dimension as possible.
ac is estimated under the assumption that the Euclidean space
has only one dimension, an assumption that simplifies computation.
A more technical explanation is that the constant attempts to eliminate
negative eigenvalues of the doubly-centered matrix of the squared distances.
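The effect on the eigenvalues can be seen with a small set of dissimilarities that cannot be Euclidean. The matrix m below is invented for illustration: one entry violates the triangle inequality, so the unadjusted solution has a negative eigenvalue. How close to zero the smallest eigenvalue ends up after adjustment may depend on how the implementation estimates ac, so the values are printed rather than asserted here.

```r
# Hypothetical dissimilarities among 4 objects. m[1, 4] = 5 exceeds
# m[1, 2] + m[2, 4] = 2, so no Euclidean configuration reproduces them.
m <- matrix(1, 4, 4)
m[1, 4] <- m[4, 1] <- 5
diag(m) <- 0
d <- as.dist(m)

raw <- cmdscale(d, k = 1, eig = TRUE)              # no additive constant
adj <- cmdscale(d, k = 1, eig = TRUE, add = TRUE)  # with additive constant

min(raw$eig)   # negative for these dissimilarities
adj$ac         # the estimated additive constant
min(adj$eig)   # smallest eigenvalue after adjustment
```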
There are various measures of the goodness of fit of a solution
in the literature. Two of them are returned in the GOF component
described under Value. See Mardia, Kent and Bibby (1979, p. 408).
Results are currently computed to single-precision accuracy only.
Value
If eig, add, and x.ret are all FALSE, returns a matrix like the points
component described below.
Otherwise, it returns a list with five components, as follows:
points |
a matrix with k columns and as many rows as there were
objects whose distances were given in d. Row i gives the
coordinates in k-space of the i-th object.
|
eig |
a vector of all eigenvalues; its length is the number of objects
whose distances were given in d.
Returned only when the eig argument is TRUE;
otherwise NULL is returned.
|
ac |
a constant added to all data values in d to transform dissimilarities
(or relative distances) into absolute distances.
This is returned only if the add argument is TRUE;
otherwise, 0 is returned.
|
x |
the doubly-centered distance matrix for classical multidimensional scaling.
This is returned only if the x.ret argument is TRUE;
otherwise, NULL is returned.
|
GOF |
a numeric vector of length 2, computed as
sum(eig[1:k])/c(sum(abs(eig)), sum(pmax(eig, 0)))
|
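As a rough illustration of the list return value, the sketch below recomputes GOF from the returned eigenvalues using the formula given above. The data are made up, and the check assumes the k requested eigenvalues are all positive, as they are for Euclidean input like this.

```r
# Hypothetical data: 8 objects measured on 3 variables.
set.seed(7)
X <- matrix(rnorm(24), nrow = 8)
d <- dist(X)

k <- 2
fit <- cmdscale(d, k = k, eig = TRUE)
names(fit)   # component names of the returned list

# Recompute the two goodness-of-fit measures from the eigenvalues.
ev <- fit$eig
gof <- sum(ev[1:k]) / c(sum(abs(ev)), sum(pmax(ev, 0)))
gof_matches <- isTRUE(all.equal(gof, fit$GOF))
gof_matches
```

Values of GOF near 1 indicate that the first k eigenvalues account for most of the total, so k dimensions represent the distances well.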
Background
Multidimensional scaling is the process of representing,
in a low-dimensional space, the distances
(or dissimilarities) among a group of objects.
It is somewhat similar to cluster analysis, but it returns points in space
rather than distinct groupings.
Examples of its use include anthropologists studying cultural differences
based on language, art, and so on, and marketing researchers assessing product
similarity.
The technique can be used to "serialize" data if the result is close to a
curve in two dimensions or a string in three dimensions. For example, archaeologists
might try to place several cultures into a time order.
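A toy version of the ordering idea, under the simplifying assumption that the dissimilarities come exactly from positions on a line. The "dates" vector is invented; only pairwise differences are passed to cmdscale, and a one-dimensional solution recovers the ordering up to a reversal of direction.

```r
# Hypothetical dates for six cultures; only their pairwise gaps are used.
dates <- c(300, 120, 450, 180, 900, 260)
d <- dist(dates)

fit <- cmdscale(d, k = 1)

# The recovered order matches the true order, or its reverse, because
# the direction of the axis is arbitrary.
recovered <- order(fit[, 1])
truth <- order(dates)
same_order <- identical(recovered, truth) || identical(recovered, rev(truth))
same_order
```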
References
Becker, R. A., Chambers, J. M., and Wilks, A. R. 1988. The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books and Software.
Cailliez, F. 1983. The analytical solution of the additive constant problem. Psychometrika. Volume 48. 343-349.
Cox, T. F. and Cox, M. A. A. 1994. Multidimensional Scaling. London, UK: Chapman and Hall.
Gower, J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika. Volume 53. 325-328.
Johnson, R. A. and Wichern, D. W. 1982. Applied Multivariate Statistical Analysis. Englewood Cliffs, NJ: Prentice-Hall.
Mardia, K. V., Kent, J. T., and Bibby, J. M. 1979. Multivariate Analysis. London, UK: Academic Press.
Seber, G. A. F. 1984. Multivariate Observations. New York, NY: John Wiley & Sons.
Torgerson, W. S. 1958. Theory and Methods of Scaling. New York, NY: Wiley.
Examples
dis.vote <- dist(Sdatasets::votes.repub)
mdsx <- cmdscale(dis.vote) # default 2-space
# other argument usages
cmdscale(dis.vote, eig = TRUE)
cmdscale(dis.vote, add = TRUE, x.ret = TRUE)