Classical Metric Multi-Dimensional Scaling

cmdscale

Description

Represents data in a low-dimensional Euclidean space whose dimension you choose. Optionally, estimates an additive constant so that the dissimilarities better approximate Euclidean distances.

Usage

cmdscale(d, k = 2, eig = FALSE, add = FALSE, x.ret = FALSE)

Arguments

d	a distance structure of the form returned by dist, or a full, symmetric matrix. The values are assumed to be dissimilarities or relative distances.
k	the desired dimensionality of the output space.
eig	a logical value. If TRUE, returns the eigenvalues computed by the algorithm. They can be used as an aid in determining the appropriate dimensionality of the solution. (The default is FALSE.)
add	a logical value. If TRUE, computes the additive constant (see component ac below). The default is FALSE.
x.ret	a logical value. If TRUE, the doubly-centered distance matrix is included in the return value. Here, double centering means subtracting the row and column means of the distance matrix (argument d above) from its elements. (The default is FALSE.)
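
As a rough illustration of how the eig argument can help in choosing k, the sketch below scales six made-up points that lie in a plane. The data and the 1e-6 tolerance are assumptions chosen for this example; they are not part of cmdscale.

```r
# Six made-up points in a plane (illustrative data only)
X <- cbind(c(0, 1, 2, 3, 1, 4),
           c(0, 2, 1, 3, 4, 0))
d <- dist(X)

# Default call: only the matrix of coordinates is returned
pts <- cmdscale(d)

# With eig = TRUE a list is returned that also holds the eigenvalues
fit <- cmdscale(d, k = 2, eig = TRUE)

# Count the eigenvalues that are clearly positive relative to the largest;
# the 1e-6 tolerance is an arbitrary choice for this sketch
n_dims <- sum(fit$eig > 1e-6 * max(fit$eig))
n_dims
```

Because the points are planar, one would expect only two eigenvalues to stand out, which suggests k = 2 is enough for these data.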

Details

The cmdscale function is an implementation of metric multidimensional scaling. That is, the distances between points in the result are as close as possible (in a certain sense) to the original distances, subject to being Euclidean distances in a k-dimensional space. The solutions are nested: the solution for k+1 dimensions has the same first k columns in points (up to numerical error) as the solution for k dimensions.
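
The following is a minimal sketch of the textbook classical scaling computation (eigendecomposition of the doubly-centered matrix of squared distances), compared against cmdscale. It is not a description of the function's internal code. The data, the centering matrix J, and the names B and manual_pts are assumptions introduced for illustration.

```r
# Five made-up points in three dimensions (illustrative data only)
X <- cbind(c(0, 3, 1, 4, 2),
           c(0, 1, 4, 2, 5),
           c(1, 0, 2, 0, 3))
d <- dist(X)

# Textbook classical scaling: double-center -d^2 / 2, then eigendecompose
D2 <- as.matrix(d)^2
n <- nrow(D2)
J <- diag(n) - matrix(1 / n, n, n)      # centering matrix
B <- -0.5 * J %*% D2 %*% J
e <- eigen(B, symmetric = TRUE)
k <- 2
manual_pts <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))

# Coordinates from cmdscale; each column is determined only up to sign
pts <- cmdscale(d, k = k)
max_diff <- max(abs(abs(unname(pts)) - abs(manual_pts)))

# Nested solutions: the first k columns of the (k+1)-dimensional result
pts3 <- cmdscale(d, k = k + 1)
nested_diff <- max(abs(pts3[, 1:k] - pts))
```

Both max_diff and nested_diff should be close to zero, within the numerical accuracy of the implementation.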

Typically, the additive constant is used when the "distances" in d are subjective dissimilarities. The constant ac attempts to make the distances conform to a Euclidean space of as small a dimension as possible. ac is estimated under the assumption that the Euclidean space has only one dimension, an assumption that simplifies computation. More technically, the constant attempts to eliminate negative eigenvalues of the doubly-centered matrix of the squared distances.
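
To make the role of the additive constant concrete, here is a small sketch with made-up dissimilarities that violate the triangle inequality. The matrix m is an assumption for illustration, and the estimated value of ac may differ between implementations, so no particular value is shown.

```r
# Made-up dissimilarities for four objects; m[1, 3] = 5 exceeds
# m[1, 2] + m[2, 3] = 2, so these cannot be Euclidean distances
m <- matrix(c(0, 1, 5, 1,
              1, 0, 1, 5,
              5, 1, 0, 1,
              1, 5, 1, 0), nrow = 4, byrow = TRUE)
d <- as.dist(m)

# Without the constant, at least one eigenvalue is negative
raw <- cmdscale(d, k = 2, eig = TRUE)
has_negative <- any(raw$eig < 0)

# With add = TRUE, the estimated constant is returned in component ac
adj <- cmdscale(d, k = 2, eig = TRUE, add = TRUE)
adj$ac
```

Comparing raw$eig with adj$eig shows how far the constant goes toward removing the negative eigenvalues for a given data set.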

There are various measures of the goodness of fit of a solution in the literature. Two of them are given in the function in the example section below. See Mardia, Kent and Bibby (1979, p. 408).

Results are currently computed to single-precision accuracy only.

Value

When eig, add, and x.ret are all FALSE, returns a matrix like the points component described below.

Otherwise, it returns a list with five components, as follows:

points	a matrix with k columns and as many rows as there were objects whose distances were given in d. Row i gives the coordinates in k-space of the i-th object.
eig	a vector of all eigenvalues; its length is the number of objects whose distances were given in d. Returned only when the eig argument is TRUE; otherwise NULL is returned.
ac	a constant added to all data values in d to transform dissimilarities (or relative distances) into absolute distances. This is returned only if the add argument is TRUE; otherwise, 0 is returned.
x	the doubly-centered distance matrix for classical multidimensional scaling. This is returned only if x.ret argument is TRUE; otherwise, NULL is returned.
GOF	a numeric vector of length 2 giving two goodness-of-fit measures, computed as: sum(eig[1:k])/c(sum(abs(eig)), sum(pmax(eig, 0)))
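
The GOF formula above can be checked by hand from the eig component. The sketch below uses made-up data, and gof_manual is a name introduced only for this illustration.

```r
# Five made-up points in three dimensions (illustrative data only)
X <- cbind(c(0, 3, 1, 4, 2),
           c(0, 1, 4, 2, 5),
           c(1, 0, 2, 0, 3))
k <- 2
fit <- cmdscale(dist(X), k = k, eig = TRUE)

# Recompute the two goodness-of-fit measures from the eigenvalues
ev <- fit$eig
gof_manual <- sum(ev[1:k]) / c(sum(abs(ev)), sum(pmax(ev, 0)))

# Compare with the value stored in the result
rbind(manual = gof_manual, returned = fit$GOF)
```

The two measures differ only in how negative eigenvalues are treated in the denominator: the first uses their absolute values and the second replaces them with zero.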

Background

Multidimensional scaling is the process of representing, in a low-dimensional space, the distances (or dissimilarities) among a group of objects. It is somewhat similar to cluster analysis, but it returns points in space rather than distinct groupings.

Examples of its use include anthropologists studying cultural differences based on language, art, and so on, and marketing researchers assessing product similarity.

The technique can be used to "serialize" data if the result is close to a curve in two dimensions or a string in three dimensions. For example, archaeologists might try to place several cultures into a time order.
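
As a toy sketch of the ordering idea, suppose the dissimilarity between two cultures is simply the gap between their (unknown to the analyst) dates. The dates below are made up, and real dissimilarities would be far noisier; the point is only that a one-dimensional solution can recover an ordering, up to reversal.

```r
# Made-up dates for five hypothetical cultures (illustrative only)
dates <- c(1200, 300, 800, 1500, 50)
d <- dist(dates)

# One-dimensional classical scaling
coord <- cmdscale(d, k = 1)[, 1]

# The recovered order matches the true order or its reverse,
# because the direction of the axis is arbitrary
recovered <- order(coord)
same_order <- identical(recovered, order(dates)) ||
  identical(recovered, rev(order(dates)))
same_order
```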

References

Becker, R. A., Chambers, J. M., and Wilks, A. R. 1988. The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books and Software.

Cailliez, F. 1983. The analytical solution of the additive constant problem. Psychometrika. Volume 48. 343-349.

Cox, T. F. and Cox, M. A. A. 1994. Multidimensional Scaling. London, UK: Chapman and Hall.

Gower, J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika. Volume 53. 325-328.

Johnson, R. A. and Wichern, D. W. 1982. Applied Multivariate Statistical Analysis. Englewood Cliffs, NJ: Prentice-Hall.

Mardia, K. V., Kent, J. T., and Bibby, J. M. 1979. Multivariate Analysis. London, UK: Academic Press.

Seber, G. A. F. 1984. Multivariate Observations. New York, NY: John Wiley & Sons.

Torgerson, W. S. 1958. Theory and Methods of Scaling. New York, NY: Wiley.