Multidimensional Scaling Introductory Overview - How Many Dimensions to Specify?

If you are familiar with factor analysis, you will recognize this issue. If you are not, you may want to read the Introductory Overview in Factor Analysis; however, this is not necessary in order to understand the following discussion. In general, the more dimensions we use to reproduce the distance matrix, the better the reproduced matrix fits the observed matrix (i.e., the smaller the stress). In fact, if we use as many dimensions as there are variables (that is, objects being scaled), then we can perfectly reproduce the observed distance matrix. Of course, our goal is to reduce the observed complexity of nature, that is, to explain the distance matrix in terms of fewer underlying dimensions. To return to the example of distances between cities, once we have a two-dimensional map it is much easier to visualize the locations of cities and to navigate between them than it is when relying on the distance matrix alone.

Sources of misfit
Let us consider for a moment why fewer dimensions may produce a worse representation of a distance matrix than more dimensions would. Imagine the three cities A, B, and C, and the three cities D, E, and F; shown below are the distances (in miles) between the cities within each group.
     A    B    C          D    E    F
A    0               D    0
B   90    0          E   90    0
C   90   90    0     F  180   90    0

In the first matrix, all cities are exactly 90 miles apart from each other; in the second matrix, cities D and F are 180 miles apart. Now, can we arrange the three cities (objects) on one dimension (line)? Indeed, we can arrange cities D, E, and F on one dimension:

D---90 miles---E---90 miles---F

D is 90 miles away from E, and E is 90 miles away from F; thus, D is 90+90=180 miles away from F. If you try to do the same thing with cities A, B, and C you will see that there is no way to arrange the three cities on one line so that the distances can be reproduced. However, we can arrange those cities in two dimensions, in the shape of a triangle:

              A
             / \
   90 miles /   \ 90 miles
           /     \
          B-------C
          90 miles

Arranging the three cities in this manner, we can perfectly reproduce the distances between them. Without going into much detail, this small example illustrates how a particular distance matrix implies a particular number of dimensions. Of course, "real" data are never this "clean," and contain a lot of noise, that is, random variability that contributes to the differences between the reproduced and observed matrix.
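
The two small matrices above can also be checked numerically. The following is a minimal sketch in Python, assuming NumPy is available. It uses classical (Torgerson) scaling, which is only one of several MDS techniques and is not necessarily the algorithm that any particular program uses. The function names (classical_mds_eigenvalues, implied_dimensions) and the tolerance are hypothetical choices made for this illustration. The idea is that, for error-free Euclidean distances, the number of clearly positive eigenvalues of the double-centered squared-distance matrix equals the number of dimensions needed to reproduce the distances.

```python
import numpy as np

def classical_mds_eigenvalues(dist):
    """Eigenvalues (largest first) of the double-centered squared-distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n   # removes row and column means
    gram = -0.5 * centering @ (d ** 2) @ centering
    return np.sort(np.linalg.eigvalsh(gram))[::-1]

def implied_dimensions(dist, tol=1e-8):
    """Count eigenvalues that are clearly positive (tol is an assumed cutoff)."""
    eig = classical_mds_eigenvalues(dist)
    return int(np.sum(eig > tol * max(eig[0], 1.0)))

# Cities A, B, C: every pair is 90 miles apart.
dist_abc = [[0, 90, 90],
            [90, 0, 90],
            [90, 90, 0]]

# Cities D, E, F: D and F are 180 miles apart.
dist_def = [[0, 90, 180],
            [90, 0, 90],
            [180, 90, 0]]

print("A, B, C need", implied_dimensions(dist_abc), "dimension(s)")
print("D, E, F need", implied_dimensions(dist_def), "dimension(s)")
```

With noisy "real" data the small eigenvalues are rarely exactly zero, so a simple count like this becomes a matter of judgment rather than a clear-cut answer.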

Scree test
A common way to decide how many dimensions to use is to plot the stress value against different numbers of dimensions (scree plot). This test was first proposed by Cattell (1966) in the context of the number-of-factors problem in factor analysis (see Factor Analysis); Kruskal and Wish (1978; pp. 53-60) discuss the application of this plot to MDS.

Cattell suggests finding the place where the smooth decrease of stress values (eigenvalues in factor analysis) appears to level off toward the right of the plot. To the right of this point one finds, presumably, only "factorial scree" -- "scree" is the geological term referring to the debris which collects on the lower part of a rocky slope.
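
The values that go into such a plot can be sketched as follows, again in Python with NumPy. Several things here are assumptions made for illustration only: the six made-up points (which lie almost in a plane, with a small third coordinate), the use of classical scaling to obtain each configuration, and a simple normalized stress (square root of the sum of squared differences between reproduced and observed distances, divided by the sum of squared observed distances). MDS programs may define and minimize stress differently.

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distances between all rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(dist, k):
    """k-dimensional configuration from classical (Torgerson) scaling."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)              # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]             # indices of the k largest
    top = np.clip(vals[order], 0.0, None)          # guard against tiny negatives
    return vecs[:, order] * np.sqrt(top)

def stress(dist, config):
    """Assumed normalized stress between observed and reproduced distances."""
    reproduced = pairwise_distances(config)
    upper = np.triu_indices_from(dist, k=1)        # each pair counted once
    num = ((reproduced[upper] - dist[upper]) ** 2).sum()
    return float(np.sqrt(num / (dist[upper] ** 2).sum()))

# Hypothetical data: six objects that are nearly two-dimensional.
points = np.array([
    [0.0, 0.0, 0.00],
    [4.0, 0.0, 0.10],
    [4.0, 3.0, -0.10],
    [0.0, 3.0, 0.05],
    [2.0, 1.5, -0.05],
    [1.0, 4.0, 0.10],
])
observed = pairwise_distances(points)

stress_by_dim = [stress(observed, classical_mds(observed, k))
                 for k in range(1, len(points))]
for k, s in enumerate(stress_by_dim, start=1):
    print(f"{k} dimension(s): stress = {s:.4f}")
```

Plotting stress_by_dim against the number of dimensions gives the scree plot; for these made-up data the stress drops sharply from one to two dimensions and is close to zero from then on, which is the kind of leveling off the test looks for.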

For more information on procedures for determining the optimal number of factors to retain, see Reviewing the Results of a Principal Components Analysis in the Introductory Overview for the Factor Analysis module.

Interpretability of configuration
A second criterion for deciding how many dimensions to interpret is the clarity of the final configuration. Sometimes, as in our example of distances between cities, the resultant dimensions are easily interpreted. At other times, the points in the plot form a sort of "random cloud," and there is no straightforward and easy way to interpret the dimensions. In the latter case one should try to include more or fewer dimensions and examine the resultant final configurations. Often, more interpretable solutions emerge. However, if the data points in the plot do not follow any pattern, and if the stress plot does not show any clear "elbow," then the data are most likely random "noise."