Principal Components & Classification Analysis Example
This example illustrates how the Principal Components & Classification Analysis module can be used to create a factor space for a set of variables, how to interpret the dimensions, and how to map additional variables and observations into the factor space. The example is based on a data set discussed in Jambu (1991) that contains data for various lifestyle variables (activities) for 28 (groups of) observations.
Specifically, the data describe the number of hours spent on each of the 10 activities by 28 different groups of observations. Some of the data are missing, and these values will be replaced by their respective means. Three additional variables, SLEEP, TV, and LEISURE, are treated as supplementary variables. For demonstration purposes (to illustrate how to specify active and supplementary cases), the data were modified to include an additional variable GENDER (for defining active cases) and a variable GEO.REGION for labeling observations in plots. Note that because of these additions and modifications to the data, the results of the analyses in this example will not be identical to those reported in Jambu (1991).
Specifying the Analysis
Open the data file Activities.sta, and open the Principal Components & Classification Analysis module through the Statistics - Multivariate Exploratory Techniques submenu. Click the Advanced

Also choose FEMALE as the code for active cases in the Code for active cases box. After you have completed the selection of variables, it is important to specify whether the analysis will be based on correlations or covariances; this analysis is based on the correlation matrix, so select the Correlations option button. Also, in the MD deletion group box, select the Mean substitution option button to replace missing values with their respective means.
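As a rough illustration of what the Mean substitution and Correlations options imply computationally, the following Python sketch replaces missing values with column means and then computes a correlation matrix. The toy values and the function names mean_substitute and correlation_matrix are hypothetical; they are not taken from Activities.sta and do not represent the module's own implementation.

```python
import math

def mean_substitute(columns):
    """Replace None entries in each column by the mean of its observed values."""
    filled = []
    for col in columns:
        observed = [v for v in col if v is not None]
        mean = sum(observed) / len(observed)
        filled.append([mean if v is None else v for v in col])
    return filled

def correlation_matrix(columns):
    """Pearson correlations between all pairs of complete columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]

# Hypothetical toy data: two variables (columns), one missing value (None) in each.
toy = [
    [6.0, 7.0, None, 5.0, 8.0],
    [1.0, None, 2.0, 3.0, 0.5],
]
filled = mean_substitute(toy)
corr = correlation_matrix(filled)
```

Note that mean substitution leaves each column mean unchanged but shrinks its variance, which is one reason the choice of missing-data option can affect the resulting correlations.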

Reviewing the Results
Click OK to perform the initial computations. Then, in the Results dialog, set the Number of factors to 2. With two factors, the Quality of representation is computed as approximately 81%.

Let us next review the main results for this analysis. The Summary box at the top of the Results dialog provides useful summary information about the current analysis, such as the number of active and supplementary variables and cases, and the eigenvalues. Other results for variables are available on the Variables tab of the Results dialog.


The factor corresponding to the largest eigenvalue (3.976814) accounts for approximately 56.8% of the total variance. The second factor, corresponding to the second eigenvalue (1.690162), accounts for approximately 24.1% of the total variance, and so on. When analyzing correlation matrices, the sum of the eigenvalues is equal to the number of (active) variables from which the factors were extracted (computed), and the "average expected" eigenvalue is equal to 1.0. Many criteria are used in practice for selecting the appropriate number of factors for interpretation (see also the Factor Analysis documentation); the simplest is to retain for interpretation as many factors as there are eigenvalues greater than 1. In this example, only the first two eigenvalues are greater than 1; together they account for approximately 81% of the total variance.
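The percentages above follow directly from the eigenvalues. The short Python sketch below reproduces the arithmetic under one assumption: that there are 7 active variables. That count is inferred from the reported figures (3.976814 divided by 0.568 is about 7) and is not stated explicitly in the text.

```python
# Eigenvalues reported in the example.
eigenvalues = [3.976814, 1.690162]

# Assumption: 7 active variables, so the eigenvalues of the
# correlation matrix sum to 7 (inferred, not stated in the text).
n_active_variables = 7

# Proportion of total variance explained by each factor.
proportions = [ev / n_active_variables for ev in eigenvalues]
cumulative = sum(proportions)

# "Eigenvalue greater than 1" rule for the number of factors to retain.
retained = [ev for ev in eigenvalues if ev > 1.0]

for ev, p in zip(eigenvalues, proportions):
    print(f"eigenvalue {ev:.6f}: {100 * p:.1f}% of total variance")
print(f"cumulative: {100 * cumulative:.1f}%")
print(f"factors retained: {len(retained)}")
```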

Cattell suggests finding the place on the plot of eigenvalues where the smooth decrease appears to level off toward the right. To the right of this point, presumably, one finds only "factorial scree." Scree is the geological term for the debris that collects on the lower part of a rocky slope. Thus, no more factors should be extracted than the number to the left of this point.

In the current analysis, the first axis, corresponding to the eigenvalue 3.976814, is most strongly correlated with the variables WORK and TRANSPORT (high negative correlations) and HOUSEHOLD and CHILDREN (high positive correlations). Based on the magnitudes and signs of the factor coordinates (variable-factor correlations) for the active and supplementary variables, one could perhaps label the first dimension Work versus Home-related activities (note the high negative coefficients for WORK, TRANSPORT, and PERSONAL CARE versus the positive values for HOUSEHOLD, CHILDREN, and so on). The second factor may be related to "work-like" (recurrent) activities required by modern organized life (SHOPPING, PERSONAL CARE). However, you may prefer different labels, and the inclusion of additional supplementary variables or cases in future research could clarify the interpretation of the second factor.
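To make the phrase "factor coordinates (variable-factor correlations)" concrete, here is a minimal NumPy sketch of the standard relationship for a correlation-based analysis: each eigenvector scaled by the square root of its eigenvalue gives the correlations between the variables and that factor. The data are synthetic stand-ins, not the Activities.sta values, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (20 cases x 4 variables); NOT the Activities.sta values.
X = rng.normal(size=(20, 4))
X[:, 1] += 0.8 * X[:, 0]  # induce some correlation between two variables

# Standardize the variables and compute their correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# Eigendecomposition; eigh returns ascending order, so sort descending.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Factor coordinates of variables: eigenvectors scaled by sqrt(eigenvalue).
coords = eigvecs * np.sqrt(eigvals)

# Cross-check: correlations between each variable and each factor score.
scores = Z @ eigvecs
check = np.array([[np.corrcoef(Z[:, j], scores[:, k])[0, 1]
                   for k in range(4)] for j in range(4)])
```

In this sketch coords and check agree up to rounding error, and the sign of each column is arbitrary, which is why only the contrast between positive and negative variables (not the direction itself) carries meaning when labeling a dimension.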

Click the Cases tab to display the results for the observations (cases). On the Cases tab, select the No names/numbers option button in the Options for plot of factor coordinates group box, and then click the Plot case factor coordinates 2D button.
This plot shows the factor coordinates for all observations, that is, both the active observations (cases) that were used to compute the current factor solution (namely, Females) as well as the supplementary observations (cases) that are only mapped into the coordinate system defined by the two factors (Males). One interesting result that is apparent in this plot pertains to the clustering of active and supplementary cases. It appears that all supplementary cases (Males, plotted as red squares) in the analysis are plotted to the left of the center of the first axis (that is, have negative coordinate values for the first, horizontal axis). Given the interpretation of this factor as Work versus Home-related activities, with WORK and TRANSPORT defining the negative (left) side of this dimension, it appears that the daily activities of Males in this study fall mostly on the Work side of this dimension.
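The distinction between active and supplementary cases can be sketched in a few lines of NumPy: the factor axes are computed from the active cases only, and the supplementary cases are then standardized with the active cases' means and standard deviations and projected onto those axes. This is a generic sketch on synthetic data under that assumption; the module's exact scaling conventions may differ, and none of the values come from Activities.sta.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data; NOT the Activities.sta values.
active = rng.normal(size=(15, 4))         # cases that define the factor space
supplementary = rng.normal(size=(5, 4))   # cases that are only mapped into it

# Means and standard deviations come from the ACTIVE cases only.
mean = active.mean(axis=0)
std = active.std(axis=0, ddof=1)

# Factor axes from the correlation matrix of the active cases.
R = np.corrcoef(active, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]     # indices of the two largest eigenvalues
axes = eigvecs[:, order]

# Project both groups onto the same two axes.
active_coords = ((active - mean) / std) @ axes
supp_coords = ((supplementary - mean) / std) @ axes
```

Because the supplementary cases do not influence the means, standard deviations, or eigenvectors, their positions in the plot can be read as a description of how they relate to a factor space defined entirely by the active cases.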
Summary
The purpose of this example is to illustrate how the Principal Components & Classification Analysis module can be used to identify important dimensions in a set of variables, to map other variables of interest into those dimensions, and to identify clusters of observations with similar characteristics with respect to these dimensions.