Analyzing Correlation Matrices

SEPATH implements a procedure, pioneered by Mels (1989), for correctly analyzing correlation matrices. Traditional models and procedures for analysis of covariance structures are based on the assumption that the sample covariance matrix is being analyzed. This assumption is often inconvenient in practice. In many situations, the sample covariance matrix is ill-scaled. In addition, variables standardized to the same scale (i.e., unit variance) are generally easier to interpret. Moreover, in some situations involving reanalysis of older studies, only the correlation matrix is available.

The above considerations have led many researchers to input sample correlation matrices to covariance structure analysis programs as though they were covariance matrices. Cudeck (1989) points out that this can often lead to incorrect results. In particular, unless the model is invariant under diagonal rescaling, the calculated standard errors will almost certainly be incorrect, and the observed test statistic may also be incorrect.

SEPATH implements, via the Correlations option under Data to Analyze in the Analysis Parameters dialog, a completely transparent system for correctly analyzing the sample correlation matrix. When this option is selected, SEPATH computes and analyzes the sample correlation matrix, regardless of whether the input file is a covariance matrix, correlation matrix, or rectangular data file. Thus, for models that SEPATH can analyze, the problems detailed by Cudeck (1989) are completely eliminated.
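As a rough illustration of what "computing the sample correlation matrix regardless of the input type" involves, consider the following Python/NumPy sketch. It is not SEPATH code: the function names `to_correlation` and `sample_correlation`, and the example matrix, are hypothetical and chosen only for illustration.

```python
import numpy as np

def to_correlation(cov):
    """Rescale a covariance matrix to a correlation matrix.

    Each element is divided by the product of the two standard
    deviations, so a matrix that is already a correlation matrix
    is returned unchanged.
    """
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

def sample_correlation(data=None, matrix=None):
    """Return the sample correlation matrix for any of three inputs.

    data   : rectangular data, rows are cases, columns are variables
    matrix : a covariance matrix or a correlation matrix
    """
    if data is not None:
        matrix = np.cov(np.asarray(data, dtype=float), rowvar=False)
    if matrix is None:
        raise ValueError("supply either data or matrix")
    return to_correlation(matrix)

# Hypothetical ill-scaled covariance matrix: variances 4, 9, and 2500.
cov = np.array([[4.0, 2.4, 30.0],
                [2.4, 9.0, 45.0],
                [30.0, 45.0, 2500.0]])

r_from_cov = sample_correlation(matrix=cov)
r_from_corr = sample_correlation(matrix=r_from_cov)  # already a correlation matrix
```

In this sketch, all three kinds of input lead to the same matrix with a unit diagonal, which is the matrix that would then be analyzed.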

SEPATH solves the problem of analyzing correlations by utilizing an augmented version of the SEPATH model, and following the general analytical strategy advocated by Cudeck (1989) and Mels (1989). This strategy begins by converting the structural model into one that is scale free. This conversion is done as follows.

  1. Start with the path diagram as it would be if covariances were analyzed.
  2. Replace each manifest variable with a "dummy" latent variable, with a single arrow (having a free "scaling" parameter) pointing to the original manifest variable. If the original manifest variable is exogenous, its dummy latent variable will be exogenous. If the original manifest variable is endogenous, its dummy latent variable will be endogenous.

The resulting model must be invariant under changes in scale, because any effect of change in the scale of the observed variables can be absorbed by the scaling parameters. Moreover, the path coefficients can be estimated with all the dummy latent variables constrained to have unit variances. Simply set the variance of the exogenous dummy latent variables to a fixed value of 1, and use the new constrained estimation procedure to fix the variances of the endogenous dummy latent variables to 1.
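The way a change of scale is absorbed by the scaling parameters can be sketched numerically. The Python/NumPy example below assumes that each manifest variable equals its dummy latent variable multiplied by one free scaling parameter, so that the implied covariance matrix is a diagonal scaling matrix times a unit-diagonal matrix times the same scaling matrix. The names `p_dummy`, `d`, and `a`, and all numbers, are hypothetical.

```python
import numpy as np

# Covariance matrix of the dummy latent variables, constrained to
# unit variances (hypothetical values).
p_dummy = np.array([[1.0, 0.4, 0.3],
                    [0.4, 1.0, 0.3],
                    [0.3, 0.3, 1.0]])

# One free scaling parameter per manifest variable (hypothetical values).
d = np.array([2.0, 3.0, 50.0])

def implied_covariance(scales, p):
    """Implied covariance of the manifest variables: diag(scales) P diag(scales)."""
    return np.diag(scales) @ p @ np.diag(scales)

sigma = implied_covariance(d, p_dummy)

# Change the units of the observed variables, e.g. multiply the first
# by 10, the second by 0.5, and the third by 0.01.
a = np.array([10.0, 0.5, 0.01])
sigma_rescaled = np.diag(a) @ sigma @ np.diag(a)

# The same rescaled matrix is reproduced by changing only the scaling
# parameters; the dummy-variable matrix p_dummy is untouched.
sigma_absorbed = implied_covariance(a * d, p_dummy)

print(np.allclose(sigma_rescaled, sigma_absorbed))
```

Because only the scaling parameters change, the path coefficients and the relationships among the unit-variance dummy variables are unaffected by the choice of units.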

In the revised model,

(139)

and

(140)

where and are the dummy latent variables. As mentioned above, each dummy latent variable is connected, via a single arrow, to a manifest variable. Hence, the model equations are the same as before, except for the additional relationships, which may be written

(141)

and

(142)

Letting

(143)

the revised equation for S can be written, in the notation of Equation 41, as

(144)

where D5 is a matrix of scaling factors, and P is the covariance matrix of the dummy latent variables. These variables, as mentioned above, may be constrained to have unit variance. Since they are related to the manifest variables by scaling constants, they may be thought of as the standardized unit variance equivalents of the manifest variables, so the matrix P is, in fact, a correlation structure model for the manifest variables.

Since the above model is invariant under rescaling of S, the identical discrepancy function will be produced, regardless of how the sample covariance matrix is scaled. Any changes in scaling will simply be absorbed in the nuisance scaling parameters in D5. Consequently, when the Correlations option is in effect, the sample covariance matrix is automatically rescaled into a correlation matrix, which is then analyzed. This rescaling tends, in practice, to improve performance of the iterative algorithms, especially in cases where S, the sample covariance matrix, has variances that differ by orders of magnitude.
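The invariance of the discrepancy function, and the numerical benefit of rescaling, can be checked with a small Python/NumPy sketch. Two assumptions are made here that are not stated in the text above: the maximum likelihood discrepancy is used as one common choice of discrepancy function, and the model matrix `p_model` and the sample matrix `s` are hypothetical values invented for illustration.

```python
import numpy as np

def ml_discrepancy(s, sigma):
    """Maximum likelihood discrepancy: ln|Sigma| - ln|S| + tr(S Sigma^-1) - p."""
    p = s.shape[0]
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_s = np.linalg.slogdet(s)
    return logdet_sigma - logdet_s + np.trace(s @ np.linalg.inv(sigma)) - p

# Hypothetical sample covariance matrix with variances 4, 9, and 2500.
s = np.array([[4.0, 2.4, 30.0],
              [2.4, 9.0, 45.0],
              [30.0, 45.0, 2500.0]])

# Hypothetical correlation structure (all correlations equal to 0.35).
p_model = np.array([[1.0, 0.35, 0.35],
                    [0.35, 1.0, 0.35],
                    [0.35, 0.35, 1.0]])

# Fit on the covariance scale: scaling parameters carry the standard deviations.
d = np.sqrt(np.diag(s))
sigma = np.diag(d) @ p_model @ np.diag(d)
f_cov = ml_discrepancy(s, sigma)

# Rescale the sample matrix to a correlation matrix; the scaling
# parameters absorb the change and become 1.
a = 1.0 / d
r = np.diag(a) @ s @ np.diag(a)
sigma_r = np.diag(a * d) @ p_model @ np.diag(a * d)
f_corr = ml_discrepancy(r, sigma_r)

print(np.isclose(f_cov, f_corr))              # same discrepancy on both scales
print(np.linalg.cond(s) > np.linalg.cond(r))  # correlation matrix is better conditioned
```

In this example the discrepancy value is the same on both scales, while the correlation matrix has a much smaller condition number than the ill-scaled covariance matrix, which is consistent with the improved behavior of the iterative algorithms noted above.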