Model Identification

For practical purposes it is usually not enough to have a particular model which, when expressed in the framework of Equation 41, reproduces Σ. For a model to be of much conceptual or practical value, its parameters must be identified. That is, there must exist only one parameter vector θ for which
$$\Sigma = \Sigma(\theta).$$

Perhaps the simplest example of a covariance structure model which is not identified is a common factor analysis model with two manifest variables and one common factor. In this case (assuming the common factor has a variance of 1) the covariance structure model becomes

$$\Sigma = \phi\phi' + U^2 \tag{75}$$

In this case the parameter vector θ has 4 elements: the two elements of φ and the two diagonal elements of U².

Suppose

$$\Sigma = \begin{bmatrix} 1 & .5 \\ .5 & 1 \end{bmatrix} \tag{76}$$

If

(77)

Σ = φφ′ + U², and the model fits perfectly. In this case

(78)

But there are other values of θ which will reproduce Σ equally well. In fact, there are infinitely many such values.

For example, let

(79)

If U² is restricted to be positive definite, clearly any two values for the first two elements of θ whose product is .5, and which are both less than 1 in absolute value, will produce a discrepancy function value of zero. The diagonal elements of U² are then obtained by subtracting the square of the corresponding element of φ from 1.0.
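A small numerical check of this argument, assuming (as the text implies) that Σ has unit variances and a covariance of .5. The helper names are hypothetical. Each choice of φ₁ below yields a different θ, yet every one reproduces Σ exactly:

```python
import numpy as np

# Assumed covariance matrix: unit variances, covariance .5 (as implied by the text).
SIGMA = np.array([[1.0, 0.5],
                  [0.5, 1.0]])


def implied_cov(phi, u2):
    """Model-implied covariance Sigma(theta) = phi phi' + U^2 (factor variance 1)."""
    return np.outer(phi, phi) + np.diag(u2)


def theta_from_first_loading(phi1):
    """Hypothetical helper: build theta with phi1 * phi2 = .5 and unit diagonal."""
    phi = np.array([phi1, 0.5 / phi1])
    u2 = 1.0 - phi ** 2  # diagonal of U^2: 1 minus the squared loading
    return phi, u2


for phi1 in (0.55, 0.7071, 0.9):
    phi, u2 = theta_from_first_loading(phi1)
    exact = np.allclose(implied_cov(phi, u2), SIGMA)
    print(f"phi = {phi.round(4)}, diag(U^2) = {u2.round(4)}, reproduces Sigma: {exact}")
```

Any φ₁ with .5 < |φ₁| < 1 (and φ₂ = .5/φ₁) keeps both unique variances positive, so the set of perfectly fitting θ is infinite.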

Note: this is not a problem of the well-known "rotational indeterminacy" in factor analysis. (With only one factor, there is no rotation.) Rather, it is an example of a lesser-known phenomenon, namely, that the elements of U² may not be identified in the common factor model. If U² is not identified, then there may exist common factor patterns which reproduce Σ equally well but which cannot be obtained from each other by rotation.

Even in the relatively comfortable confines of the common factor model, the phenomena of model identification are not well understood. Some of the most significant textbooks on factor analysis never mention the problem. Moreover, several authoritative figures in the history of psychometrics have produced "results" on model identification in factor analysis which they later had to retract or correct.

In general, necessary and sufficient conditions for identification are not available. However, it is often possible to determine that a model is not identified by showing that a necessary condition is violated.

There are some results available on when U² in the factor model is definitely not identified. One of the best known was given by Anderson and Rubin (1956). They showed that if, in unrestricted factor analysis, there exists under some orthogonal or oblique rotation a factor pattern with only two nonzero elements in some column, then U² is not identified. Clearly then, if such a situation exists (see Everitt, 1984, pages 45-49 for an example), additional constraints will have to be imposed to yield an identified solution.

The Anderson-Rubin result has an important implication which is often overlooked in discussions of the identification issue. Namely, it may not be possible to prove identification in the population without knowing Σ! In other words, the same model may be identified for one Σ but not for another. One cannot prove identification merely by counting equations and unknowns.
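One way to see this numerically uses a technique the text does not mention: local identification can be checked through the rank of the Jacobian of the distinct elements of Σ(θ) with respect to θ. The sketch assumes a hypothetical one-factor model with three manifest variables and factor variance fixed at 1, so both cases have six equations and six unknowns; only the parameter point changes.

```python
import numpy as np


def vech(a):
    """Stack the lower-triangular elements (diagonal included) of a square matrix."""
    rows, cols = np.tril_indices(a.shape[0])
    return a[rows, cols]


def jacobian(phi):
    """d vech(Sigma) / d theta for Sigma = phi phi' + diag(u), theta = (phi, u).

    None of the derivatives depend on u, so u is not needed as an argument.
    """
    p = len(phi)
    cols = []
    for k in range(p):  # loadings: dSigma/dphi_k = e_k phi' + phi e_k'
        e = np.zeros(p)
        e[k] = 1.0
        cols.append(vech(np.outer(e, phi) + np.outer(phi, e)))
    for k in range(p):  # unique variances: dSigma/du_k = e_k e_k'
        e = np.zeros(p)
        e[k] = 1.0
        cols.append(vech(np.outer(e, e)))
    return np.column_stack(cols)


for phi in (np.array([0.8, 0.7, 0.6]), np.array([0.8, 0.7, 0.0])):
    J = jacobian(phi)
    print(f"phi = {phi}: {J.shape[0]} equations, {J.shape[1]} unknowns, "
          f"rank {np.linalg.matrix_rank(J)}")
```

With all three loadings nonzero the Jacobian has full column rank (6); with φ₃ = 0 the rank drops to 5, so the same model, with the same count of equations and unknowns, is not identified at that point.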

For some (relatively simple) models, it may be possible to prove identification by deriving explicit equations that express each parameter uniquely as a function of the elements of Σ. Unfortunately, this approach is often impractical, and so checking for identification usually involves two stages.
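As an illustration not taken from the original, consider a hypothetical one-factor model with three manifest variables, factor variance 1, and nonzero loadings. Here σ₁₂σ₁₃/σ₂₃ = φ₁², so each parameter can be written explicitly in terms of Σ once the sign of the factor is fixed by taking φ₁ > 0:

```python
import numpy as np


def solve_one_factor_three_indicators(S):
    """Closed-form theta for Sigma = phi phi' + U^2 with p = 3 and factor variance 1.

    phi_1^2 = s12 * s13 / s23; taking phi_1 > 0 fixes the sign of the factor,
    after which phi_2 = s12 / phi_1, phi_3 = s13 / phi_1 and u_i = s_ii - phi_i^2.
    Assumes all three loadings are nonzero, so no covariance is zero.
    """
    phi1 = np.sqrt(S[0, 1] * S[0, 2] / S[1, 2])
    phi = np.array([phi1, S[0, 1] / phi1, S[0, 2] / phi1])
    u2 = np.diag(S) - phi ** 2
    return phi, u2


# Hypothetical population values, used only to build Sigma.
phi_true = np.array([0.8, -0.7, 0.6])
u2_true = 1.0 - phi_true ** 2
S = np.outer(phi_true, phi_true) + np.diag(u2_true)
print(solve_one_factor_three_indicators(S))
```

If one of the covariances is zero, the formula breaks down, which mirrors the point that identification can depend on Σ itself.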

First, very obvious sources of lack of identification should be removed. The most obvious source of underidentification in path models occurs when the measurement scale of an exogenous latent variable is left indeterminate. Consider the oblique common factor model, which can be written

$$\Sigma = \Phi\Omega\Phi' + U^2 \tag{80}$$

The variances of the common factors are found on the diagonal of Ω. The factor loading coefficient for manifest variable i on factor j is found in element Φij. It is easy to show that, unless restrictions are imposed on this model, the variance of factor j and the loadings on this factor are jointly indeterminate. To see why, suppose you were to multiply all the factor variances by 2. If you were also to multiply all the columns of Φ by .7071 (that is, 1/√2), you would have exactly the same Σ. More generally, if we were to scale Ω with a diagonal scaling matrix D (replacing Ω by DΩD, which multiplies the variance of factor j by dj²), we could compensate by scaling the columns of Φ with D⁻¹. In other words, for positive definite D,

$$\Sigma = \Phi\Omega\Phi' + U^2 = (\Phi D^{-1})(D\Omega D)(\Phi D^{-1})' + U^2 \tag{81}$$

so that for any Φ and Ω there are infinitely many other Φ and Ω which reproduce Σ equally well.
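A quick numerical check of Equation 81, assuming a hypothetical two-factor pattern Φ, factor covariance matrix Ω, and scaling matrix D (none of these values come from the text). Doubling the first factor's variance corresponds to d₁ = √2, so the first column of Φ is multiplied by 1/√2 ≈ .7071:

```python
import numpy as np


def implied_cov(Phi, Omega, U2):
    """Sigma = Phi Omega Phi' + U^2 for the oblique common factor model."""
    return Phi @ Omega @ Phi.T + U2


# Hypothetical two-factor pattern and factor covariance matrix.
Phi = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                [0.0, 0.7], [0.0, 0.6], [0.0, 0.5]])
Omega = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
U2 = np.diag(1.0 - np.diag(Phi @ Omega @ Phi.T))  # gives unit manifest variances

# Positive definite diagonal scaling: factor 1 variance doubled, factor 2 multiplied by 9.
D = np.diag([np.sqrt(2.0), 3.0])
D_inv = np.linalg.inv(D)

Phi_new = Phi @ D_inv      # columns of Phi scaled by D^-1
Omega_new = D @ Omega @ D  # factor variances scaled by d_j^2

print(np.diag(Omega_new))  # new factor variances
print(np.allclose(implied_cov(Phi, Omega, U2), implied_cov(Phi_new, Omega_new, U2)))
```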

There are several ways of eliminating this lack of identification in practice. One is to fix the variances of the exogenous latent variables at 1. (This fix may not be sufficient in all cases.) Another is to constrain the factor loading coefficients themselves. This approach is popular in structural models where the main interest is in the relations between latent variables. In this case, identification is often obtained by fixing, for each latent variable, one of its loadings to 1.
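The two choices described above are rescalings of one another. The sketch below (hypothetical function name and values) starts from a solution with factor variances fixed at 1 and rescales it so that one chosen loading per factor equals 1; Σ is unchanged, and the factor variances become the squared reference loadings:

```python
import numpy as np


def to_reference_indicator(Phi, Omega, ref_rows):
    """Rescale (Phi, Omega) so that Phi[ref_rows[j], j] == 1 for each factor j.

    Uses D = diag of the reference loadings: Phi -> Phi D^-1, Omega -> D Omega D,
    which leaves Phi Omega Phi' (and hence Sigma) unchanged.
    Assumes every reference loading is nonzero.
    """
    d = np.array([Phi[r, j] for j, r in enumerate(ref_rows)])
    D = np.diag(d)
    return Phi @ np.linalg.inv(D), D @ Omega @ D


# Hypothetical solution with unit factor variances.
Phi = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                [0.0, 0.7], [0.0, 0.6], [0.0, 0.5]])
Omega = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

Phi_ref, Omega_ref = to_reference_indicator(Phi, Omega, ref_rows=[0, 3])
print(Phi_ref.round(4))
print(Omega_ref.round(4))
```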

When unstandardized latent variable models are fit, usually the variance of endogenous latent variables will not be identified. In such situations, the traditional "fix" has been to set one of the coefficients from the latent variable to 1. However, when the "Standardization New" option is used, this is not necessary, as SEPATH imposes internal constraints on the estimation process which result in all endogenous latent variables having unit variance.

Once obvious sources of non-identification have been eliminated, it is productive to examine whether either of the following easily tested conditions is violated.

  1. The number of degrees of freedom for the model must be nonnegative. That is, p(p + 1)/2 ≥ t, where p is the order of Σ, and t is the number of free parameters in the model.
  2. The Hessian (the matrix of second derivatives of the discrepancy function with respect to the parameters) must be positive definite.

Violation of either of these conditions usually indicates an identification problem (for exceptions, see Shapiro & Browne, 1983), and SEPATH warns the user if they are violated.
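Both checks can be sketched numerically. The example assumes a hypothetical one-factor model with four manifest variables, Σ(θ) = ωφφ′ + diag(u), and the maximum likelihood discrepancy function; at a perfect fit (S = Σ(θ)) the Hessian of that discrepancy reduces to Hij = tr(Σ⁻¹ΣiΣ⁻¹Σj), where Σi is the derivative of Σ with respect to parameter i. SEPATH's own computation is not described here and may differ.

```python
import numpy as np


def implied_cov(phi, omega, u):
    """Sigma(theta) = omega * phi phi' + diag(u) for a one-factor model."""
    return omega * np.outer(phi, phi) + np.diag(u)


def sigma_derivatives(phi, omega, free_omega):
    """Derivatives of Sigma with respect to each free parameter, in the order
    (loadings, [factor variance], unique variances)."""
    p = len(phi)
    eye = np.eye(p)
    derivs = [omega * (np.outer(eye[k], phi) + np.outer(phi, eye[k])) for k in range(p)]
    if free_omega:
        derivs.append(np.outer(phi, phi))
    derivs += [np.outer(eye[k], eye[k]) for k in range(p)]
    return derivs


def hessian_at_perfect_fit(phi, omega, u, free_omega):
    """Hessian of the ML discrepancy ln|Sigma| + tr(S Sigma^-1) - ln|S| - p
    evaluated where S == Sigma(theta): H_ij = tr(Sigma^-1 Sigma_i Sigma^-1 Sigma_j)."""
    sigma_inv = np.linalg.inv(implied_cov(phi, omega, u))
    derivs = sigma_derivatives(phi, omega, free_omega)
    return np.array([[np.trace(sigma_inv @ a @ sigma_inv @ b) for b in derivs]
                     for a in derivs])


def degrees_of_freedom(p, t):
    """Condition 1: p(p + 1)/2 - t must be nonnegative."""
    return p * (p + 1) // 2 - t


# Hypothetical population values.
phi = np.array([0.8, 0.7, 0.6, 0.5])
omega = 1.0
u = 1.0 - phi ** 2

for free_omega in (True, False):
    H = hessian_at_perfect_fit(phi, omega, u, free_omega)
    df = degrees_of_freedom(len(phi), H.shape[0])
    smallest = np.linalg.eigvalsh(H).min()
    print(f"factor variance free: {free_omega}, df = {df}, "
          f"smallest Hessian eigenvalue = {smallest:.2e}")
```

With ω free, the model passes the degrees-of-freedom check (df = 1), but its Hessian has an eigenvalue of essentially zero, reflecting the scale indeterminacy between a factor's variance and its loadings. Fixing ω = 1 (df = 2) gives a positive definite Hessian.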