Estimation of Variance Components - Estimating Components of Variation

In the ANOVA methods for estimating variance components, a system of equations is solved that relates the estimated population variances and covariances among the random factors to the estimated population covariances between the random factors and the dependent variable. The solution of this system defines the variance components. The spreadsheet below shows the Type I Sums of squares estimates of the variance components for the Wheat.sta data.

Components of Variance (wheat.sta)
  Mean Squares Type: 1
Source        DAMAGE
{1}VARIETY    0.067186
{2}PLOT       0.056435
Error         0.000000
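
As a rough illustration of the ANOVA approach, the sketch below handles only the simplest special case, a balanced design with a single random factor, where the system of equations reduces to two expected mean square equations. The function name `anova_one_way` and the toy data are hypothetical and are not taken from Wheat.sta or from the module itself.

```python
def anova_one_way(groups):
    """Method-of-moments (ANOVA) estimates for a balanced one-way random model."""
    a = len(groups)        # number of levels of the random factor
    n = len(groups[0])     # observations per level (balanced design assumed)
    grand_mean = sum(sum(g) for g in groups) / (a * n)
    group_means = [sum(g) / n for g in groups]

    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (n - 1))

    # Equate the mean squares to their expectations and solve:
    #   E[MS_between] = var_error + n * var_factor
    #   E[MS_within]  = var_error
    var_error = ms_within
    var_factor = (ms_between - ms_within) / n  # can be negative in small samples
    return var_factor, var_error


# Hypothetical toy data: 3 levels of the random factor, 3 observations each
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
var_factor, var_error = anova_one_way(toy)
print("factor:", var_factor, "error:", var_error)
```

With more than one random factor or an unbalanced design, the same idea leads to a larger linear system with one equation per sum of squares, rather than the two-equation shortcut used here.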

MIVQUE(0) variance components are estimated by inverting the partition of the SSQ matrix that does not include the dependent variable (or finding the generalized inverse, for singular matrices), and postmultiplying the inverse by the dependent variable column vector. This amounts to solving the system of equations that relates the dependent variable to the random independent variables, taking into account the covariation among the independent variables. The MIVQUE(0) estimates for the Wheat.sta data are listed in the spreadsheet shown below.

MIVQUE(0) Variance Component Estimation (wheat.sta)
  Variance Components
Source        DAMAGE
{1}VARIETY    0.056376
{2}PLOT       0.065028
Error         0.000000
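
The linear system behind MIVQUE(0) can be sketched for a one-way random model with an intercept as the only fixed effect. The sketch uses a common textbook formulation (solve S θ = q, with S_ij = tr(M V_i M V_j) and q_i = y' M V_i M y, where M removes the fixed effects), which is assumed here and is not necessarily the exact computation the module performs. The function names and the toy data are hypothetical, and a direct 2 x 2 solve stands in for the generalized inverse.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def trace_of_product(a, b):
    """tr(A B) without forming the product."""
    size = len(a)
    return sum(a[i][j] * b[j][i] for i in range(size) for j in range(size))


def mivque0_one_way(groups):
    y = [v for g in groups for v in g]
    labels = [k for k, g in enumerate(groups) for _ in g]
    n = len(y)
    # M = I - J/n sweeps out the intercept, the only fixed effect here
    M = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)]
         for i in range(n)]
    # V_a = Z Z' marks pairs of observations in the same group; V_e = I
    V_a = [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
           for i in range(n)]
    V_e = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    components = (V_a, V_e)

    MV = [matmul(M, V) for V in components]
    My = [sum(M[i][j] * y[j] for j in range(n)) for i in range(n)]
    # Left-hand side: S_ij = tr(M V_i M V_j)
    S = [[trace_of_product(MV[i], MV[j]) for j in range(2)] for i in range(2)]
    # Right-hand side: q_i = y' M V_i M y
    q = [sum(My[r] * V[r][c] * My[c] for r in range(n) for c in range(n))
         for V in components]

    # Solve the 2 x 2 system by Cramer's rule (assumes S is nonsingular)
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    var_a = (q[0] * S[1][1] - S[0][1] * q[1]) / det
    var_e = (S[0][0] * q[1] - S[1][0] * q[0]) / det
    return var_a, var_e


# Hypothetical toy data: 3 groups of 3 observations
print(mivque0_one_way([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))
```

For balanced data such as this toy example, the solution coincides with the ANOVA estimates; the two methods can differ when the design is unbalanced.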

REML and ML variance components are estimated by iteratively optimizing the parameter estimates for the effects in the model. REML differs from ML in that the likelihood of the data is maximized only for the random effects; REML is thus a restricted solution. In both REML and ML estimation, an iterative solution is found for the weights for the random effects in the model that maximize the likelihood of the data. The program uses MIVQUE(0) estimates as the start values for both REML and ML estimation, so the three techniques are closely related.

The statistical theory underlying maximum likelihood variance component estimation techniques is an advanced topic (Searle, Casella, & McCulloch, 1992, is recommended as an authoritative and comprehensive source). Implementation of maximum likelihood estimation algorithms, furthermore, is difficult (see, for example, Hemmerle & Hartley, 1973, and Jennrich & Sampson, 1976, for descriptions of these algorithms), and faulty implementation can lead to variance component estimates that lie outside the parameter space, converge prematurely to nonoptimal solutions, or give nonsensical results. Milliken and Johnson (1992) noted all of these problems with the commercial software packages they used to estimate variance components. In the Variance Components and Mixed Model ANOVA/ANCOVA module, care has been taken to avoid these problems as much as possible. Note, for example, that for the analysis reported in Example 2: Variance Component Estimation for a Four-Way Mixed Factorial Design, most statistical packages do not give reasonable results.

The basic idea behind both REML and ML estimation is to find the set of weights for the random effects in the model that minimizes the negative of the natural logarithm of the likelihood of the data (because the logarithm is a monotonically increasing function, minimizing the negative log-likelihood amounts to maximizing the likelihood of the data). The logarithm of the REML likelihood and the REML variance component estimates for the Wheat.sta data are listed in the last row of the Iteration history spreadsheet shown below.

Iteration History (wheat.sta)
  Variable: DAMAGE
Iter.   Log LL     Error     VARIETY
1       -2.30618   .057430   .068746
2       -2.25253   .057795   .073744
3       -2.25130   .056977   .072244
4       -2.25088   .057005   .073138
5       -2.25081   .057006   .073160
6       -2.25081   .057003   .073155
7       -2.25081   .057003   .073155

The logarithm of the ML likelihood and the ML estimates for the variance components for the Wheat.sta data are listed in the last row of the Iteration history spreadsheet shown below.

Iteration History (wheat.sta)
  Variable: DAMAGE
Iter.   Log LL     Error     VARIETY
1       -2.53585   .057454   .048799
2       -2.48382   .057427   .048541
3       -2.48381   .057492   .048639
4       -2.48381   .057491   .048552
5       -2.48381   .057492   .048552
6       -2.48381   .057492   .048552

As can be seen, the estimates of the variance components for the different methods are quite similar. In general, variance component estimates obtained with different estimation methods tend to agree fairly well (see, for example, Swallow & Monahan, 1984).
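
To make the negative log-likelihood idea behind REML and ML concrete, the sketch below minimizes it for a balanced one-way random model, using the standard closed-form likelihood for that case with additive constants dropped. Everything in it is an assumption made for illustration: the toy sums of squares, the function names, the start values of 1.0, and the simple compass search, which stands in for the algorithms cited above and is not the module's optimizer.

```python
import math


def neg_log_lik(var_e, var_a, ss_within, ss_between, a, n, restricted):
    """Negative log-likelihood (additive constants dropped) for a balanced
    one-way random model; for ML the grand mean is fixed at its estimate."""
    lam = var_e + n * var_a
    df_between = a - 1 if restricted else a  # REML uses a - 1, ML uses a
    return 0.5 * (a * (n - 1) * math.log(var_e) + df_between * math.log(lam)
                  + ss_within / var_e + ss_between / lam)


def compass_search(f, start, step=1.0, tol=1e-8):
    """Derivative-free minimizer: try +/- step along each coordinate and
    halve the step when no trial improves. Trial values <= 0 are rejected
    so that the search stays inside the parameter space."""
    x, best = list(start), f(start)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                if trial[i] <= 0.0:
                    continue
                value = f(trial)
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return x, best


def fit(ss_within, ss_between, a, n, restricted):
    def objective(p):
        return neg_log_lik(p[0], p[1], ss_within, ss_between, a, n, restricted)
    (var_e, var_a), nll = compass_search(objective, [1.0, 1.0])
    return var_e, var_a, nll


# Hypothetical sums of squares for 3 groups of 3 observations
ml = fit(6.0, 54.0, a=3, n=3, restricted=False)
reml = fit(6.0, 54.0, a=3, n=3, restricted=True)
print("ML   (error, factor, -log L):", ml)
print("REML (error, factor, -log L):", reml)
```

In this sketch the only difference between the two criteria is the multiplier on the between-group log term (a for ML, a - 1 for REML), which is why the REML estimate of the between-group component comes out larger than the ML estimate for the toy values.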