Assumptions and Effects of Violating Assumptions - Homogeneity of Variances

Assumptions

It is assumed that the variances in the different groups of the design are identical; this assumption is called the homogeneity of variances assumption. Remember that at the beginning of this section we computed the error variance from SS error, which is obtained by adding up the sums of squares within each group. If the variances in the groups differ from each other, then adding them together is not appropriate, and will not yield an estimate of the common within-group variance (since no common variance exists). ANOVA/MANOVA contains a wide variety of statistical tests to detect violations of this assumption.
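The pooling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values and the helper names (within_group_ss, pooled_within_variance) are hypothetical and are not part of ANOVA/MANOVA. The third group is deliberately given a much larger spread than the other two.

```python
from statistics import mean, variance

def within_group_ss(group):
    """Sum of squared deviations of the values from their own group mean."""
    m = mean(group)
    return sum((x - m) ** 2 for x in group)

def pooled_within_variance(groups):
    """SS error divided by its degrees of freedom (N minus the number of groups)."""
    ss_error = sum(within_group_ss(g) for g in groups)
    df_error = sum(len(g) for g in groups) - len(groups)
    return ss_error / df_error

# Hypothetical data: two groups with a small spread, one with a large spread.
groups = [
    [4.0, 5.0, 6.0, 5.0, 5.0],
    [7.0, 8.0, 9.0, 8.0, 8.0],
    [2.0, 10.0, 18.0, 6.0, 14.0],
]

group_variances = [variance(g) for g in groups]
pooled = pooled_within_variance(groups)

print("variance in each group:", group_variances)
print("pooled within-group variance:", round(pooled, 2))
```

With these values the pooled estimate falls between the small and the large group variances and describes none of the groups well, which is the sense in which no common variance exists.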

Effects of violations

Lindman (1974, p. 33) shows that the F statistic is quite robust against violations of this assumption, that is, against heterogeneity of variances (see also Box, 1954a, 1954b; Hsu, 1938).

Special case: correlated means and variances

However, one instance in which the F statistic can be very misleading is when the means are correlated with the variances across cells of the design. ANOVA/MANOVA allows you to produce a scatterplot of variances or standard deviations against the means to detect such correlations. The reason why this is a "dangerous" violation is the following: Imagine that you have 8 cells in the design, 7 with about equal means but one with a much higher mean. The F statistic may suggest a statistically significant effect. However, suppose that there is also a much larger variance in the cell with the highest mean, that is, the means and the variances are correlated across cells (the higher the mean, the larger the variance). In that case, the high mean in the one cell is actually quite unreliable, as is indicated by the large variance. However, because the overall F statistic is based on a pooled within-cell variance estimate, the high mean is identified as significantly different from the others, when in fact it would not be significantly different at all if the test were based on the within-cell variance in that cell alone.
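The 8-cell scenario can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the data are constructed rather than observed, the helper names (make_cell, pearson_r) are hypothetical, and the two ratios at the end are rough comparisons of the mean difference to a standard error of the high cell mean, not formal significance tests.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical deviation pattern shared by all cells (it sums to zero).
BASE = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]

def make_cell(cell_mean, spread):
    """Build a 10-case cell with the given mean; spread scales the deviations."""
    return [cell_mean + spread * d for d in BASE]

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Seven cells with about equal means and a small spread ...
offsets = [0.0, 0.2, -0.2, 0.1, -0.1, 0.3, -0.3]
cells = [make_cell(10.0 + o, 1.0) for o in offsets]
# ... and one cell with a much higher mean and a much larger spread.
cells.append(make_cell(14.0, 6.0))

n = len(BASE)   # cases per cell
k = len(cells)  # number of cells
cell_means = [mean(c) for c in cells]
cell_vars = [variance(c) for c in cells]
cell_sds = [sqrt(v) for v in cell_vars]

# Overall F statistic based on the pooled within-cell variance.
ms_error = sum((n - 1) * v for v in cell_vars) / (k * (n - 1))
grand_mean = mean(cell_means)  # valid as the grand mean because n is equal in all cells
ms_effect = n * sum((m - grand_mean) ** 2 for m in cell_means) / (k - 1)
f_stat = ms_effect / ms_error

# Difference between the high cell and the average of the other cells,
# scaled by the standard error of the high cell mean computed two ways.
diff = cell_means[-1] - mean(cell_means[:-1])
ratio_pooled = diff / sqrt(ms_error / n)    # uses the pooled variance
ratio_own = diff / sqrt(cell_vars[-1] / n)  # uses that cell's own variance

# Numeric counterpart of the means-versus-standard-deviations scatterplot.
r_mean_sd = pearson_r(cell_means, cell_sds)

print("F statistic:", round(f_stat, 2))
print("difference / SE (pooled variance):", round(ratio_pooled, 2))
print("difference / SE (own variance):", round(ratio_own, 2))
print("correlation of cell means and SDs:", round(r_mean_sd, 2))
```

In this constructed example the pooled variance makes the high mean look far more precisely estimated than that cell's own variance suggests, and the strong positive correlation between cell means and standard deviations is the warning sign that the scatterplot is meant to reveal.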

This pattern (a high mean and a large variance in one cell) frequently occurs when there are outliers present in the data. One or two extreme cases in a cell with only 10 cases can greatly bias the mean, and will dramatically increase the variance.
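The effect of a couple of outliers on a small cell is easy to verify numerically. The following minimal sketch uses hypothetical values for a cell of 10 cases and then replaces two of the cases with extreme values.

```python
from statistics import mean, variance

# Hypothetical cell with 10 well-behaved cases.
clean = [9, 10, 10, 11, 10, 9, 11, 10, 10, 10]

# The same cell with the last two cases replaced by extreme values.
with_outliers = clean[:-2] + [40, 45]

clean_mean, clean_var = mean(clean), variance(clean)
outlier_mean, outlier_var = mean(with_outliers), variance(with_outliers)

print("without outliers: mean =", clean_mean, "variance =", round(clean_var, 2))
print("with outliers:    mean =", outlier_mean, "variance =", round(outlier_var, 2))
```

Two extreme cases out of 10 are enough to pull the cell mean well away from the bulk of the data and to inflate the variance by more than two orders of magnitude, which produces exactly the high-mean, large-variance pattern described above.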