Sums of Squares - Unbalanced and Balanced Designs
If you compute the correlation matrix for all variables in the example data shown in the Multiple Regression and ANOVA topic, you will notice that the main effects (A and B) and the interaction (AxB) are all uncorrelated. This property of the effects is referred to as orthogonality: effects A and B are said to be orthogonal to, or independent of, each other. Because all effects in that design are orthogonal to each other, the design is said to be balanced.
Balanced designs have some "nice" properties; specifically, the computations needed to analyze such designs are quite simple. For example, all we would have to do is compute the simple correlations between the effects and the dependent variable. Since the effects are orthogonal, the partial correlations (i.e., the full multiple regression) do not have to be computed. However, in real research, designs are not always balanced.
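The orthogonality of effect-coded columns in a balanced design can be checked numerically. The following is a minimal sketch using NumPy; the specific design (a 2x2 layout with two observations per cell) and the column names A, B, and AxB are illustrative assumptions, not data from the topic above.

```python
import numpy as np

# Hypothetical effect-coded columns for a balanced 2x2 design,
# two observations per cell. A and B are the main effects;
# AxB is the interaction (elementwise product of A and B).
A = np.array([1, 1, 1, 1, -1, -1, -1, -1])
B = np.array([1, 1, -1, -1, 1, 1, -1, -1])
AxB = A * B

# Correlation matrix of the three effect columns. In a balanced
# design every off-diagonal correlation is zero: the effects
# are orthogonal, so simple correlations with the dependent
# variable suffice for the analysis.
R = np.corrcoef(np.vstack([A, B, AxB]))
off_diag = R[~np.eye(3, dtype=bool)]
print(np.allclose(off_diag, 0.0))  # True
```

Note that the zero correlations follow purely from the equal cell counts: each effect column contains the same number of 1s and -1s within every level of the other effects.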
Consider the following data set with unequal numbers of observations in different cells.
If you again code these data as before and compute the correlation matrix for all variables, you will find that the factors in the design are now correlated with each other. Thus, the factors are no longer orthogonal, and the design is said to be unbalanced. Note that the correlation among the factors is entirely due to the different frequencies of the 1s and -1s in the effect columns of the data matrix. Put another way, experimental designs with unequal cell sizes (or, to be exact, non-proportional cell sizes) will be unbalanced; that is, the main effects and interactions will be confounded. Thus, in order to evaluate the statistical significance of the effects, we have to compute the complete multiple regression. In fact, there are several strategies that you can follow.
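The same check applied to an unbalanced layout makes the confounding visible. This is a hedged sketch with made-up cell counts (three observations in one cell, one in each of the others), not the data set referred to above; the point is only that non-proportional cell sizes induce nonzero correlations among the effect columns.

```python
import numpy as np

# Hypothetical unbalanced 2x2 design: cell (A=1, B=1) has three
# observations, the remaining three cells have one each, so the
# cell sizes are non-proportional.
A = np.array([1, 1, 1, 1, -1, -1])
B = np.array([1, 1, 1, -1, 1, -1])
AxB = A * B

# With unequal cell counts the effect columns no longer contain
# balanced runs of 1s and -1s, so they correlate with one another.
R = np.corrcoef(np.vstack([A, B, AxB]))
print(abs(R[0, 1]) > 0)  # True: A and B are now correlated
```

Because the effect columns overlap, the simple correlation of each effect with the dependent variable mixes in the other effects' contributions, which is why the full multiple regression (partialling out the other effects) becomes necessary.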