Generalized Linear Model (GLM) Overview - Computations for Solving the Multiple Regression Equation

A one dimensional surface in a two dimensional or two-variable space is a line defined by the equation Y=b0+b1X. According to this equation, the Y variable can be expressed in terms of or as a function of a constant (b0) and a slope (b1) times the X variable. The constant is also referred to as the intercept, and the slope as the regression coefficient. For example, GPA may best be predicted as 1+.02*IQ. Thus, knowing that a student has an IQ of 130 would lead us to predict that her GPA would be 3.6 (since, 1+.02*130=3.6). In the multiple regression case, when there are multiple predictor variables, the regression surface usually cannot be visualized in a two dimensional space, but the computations are a straightforward extension of the computations in the single predictor case. For example, if in addition to IQ we had additional predictors of achievement (e.g., Motivation, Self-discipline) we could construct a linear equation containing all those variables. In general then, multiple regression procedures will estimate a linear equation of the form:

Y = b0 + b1X1 + b2X2 + ... + bkXk

where k is the number of predictors. Note that in this equation, the regression coefficients (or b1 ... bk coefficients) represent the independent contributions of each independent variable to the prediction of the dependent variable. Another way to express this fact is to say that, for example, variable X1 is correlated with the Y variable, after controlling for all other independent variable. This type of correlation is also referred to as a partial correlation (this term was first used by Yule, 1907). Perhaps the following example will clarify this issue. One would probably find a significant negative correlation between hair length and height in the population (i.e., short people have longer hair). At first this may seem odd; however, if we were to add the variable Gender into the multiple regression equation, this correlation would probably disappear. This is because women, on the average, have longer hair than men; they also are shorter on the average than men. Thus, after we remove this gender difference by entering Gender into the equation, the relationship between hair length and height disappears because hair length does not make any unique contribution to the prediction of height, above and beyond what it shares in the prediction with variable Gender. Put another way, after controlling for the variable Gender, the partial correlation between hair length and height is zero.

The regression surface (a line in simple regression, a plane or higher-dimensional surface in multiple regression) expresses the best prediction of the dependent variable (Y), given the independent variable (X's). However, nature is rarely (if ever) perfectly predictable, and usually there is substantial variation of the observed points from the fitted regression surface. The deviation of a particular point from the nearest corresponding point on the predicted regression surface (its predicted value) is called the residual value. Since the goal of linear regression procedures is to fit a surface, which is a linear function of the X variables, as closely as possible to the observed Y variable, the residual values for the observed points can be used to devise a criterion for the "best fit." Specifically, in regression problems the surface is computed for which the sum of the squared deviations of the observed points from that surface are minimized. Thus, this general procedure is sometimes also referred to as least squares estimation. (see also the description of weighted least squares estimation).

The actual computations involved in solving regression problems can be expressed compactly and conveniently using matrix notation. Suppose that there are n observed values of Y and n associated observed values for each of k different X variables. Then Yi, Xik, and ei can represent the ith observation of the Y variable, the ith observation of each of the X variables, and the ith unknown residual value, respectively. Collecting these terms into matrices we have

The multiple regression model in matrix notation then can be expressed as

Y = Xb + e

where b is a column vector of 1 (for the intercept) + k unknown regression coefficients. Recall that the goal of multiple regression is to minimize the sum of the squared residuals. Regression coefficients that satisfy this criterion are found by solving the set of normal equations

X'Xb = X' Y

When the X variables are linearly independent (i.e., they are nonredundant, yielding an X'X matrix which is of full rank) there is a unique solution to the normal equations. Premultiplying both sides of the matrix formula for the normal equations by the inverse of X'X gives

( X'X)-1X'Xb = ( X'X)-1X' Y

or

b = (X'X)-1X' Y

This last result is very satisfying in view of its simplicity and its generality. With regard to its simplicity, it expresses the solution for the regression equation in terms just 2 matrices (X and Y) and 3 basic matrix operations, (1) matrix transposition, which involves interchanging the elements in the rows and columns of a matrix, (2) matrix multiplication, which involves finding the sum of the products of the elements for each row and column combination of two conformable (i.e., multipliable) matrices, and (3) matrix inversion, which involves finding the matrix equivalent of a numeric reciprocal, that is, the matrix that satisfies

A-1AA = A

for a matrix A.

It took literally centuries for the ablest mathematicians and statisticians to find a satisfactory method for solving the linear least square regression problem. But their efforts have paid off, for it is hard to imagine a simpler solution.

With regard to the generality of the multiple regression model, its only notable limitations are that (1) it can be used to analyze only a single dependent variable, (2) it cannot provide a solution for the regression coefficients when the X variables are not linearly independent and the inverse of X'X therefore does not exist. These restrictions, however, can be overcome, and in doing so the multiple regression model is transformed into the general linear model.

Other Generalized Linear Model (GLM) Introductory Overview Topics

A detailed discussion of univariate and multivariate ANOVA techniques can also be found in Introductory Overview section of the ANOVA/MANOVA module; a discussion of multiple regression methods is provided in the Multiple Regression Overviews.