Multiple Regression - Notes and Technical Information

General

The Multiple Regression routine consists of two major parts. The first calculates a correlation matrix (or extracts one, if matrix input is selected) according to the user's specifications (e.g., treatment of missing data, case selection conditions). The second performs the actual multiple regression analyses.

Calculating Multiple Regression, Matrix Inversion

All calculations involved in the actual multiple regression analysis are performed in double precision. Matrix inversion is accomplished via sweeping (see Dempster, 1969, p. 62). The regression weights, residual sums of squares, tolerances, and partial correlations are also calculated as part of the sweeping operation (see also Jennrich, 1977).
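
For illustration, the following Python sketch implements a symmetric sweep operator of the kind described by Dempster (1969). It is a minimal illustration under stated assumptions (NumPy, one common symmetric sweep convention, a hypothetical function name sweep), not the routine's actual implementation. Sweeping all predictor rows/columns of the augmented crossproducts matrix leaves the regression weights in the X'y column and the residual sum of squares in the y'y cell:

    import numpy as np

    def sweep(a, k):
        # Sweep the symmetric matrix `a` on pivot index k (Dempster, 1969).
        # Returns a new matrix; raises if the pivot is (near) zero, which
        # indicates a redundant (collinear) predictor.
        a = np.asarray(a, dtype=float)
        d = a[k, k]
        if abs(d) < 1e-12:
            raise ValueError("near-zero pivot: predictor may be collinear")
        out = a - np.outer(a[:, k], a[k, :]) / d   # a_ij - a_ik*a_kj/d
        out[:, k] = a[:, k] / d
        out[k, :] = a[k, :] / d
        out[k, k] = -1.0 / d
        return out

    # Example: regress y on two predictors via the crossproducts of deviations.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    y = X @ np.array([1.5, -2.0]) + rng.normal(size=50)
    Z = np.column_stack([X, y])
    D = Z - Z.mean(axis=0)            # deviations from the means
    C = D.T @ D                       # augmented crossproducts matrix
    S = sweep(sweep(C, 0), 1)         # sweep both predictor indices
    b = S[:2, 2]                      # regression weights (slopes)
    rss = S[2, 2]                     # residual sum of squares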

Statistical Significance Tests

The standard formulas are used for calculating the F-value associated with the multiple R, and for the t-values associated with the regression coefficients (e.g., see Cooley & Lohnes, 1971; Darlington, 1990; Lindeman, Merenda, & Gold, 1980; Morrison, 1967; Neter, Wasserman, & Kutner, 1985; Pedhazur, 1973; Stevens, 1986; Younger, 1985).
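
To make those formulas concrete, here is a brief Python sketch (an illustration only; the function name regression_tests is hypothetical, and SciPy is assumed for the p-values). It computes F = (R^2/p) / ((1 - R^2)/(n - p - 1)) for the multiple R and t = b/SE(b) for each coefficient:

    import numpy as np
    from scipy import stats

    def regression_tests(X, y):
        # F-test for the multiple R and t-tests for the coefficients,
        # using the standard textbook formulas (intercept included).
        n, p = X.shape
        A = np.column_stack([np.ones(n), X])       # design matrix with intercept
        XtX_inv = np.linalg.inv(A.T @ A)
        b = XtX_inv @ A.T @ y                      # least-squares coefficients
        resid = y - A @ b
        rss = resid @ resid
        r2 = 1.0 - rss / ((y - y.mean()) ** 2).sum()
        df_resid = n - p - 1
        rms = rss / df_resid                       # residual mean square
        f = (r2 / p) / ((1.0 - r2) / df_resid)
        se_b = np.sqrt(rms * np.diag(XtX_inv))
        t = b / se_b
        return f, stats.f.sf(f, p, df_resid), t, 2.0 * stats.t.sf(np.abs(t), df_resid)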

Residuals

The standard error of a residual score is computed as the square root of:

[1 - 1/n - (Xraw - Xmean)*C^(-1)*(Xraw - Xmean)'] * RMS

where

Xraw is the vector of raw data for the independent variables
Xmean is the vector of means for the independent variables
C^(-1) is the inverse of the matrix of crossproducts of deviations for the independent variables
n is the number of valid cases
RMS is the residual mean square

The terms 1/n and Xmean are dropped if there is no intercept (regression forced through the origin).

The standardized residuals are obtained by dividing each residual by the square root of the residual mean square.
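
The two quantities above translate directly into code. The following Python sketch (illustrative only; residual_statistics is a hypothetical helper, and a model with an intercept is assumed) evaluates the quadratic form (Xraw - Xmean)*C^(-1)*(Xraw - Xmean)' for every case at once:

    import numpy as np

    def residual_statistics(X, y):
        # Standard error of each residual and the standardized residuals,
        # following the formulas above (model with intercept).
        n = X.shape[0]
        A = np.column_stack([np.ones(n), X])
        b = np.linalg.lstsq(A, y, rcond=None)[0]
        resid = y - A @ b
        rms = resid @ resid / (n - X.shape[1] - 1)    # residual mean square
        D = X - X.mean(axis=0)                        # Xraw - Xmean, per case
        C_inv = np.linalg.inv(D.T @ D)                # C^(-1)
        q = np.einsum('ij,jk,ik->i', D, C_inv, D)     # quadratic form per case
        se_resid = np.sqrt((1.0 - 1.0 / n - q) * rms)
        std_resid = resid / np.sqrt(rms)
        return se_resid, std_resid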

The Mahalanobis distance is the distance of a case from the centroid of all cases in the space defined by the independent variables. It is computed as:

(n-1)*(Xraw - Xmean)*C^(-1)*(Xraw - Xmean)'

where

Xraw is the vector of raw data for the independent variables
Xmean is the vector of means for the independent variables
C^(-1) is the inverse of the matrix of crossproducts of deviations for the independent variables
n is the number of valid cases

The term Xmean is dropped if there is no intercept (regression forced through the origin). Refer to the Multiple Regression Examples for an example of how Mahalanobis distances can aid in the detection of outliers.
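
As a sketch (again illustrative rather than the routine's own code; mahalanobis_distances is a hypothetical name), the same quadratic form scaled by n-1 gives the Mahalanobis distance of every case from the centroid:

    import numpy as np

    def mahalanobis_distances(X):
        # Distance of each case from the centroid of the independent variables.
        n = X.shape[0]
        D = X - X.mean(axis=0)                       # Xraw - Xmean, per case
        C_inv = np.linalg.inv(D.T @ D)               # C^(-1)
        return (n - 1) * np.einsum('ij,jk,ik->i', D, C_inv, D)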

The deleted residual is the residual that would have been obtained had the case not been included in the estimation of the regression equation. It is calculated by dividing the ordinary residual by:

1 - 1/n - (Xraw - Xmean)*C^(-1)*(Xraw - Xmean)'

where

Xraw is the vector of raw data for the independent variables
Xmean is the vector of means for the independent variables
C^(-1) is the inverse of the matrix of crossproducts of deviations for the independent variables
n is the number of valid cases

The terms 1/n and Xmean are dropped if there is no intercept (regression forced through the origin). Refer to the Multiple Regression Examples for an example of how deleted residuals can aid in the detection of outliers.
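
A corresponding sketch for the deleted residuals (the same illustrative assumptions as before; the helper name deleted_residuals is hypothetical):

    import numpy as np

    def deleted_residuals(X, y):
        # Ordinary residuals divided by
        # 1 - 1/n - (Xraw - Xmean)*C^(-1)*(Xraw - Xmean)'.
        n = X.shape[0]
        A = np.column_stack([np.ones(n), X])
        b = np.linalg.lstsq(A, y, rcond=None)[0]
        resid = y - A @ b
        D = X - X.mean(axis=0)
        C_inv = np.linalg.inv(D.T @ D)
        q = np.einsum('ij,jk,ik->i', D, C_inv, D)
        return resid / (1.0 - 1.0 / n - q)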

Cook's distance (Cook, 1977) assesses how much all residuals would change if the respective case were omitted from the regression analysis. It is defined as:

{(deleted residual)^2 * [1/n + MD/(n-1)]} / [(no. of independent variables + 1) * RMS]

where:

MD is the Mahalanobis distance
RMS is the residual mean square

If there is no intercept, n-1 is replaced by n, the term 1/n is dropped, and the term +1 (adding 1 to the number of independent variables) is dropped.
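
Putting the pieces together, here is a Python sketch of Cook's distance for the intercept case (illustrative; cooks_distances is a hypothetical name, and the deleted residual and Mahalanobis distance are computed as defined above):

    import numpy as np

    def cooks_distances(X, y):
        # Cook's distance per case: {deleted_res^2 * [1/n + MD/(n-1)]} /
        # [(no. of independent variables + 1) * RMS].
        n, p = X.shape
        A = np.column_stack([np.ones(n), X])
        b = np.linalg.lstsq(A, y, rcond=None)[0]
        resid = y - A @ b
        rms = resid @ resid / (n - p - 1)            # residual mean square
        D = X - X.mean(axis=0)
        C_inv = np.linalg.inv(D.T @ D)
        q = np.einsum('ij,jk,ik->i', D, C_inv, D)    # equals MD/(n-1)
        deleted = resid / (1.0 - 1.0 / n - q)        # deleted residuals
        return deleted ** 2 * (1.0 / n + q) / ((p + 1) * rms)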

Power Transformations for Dependent and Independent Variables

Statistical significance testing in multiple regression is based on the assumption of homogeneous residual variance over the range of the dependent variable. When this assumption is violated, an appropriate transformation of the dependent or independent variables may in some cases correct the problem. A class of power transformations that can be applied to the dependent variable or the independent variables is:

y' = y^λ for λ ≠ 0
y' = natural log(y) for λ = 0

This formulation encompasses the reciprocal transformation (λ = -1), the square root transformation (λ = .5), the square transformation (λ = 2), and the logarithmic transformation (λ = 0). Note that all values of y must be greater than 0 (zero). For additional details about these transformations, refer to Box and Cox (1964), Box and Tidwell (1962), Gunst, Mason, and Hess (1989), or Snee (1986).
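
A minimal Python sketch of this family of transformations (illustrative; the function name power_transform is hypothetical):

    import numpy as np

    def power_transform(y, lam):
        # y' = y**lam for lam != 0, natural log of y for lam == 0.
        # All values of y must be strictly positive.
        y = np.asarray(y, dtype=float)
        if np.any(y <= 0):
            raise ValueError("all values of y must be greater than 0")
        return np.log(y) if lam == 0 else y ** lam

    # power_transform(y, -1) is the reciprocal, power_transform(y, .5) the
    # square root, power_transform(y, 2) the square, and power_transform(y, 0)
    # the logarithmic transformation.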