Multiple Regression - Notes and Technical Information
General
The Multiple Regression routine consists of two major parts. The first calculates a correlation matrix (or extracts a correlation matrix if matrix input is selected) according to the user's specifications (such as missing data treatment, case selection conditions, etc.). The second performs the actual multiple regression analyses.
Calculating Multiple Regression, Matrix Inversion
All calculations involved in the actual multiple regression analysis are performed in double precision. Matrix inversion is accomplished via sweeping (see Dempster, 1969, p. 62). The regression weights, residual sums of squares, tolerances, and partial correlations are also calculated as part of the sweeping operation (see also Jennrich, 1977).
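For illustration only (this is not the routine's source code), the following Python sketch shows how Goodnight-style sweeping can produce the regression weights and the residual sum of squares in a single pass over the cross-products matrix; the function name sweep and the use of NumPy are assumptions of this example.

    import numpy as np

    def sweep(A, k):
        # Sweep the symmetric matrix A on pivot k (Goodnight's variant).
        # Sweeping every pivot of the X'X block of the augmented
        # cross-products matrix [[X'X, X'y], [y'X, y'y]] leaves
        # (X'X)^-1 in the top-left block, the regression weights in
        # the top-right column, and the residual sum of squares in
        # the bottom-right cell.
        A = A.copy()
        d = A[k, k]
        A[k, :] /= d
        for i in range(A.shape[0]):
            if i != k:
                b = A[i, k]
                A[i, :] -= b * A[k, :]
                A[i, k] = -b / d
        A[k, k] = 1.0 / d
        return A

    # Small demonstration with two predictors (data are centered, so the
    # intercept is recovered separately as y_mean - b' * x_mean).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=50)
    Z = np.column_stack([X - X.mean(axis=0), y - y.mean()])
    A = Z.T @ Z                    # augmented cross-products matrix
    for k in range(2):             # sweep each predictor pivot
        A = sweep(A, k)
    print("weights:", A[:2, 2], "residual SS:", A[2, 2])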
Statistical Significance Tests
The standard formulas are used for calculating the F-value associated with the multiple R, and for the t-values associated with the regression coefficients (e.g., see Cooley & Lohnes, 1971; Darlington, 1990; Lindeman, Merenda, and Gold, 1980; Morrison, 1967; Neter, Wasserman, & Kutner, 1985; Pedhazur, 1973; Stevens, 1986; Younger, 1985).
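As a reminder of these standard formulas, here is a brief Python sketch; the values n, p, R2, b, and se_b are illustrative assumptions, not output of the routine.

    from scipy import stats

    n, p = 50, 2            # valid cases, number of independent variables
    R2 = 0.42               # squared multiple correlation
    b, se_b = 1.5, 0.31     # one regression coefficient and its standard error

    # F associated with the multiple R, with p and n - p - 1 degrees of freedom.
    F = (R2 / p) / ((1.0 - R2) / (n - p - 1))
    p_F = stats.f.sf(F, p, n - p - 1)

    # t associated with the coefficient, with n - p - 1 degrees of freedom.
    t = b / se_b
    p_t = 2.0 * stats.t.sf(abs(t), n - p - 1)
    print(F, p_F, t, p_t)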
Residuals
The standard error of a residual score is computed as the square root of:
[1 - 1/n - (X_raw - X_mean) * C^-1 * (X_raw - X_mean)'] * RMS
where
X_raw    is the vector of raw data for the independent variables
X_mean   is the vector of means for the independent variables
C^-1     is the inverse of the matrix of crossproducts of deviations for the independent variables
n        is the number of valid cases
RMS      is the residual mean square
The terms 1/n and X_mean are dropped if there is no intercept (regression forced through the origin).
The standardized residuals are obtained by dividing each residual by the square root of the residual mean square.
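The two quantities above can be computed directly from the data; below is a minimal NumPy sketch, assuming an intercept model and hypothetical helper names.

    import numpy as np

    def residual_diagnostics(X, resid, rms):
        # X: n-by-p raw data for the independent variables;
        # resid: ordinary residuals; rms: residual mean square.
        n = X.shape[0]
        D = X - X.mean(axis=0)              # X_raw - X_mean for each case
        C_inv = np.linalg.inv(D.T @ D)      # inverse crossproducts matrix
        q = np.einsum("ij,jk,ik->i", D, C_inv, D)   # quadratic form per case
        se_resid = np.sqrt((1.0 - 1.0 / n - q) * rms)
        std_resid = resid / np.sqrt(rms)
        return se_resid, std_resid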
The Mahalanobis distance is the distance of a case from the centroid of all cases in the space defined by the independent variables. It is computed as:
(n - 1) * (X_raw - X_mean) * C^-1 * (X_raw - X_mean)'
where
X_raw    is the vector of raw data for the independent variables
X_mean   is the vector of means for the independent variables
C^-1     is the inverse of the matrix of crossproducts of deviations for the independent variables
n        is the number of valid cases
The term X_mean is dropped if there is no intercept (regression forced through the origin). Refer to the Multiple Regression Examples for an example of how Mahalanobis distances can aid in the detection of outliers.
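A minimal NumPy sketch of this computation follows (the function name mahalanobis_distances is hypothetical); cases with unusually large values are potential outliers in the space of the independent variables.

    import numpy as np

    def mahalanobis_distances(X):
        # X: n-by-p raw data for the independent variables.
        n = X.shape[0]
        D = X - X.mean(axis=0)              # X_raw - X_mean for each case
        C_inv = np.linalg.inv(D.T @ D)      # inverse crossproducts matrix
        return (n - 1) * np.einsum("ij,jk,ik->i", D, C_inv, D)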
The deleted residual is the residual which would have been obtained had the case not been included in the estimation of the regression equation. It is calculated by dividing the ordinary residual by:
1 - 1/n - (X_raw - X_mean) * C^-1 * (X_raw - X_mean)'
where
X_raw    is the vector of raw data for the independent variables
X_mean   is the vector of means for the independent variables
C^-1     is the inverse of the matrix of crossproducts of deviations for the independent variables
n        is the number of valid cases
The terms 1/n and X_mean are dropped if there is no intercept (regression forced through the origin). Refer to the Multiple Regression Examples for an example of how deleted residuals can aid in the detection of outliers.
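Continuing with the same hypothetical helpers, here is a sketch of the deleted residuals for an intercept model.

    import numpy as np

    def deleted_residuals(X, resid):
        # Divide each ordinary residual by
        # 1 - 1/n - (X_raw - X_mean) * C^-1 * (X_raw - X_mean)'.
        n = X.shape[0]
        D = X - X.mean(axis=0)
        C_inv = np.linalg.inv(D.T @ D)
        q = np.einsum("ij,jk,ik->i", D, C_inv, D)
        return resid / (1.0 - 1.0 / n - q)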
Cook's distance (Cook, 1977) is useful for assessing how much all of the residuals would change if the respective case were omitted from the regression analysis. It is defined as:
{(Deleted residual)^2 * [1/n + MD/(n - 1)]} / [(No. of independent variables + 1) * RMS]
where
MD       is the Mahalanobis distance for the case (as defined above)
RMS      is the residual mean square
n        is the number of valid cases
If there is no intercept, n - 1 is replaced by n, the term 1/n is dropped, and the term +1 (adding 1 to the number of independent variables) is dropped.
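The sketch below composes Cook's distance for the intercept model from the quantities defined above (function name assumed).

    import numpy as np

    def cooks_distances(X, resid, rms):
        # X: n-by-p raw data; resid: ordinary residuals; rms: residual mean square.
        n, p = X.shape
        D = X - X.mean(axis=0)
        C_inv = np.linalg.inv(D.T @ D)
        md = (n - 1) * np.einsum("ij,jk,ik->i", D, C_inv, D)   # Mahalanobis distances
        del_res = resid / (1.0 - 1.0 / n - md / (n - 1))       # deleted residuals
        return del_res**2 * (1.0 / n + md / (n - 1)) / ((p + 1) * rms)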
Power Transformations for Dependent and Independent Variables
Statistical significance testing in multiple regression is based on the assumption of homogeneous residual variance over the range of the dependent variable. When this assumption is violated, an appropriate transformation of the dependent or independent variables may in some cases correct the problem. A class of power transformations that can be applied to the dependent variable or the independent variables is:
y' = (y^λ - 1)/λ    for λ ≠ 0
y' = ln(y)          for λ = 0
This formulation encompasses the reciprocal transformation (λ = -1), the square root transformation (λ = .5), the square transformation (λ = 2), and the logarithmic transformation (λ = 0). Note, however, that all values of y must be greater than 0 (zero). For additional details about these transformations, refer to Box and Cox (1964), Box and Tidwell (1962), Gunst, Mason, and Hess (1989), or Snee (1986).
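A compact sketch of this family of transformations (the function name power_transform is assumed):

    import numpy as np

    def power_transform(y, lam):
        # (y**lam - 1)/lam for lam != 0; natural log for lam = 0.
        y = np.asarray(y, dtype=float)
        if np.any(y <= 0):
            raise ValueError("all values of y must be greater than 0")
        if lam == 0:
            return np.log(y)
        return (y**lam - 1.0) / lam

    # Example: lam = 0.5 applies a (rescaled) square root transformation.
    print(power_transform([1.0, 4.0, 9.0], 0.5))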