hat
Hat Diagonal Regression Diagnostic
Description
Returns the diagonal of the hat matrix for a least squares regression.
Usage
hat(x, intercept = TRUE)
Arguments
x: matrix of explanatory variables in the regression model y = xb + e, or the QR decomposition of such a matrix. Missing values (NAs) are not accepted.
intercept: logical flag; if TRUE, an intercept term is included in the regression model. This is ignored if x is a QR object.
Value
vector with one value for each row of x. These values are
the diagonal elements of the least-squares projection matrix
H. (Fitted values for a regression of y on x are H %*% y.)
Large values of these diagonal elements
correspond to points with high leverage.
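As a quick sketch (not part of this page's Examples; small simulated data), the returned diagonals can be checked against H = X (X'X)^{-1} X' computed directly, where X is x with an intercept column prepended:
set.seed(1)
x <- matrix(rnorm(20), ncol = 2)              ## 10 observations, 2 explanatory variables
X <- cbind(1, x)                              ## design matrix including the intercept column
H <- X %*% solve(crossprod(X)) %*% t(X)       ## least-squares projection ("hat") matrix
all.equal(diag(H), hat(x, intercept = TRUE))  ## TRUE: same hat diagonals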
Background
The diagonals of the hat matrix indicate the amount of leverage (influence)
that observations have in a least squares regression.
Note that this is independent of the value of y.
Observations that have large hat diagonals have more say about the location
of the regression line; an observation with a hat diagonal close to 1 will
have a residual close to 0 no matter what value the response for that
observation takes.
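For example (illustrative simulated data, not from the original page), an observation whose x value is far from the others gets a hat diagonal near 1 and a residual near 0 no matter what its response is:
set.seed(2)
x <- c(1:10, 100)                          ## the last point is far from the rest in x
y <- c(rnorm(10), -50)                     ## its response value hardly matters
h <- hat(x)
res <- lsfit(x, y)$residuals               ## least-squares residuals
round(cbind(hat = h, residual = res), 3)   ## last row: hat near 1, residual near 0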
The hat diagonals lie between 0 and 1 (and are at least 1/n when an intercept is included); their average
value is p/n, where p is the number of
variables, i.e., the number of columns of x (plus 1 if intercept = TRUE),
and n is the number of observations (the number of rows of x).
Belsley, Kuh and Welsch (1980) suggest that points with a hat diagonal greater
than 2p/n be considered high leverage points, though they state that
too many points will be labeled leverage points by this rule
when p is small.
Another rule of thumb is to consider any point with a hat diagonal greater
than .2 (or .5) as having high leverage.
If p is large relative to n, then all points can be "high leverage" points.
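A sketch of these rules of thumb (reusing the freeny data from the Examples below; in base R the same matrix is available as datasets::freeny.x):
x <- Sdatasets::freeny.x
h <- hat(x)              ## intercept = TRUE by default
p <- ncol(x) + 1         ## columns of x plus the intercept term
n <- nrow(x)
mean(h)                  ## average hat diagonal equals p/n
which(h > 2 * p / n)     ## candidate high-leverage points by the 2p/n rule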
It is called the "hat" matrix because, in statistical jargon, multiplying a vector y
by this matrix "puts a hat on y": the result is the vector of fitted values,
conventionally written y-hat.
References
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980).
Regression Diagnostics.
Wiley, New York.
Cook, R. D. and Weisberg, S. (1982).
Residuals and Influence in Regression.
Chapman and Hall, New York.
Examples
## hat diagonals for the freeny data
h <- hat(Sdatasets::freeny.x)
plot(h, xlab = "index number", ylab = "hat diagonal")
## reference line at 2p/n, counting the intercept in p
p <- ncol(Sdatasets::freeny.x) + 1
abline(h = 2 * p / nrow(Sdatasets::freeny.x))