hat
Hat Diagonal Regression Diagnostic
Description
Returns the diagonal of the hat matrix for a least squares regression.
Usage
hat(x, intercept = TRUE)
Arguments
x
matrix of explanatory variables in the regression model y = xb + e, or the QR decomposition of such a matrix. Missing values (NAs) are not accepted.
intercept
logical flag: if TRUE, an intercept term is included in the regression model. This is ignored if x is a QR object.
Value
vector with one value for each row of x. These values are
the diagonal elements of the least-squares projection matrix
H. (Fitted values for a regression of y on x are H %*% y.)
Large values of these diagonal elements
correspond to points with high leverage.
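The relationship between hat() and the projection matrix can be checked directly. This is a minimal sketch using a small made-up design matrix (the data are illustrative only); the explicit formula X (X'X)^{-1} X' is used here for exposition, while hat() itself works from a QR decomposition.

```r
# Hat diagonals from the explicit projection matrix, compared with hat().
x <- cbind(c(1.2, 0.5, 3.1, 2.0, 4.4))        # one explanatory variable
X <- cbind(1, x)                              # prepend intercept column, as intercept = TRUE does
H <- X %*% solve(t(X) %*% X) %*% t(X)         # least-squares projection ("hat") matrix
all.equal(diag(H), hat(x, intercept = TRUE))  # should be TRUE
```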
Background
The diagonals of the hat matrix indicate the amount of leverage (influence)
that observations have in a least squares regression.
Note that this is independent of the value of y.
Observations that have large hat diagonals have more say about the location
of the regression line; an observation with a hat diagonal close to 1 will
have a residual close to 0 no matter what value the response for that
observation takes.
The hat diagonals lie between 1/n and 1 when an intercept is included (between 0 and 1 otherwise), and their average
value is p/n, where p is the number of
coefficients, i.e., the number of columns of x (plus 1 if intercept = TRUE),
and n is the number of observations (the number of rows of x).
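The p/n average follows because the trace of a projection matrix equals its rank. A minimal sketch with made-up data (one column plus the default intercept, so p = 2 and n = 5):

```r
# The average hat diagonal equals p/n.
x <- cbind(c(1.2, 0.5, 3.1, 2.0, 4.4))
h <- hat(x)     # intercept = TRUE by default, so p = ncol(x) + 1 = 2
mean(h)         # 2/5 = 0.4
range(h)        # all values fall within [1/n, 1]
```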
Belsley, Kuh and Welsch (1980) suggest that points with a hat diagonal greater
than 2p/n be considered high leverage points, though they state that
too many points will be labeled leverage points by this rule
when p is small.
Another rule of thumb is to consider any point with a hat diagonal greater
than .2 (or .5) as having high leverage.
If p is large relative to n, then all points can be "high leverage" points.
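The 2p/n rule of Belsley, Kuh and Welsch can be applied in one line. A sketch with made-up data containing one deliberately extreme x value:

```r
# Flag points whose hat diagonal exceeds the 2p/n cutoff.
x <- cbind(c(1, 2, 3, 4, 100))   # the last point lies far from the others
h <- hat(x)
p <- ncol(x) + 1                 # + 1 for the default intercept
n <- nrow(x)
which(h > 2 * p / n)             # flags observation 5, the extreme point
```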
By the way, it is called the "hat" matrix because, in statistical jargon,
multiplying a vector y by the matrix "puts a hat on y": H %*% y gives the
fitted values, conventionally written y-hat.
References
Belsley, D. A., Kuh, E., and Welsch, R. E. 1980. Regression Diagnostics. New York, NY: John Wiley & Sons.
Cook, R. D. and Weisberg, S. 1982. Residuals and Influence in Regression. New York, NY: Chapman and Hall.
See Also
lm.influence, hatvalues.
Examples
h <- hat(Sdatasets::freeny.x)
plot(h, xlab = "index number", ylab = "hat diagonal")
# 2p/n cutoff, with p = ncol(x) + 1 because hat() includes an intercept by default
abline(h = 2 * (ncol(Sdatasets::freeny.x) + 1) / nrow(Sdatasets::freeny.x))