hat
Hat Diagonal Regression Diagnostic
Description
Returns the diagonal of the hat matrix for a least squares regression.
Usage
hat(x, intercept = TRUE)
Arguments
x
matrix of explanatory variables in the regression model y = xb + e, or the QR decomposition of such a matrix. Missing values (NAs) are not accepted.
intercept
logical flag: if TRUE, an intercept term is included in the regression model. This is ignored if x is a QR object.
Value
vector with one value for each row of x. These values are
the diagonal elements of the least-squares projection matrix
H. (Fitted values for a regression of y on x are H %*% y.)
Large values of these diagonal elements
correspond to points with high leverage.
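The relationship between hat() and the projection matrix can be checked directly. This is a minimal sketch using a small made-up design matrix (the data are illustrative only); the explicit formula X (X'X)^{-1} X' is used here for exposition, while hat() itself works from a QR decomposition.

```r
# Hat diagonals from the explicit projection matrix, compared with hat().
x <- cbind(c(1.2, 0.5, 3.1, 2.0, 4.4))        # one explanatory variable
X <- cbind(1, x)                              # prepend intercept column, as intercept = TRUE does
H <- X %*% solve(t(X) %*% X) %*% t(X)         # least-squares projection ("hat") matrix
all.equal(diag(H), hat(x, intercept = TRUE))  # should be TRUE
```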
Background
The diagonals of the hat matrix indicate the amount of leverage (influence)
that observations have in a least squares regression.
Note that this is independent of the value of y.
Observations that have large hat diagonals have more say about the location
of the regression line; an observation with a hat diagonal close to 1 will
have a residual close to 0 no matter what value the response for that
observation takes.
The hat diagonals lie between 1/n and 1 when an intercept is included (between 0 and 1 otherwise), and their average
value is p/n, where p is the number of
coefficients, i.e., the number of columns of x (plus 1 if intercept = TRUE),
and n is the number of observations (the number of rows of x).
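The p/n average follows because the trace of a projection matrix equals its rank. A minimal sketch with made-up data (one column plus the default intercept, so p = 2 and n = 5):

```r
# The average hat diagonal equals p/n.
x <- cbind(c(1.2, 0.5, 3.1, 2.0, 4.4))
h <- hat(x)     # intercept = TRUE by default, so p = ncol(x) + 1 = 2
mean(h)         # 2/5 = 0.4
range(h)        # all values fall within [1/n, 1]
```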
Belsley, Kuh and Welsch (1980) suggest that points with a hat diagonal greater
than 2p/n be considered high leverage points, though they state that
too many points will be labeled leverage points by this rule
when p is small.
Another rule of thumb is to consider any point with a hat diagonal greater
than .2 (or .5) as having high leverage.
If p is large relative to n, then all points can be "high leverage" points.
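The 2p/n rule of Belsley, Kuh and Welsch can be applied in one line. A sketch with made-up data containing one deliberately extreme x value:

```r
# Flag points whose hat diagonal exceeds the 2p/n cutoff.
x <- cbind(c(1, 2, 3, 4, 100))   # the last point lies far from the others
h <- hat(x)
p <- ncol(x) + 1                 # + 1 for the default intercept
n <- nrow(x)
which(h > 2 * p / n)             # flags observation 5, the extreme point
```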
By the way, it is called the "hat" matrix because, in statistical jargon,
multiplying a vector y by the matrix "puts a hat on y": H %*% y gives the
fitted values, conventionally written y-hat.
References
Belsley, D. A., Kuh, E., and Welsch, R. E. 1980. Regression Diagnostics. New York, NY: John Wiley & Sons.
Cook, R. D. and Weisberg, S. 1982. Residuals and Influence in Regression. New York, NY: Chapman and Hall.
See Also
lm.influence, hatvalues.
Examples
h <- hat(Sdatasets::freeny.x)
plot(h, xlab = "index number", ylab = "hat diagonal")
# 2p/n cutoff, with p = ncol(x) + 1 because hat() includes an intercept by default
abline(h = 2 * (ncol(Sdatasets::freeny.x) + 1) / nrow(Sdatasets::freeny.x))