hat
Hat Diagonal Regression Diagnostic
Description
Returns the diagonal of the hat matrix for a least squares regression.
Usage
hat(x, intercept = TRUE)
Arguments
x: matrix of explanatory variables in the regression model y = xb + e, or the QR decomposition of such a matrix. Missing values (NAs) are not accepted.
intercept: logical flag; if TRUE, an intercept term is included in the regression model. This is ignored if x is a QR object.
Value
vector with one value for each row of x. These values are
the diagonal elements of the least-squares projection matrix
H. (Fitted values for a regression of y on x are H %*% y.)
Large values of these diagonal elements
correspond to points with high leverage.
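As a quick sketch (not part of this page's Examples; small simulated data), the returned diagonals can be checked against H = X (X'X)^{-1} X' computed directly, where X is x with an intercept column prepended:
set.seed(1)
x <- matrix(rnorm(20), ncol = 2)              ## 10 observations, 2 explanatory variables
X <- cbind(1, x)                              ## design matrix including the intercept column
H <- X %*% solve(crossprod(X)) %*% t(X)       ## least-squares projection ("hat") matrix
all.equal(diag(H), hat(x, intercept = TRUE))  ## TRUE: same hat diagonals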
Background
The diagonals of the hat matrix indicate the amount of leverage (influence)
that observations have in a least squares regression.
Note that this is independent of the value of y.
Observations that have large hat diagonals have more say about the location
of the regression line; an observation with a hat diagonal close to 1 will
have a residual close to 0 no matter what value the response for that
observation takes.
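For example (illustrative simulated data, not from the original page), an observation whose x value is far from the others gets a hat diagonal near 1 and a residual near 0 no matter what its response is:
set.seed(2)
x <- c(1:10, 100)                          ## the last point is far from the rest in x
y <- c(rnorm(10), -50)                     ## its response value hardly matters
h <- hat(x)
res <- lsfit(x, y)$residuals               ## least-squares residuals
round(cbind(hat = h, residual = res), 3)   ## last row: hat near 1, residual near 0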
The hat diagonals lie between 0 and 1 (and are at least 1/n when an intercept is included); their average
value is p/n, where p is the number of
variables, i.e., the number of columns of x (plus 1 if intercept = TRUE),
and n is the number of observations (the number of rows of x).
Belsley, Kuh and Welsch (1980) suggest that points with a hat diagonal greater
than 2p/n be considered high leverage points, though they state that
too many points will be labeled leverage points by this rule
when p is small.
Another rule of thumb is to consider any point with a hat diagonal greater
than .2 (or .5) as having high leverage.
If p is large relative to n, then all points can be "high leverage" points.
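A sketch of these rules of thumb (reusing the freeny data from the Examples below; in base R the same matrix is available as datasets::freeny.x):
x <- Sdatasets::freeny.x
h <- hat(x)              ## intercept = TRUE by default
p <- ncol(x) + 1         ## columns of x plus the intercept term
n <- nrow(x)
mean(h)                  ## average hat diagonal equals p/n
which(h > 2 * p / n)     ## candidate high-leverage points by the 2p/n rule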
It is called the "hat" matrix because, in statistical jargon, multiplying a vector y
by this matrix "puts a hat on y": the result is the vector of fitted values,
conventionally written y-hat.
References
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980).
Regression Diagnostics.
Wiley, New York.
Cook, R. D. and Weisberg, S. (1982).
Residuals and Influence in Regression.
Chapman and Hall, New York.
Examples
## hat diagonals for the freeny data
h <- hat(Sdatasets::freeny.x)
plot(h, xlab = "index number", ylab = "hat diagonal")
## reference line at 2p/n, counting the intercept in p
p <- ncol(Sdatasets::freeny.x) + 1
abline(h = 2 * p / nrow(Sdatasets::freeny.x))