In statistics and in particular in regression analysis, leverage is a measure of how far away the independent variable values of an observation are from those of the other observations.
Contents
High-leverage points are those observations, if any, made at extreme or outlying values of the independent variables such that the lack of neighboring observations means that the fitted regression model will pass close to that particular observation.
Modern computer packages for statistical analysis include, as part of their facilities for regression analysis, various quantitative measures for identifying influential observations: among these measures is partial leverage, a measure of how a variable contributes to the leverage of a datum.
Definition
In the linear regression model, the leverage score for the i-th data unit is defined as:
the i-th diagonal element of the projection matrix
where
Bounds on leverage
Proof
First, note that H is an idempotent matrix:
and
Effect on residual variance
If we are in an ordinary least squares setting with fixed X, regression errors
then
In other words, if the model errors
Proof
First, note that
Thus
Studentized residuals
The corresponding studentized residual—the residual adjusted for its observation–specific residual variance—is then
where