Leverage (statistics)
In statistics and in particular in regression analysis, leverage is a measure of how far away the independent variable values of an observation are from those of the other observations.

High-leverage points, if any, are observations made at extreme or outlying values of the independent variables, such that the lack of neighboring observations means the fitted regression model will pass close to that particular observation.

Modern computer packages for statistical analysis include, as part of their facilities for regression analysis, various quantitative measures for identifying influential observations; among these is partial leverage, which quantifies how a single variable contributes to the leverage of a datum.

Definition

In the linear regression model, the leverage score for the i-th data unit is defined as:

$h_{ii} = [H]_{ii},$

the $i$-th diagonal element of the projection matrix $H = X(X^{T}X)^{-1}X^{T}$, where $X$ is the design matrix. The leverage score is also known as the observation self-sensitivity or self-influence, as shown by

$h_{ii} = \frac{\partial \hat{y}_i}{\partial y_i},$

where $\hat{y}_i$ and $y_i$ are the fitted and measured observation, respectively.
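As a concrete illustration (not part of the original article), the leverage scores can be computed directly from the design matrix as the diagonal of the hat matrix. A minimal sketch, assuming NumPy and a made-up design matrix:

    import numpy as np

    def leverage_scores(X):
        # Diagonal of the hat matrix H = X (X^T X)^{-1} X^T.
        H = X @ np.linalg.inv(X.T @ X) @ X.T
        return np.diag(H)

    # Hypothetical design: an intercept column plus one regressor,
    # whose last value (10.0) lies far from the others.
    X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 10.0])])
    print(leverage_scores(X))  # the outlying row receives the largest h_ii

In practice one would use a QR decomposition rather than forming $(X^{T}X)^{-1}$ explicitly, but the direct formula mirrors the definition above.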

Bounds on leverage

$0 \le h_{ii} \le 1.$

Proof

First, note that $H$ is an idempotent matrix: $H^2 = X(X^{T}X)^{-1}X^{T}\,X(X^{T}X)^{-1}X^{T} = X\,I\,(X^{T}X)^{-1}X^{T} = H$. Also observe that $H$ is symmetric. So, equating the $ii$ element of $H$ to that of $H^2$, we have

$h_{ii} = h_{ii}^2 + \sum_{j \neq i} h_{ij}^2 \ge 0$

and

$h_{ii} \ge h_{ii}^2 \implies h_{ii} \le 1.$
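The idempotency, symmetry, and resulting bounds are easy to verify numerically; the following sketch (again assuming NumPy, with a randomly generated design matrix) checks all three:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))        # hypothetical 20 x 3 design matrix
    H = X @ np.linalg.inv(X.T @ X) @ X.T

    assert np.allclose(H @ H, H)        # idempotent: H^2 = H
    assert np.allclose(H, H.T)          # symmetric
    h = np.diag(H)
    assert np.all((h >= 0) & (h <= 1))  # 0 <= h_ii <= 1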

Effect on residual variance

If we are in an ordinary least squares setting with fixed $X$, regression errors $\epsilon_i$, and

$Y = X\beta + \epsilon, \qquad \operatorname{Var}(\epsilon) = \sigma^2 I,$

then $\operatorname{Var}(e_i) = (1 - h_{ii})\,\sigma^2$, where $e_i = Y_i - \hat{Y}_i$ (the $i$-th regression residual).

In other words, if the model errors ϵ are homoscedastic, an observation's leverage score determines the degree of noise in the model's misprediction of that observation.

Proof

First, note that $I - H$ is idempotent and symmetric. This gives

$\operatorname{Var}(e) = \operatorname{Var}((I - H)Y) = (I - H)\operatorname{Var}(Y)(I - H)^{T} = \sigma^2 (I - H)^2 = \sigma^2 (I - H).$

Thus $\operatorname{Var}(e_i) = (1 - h_{ii})\,\sigma^2$.
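To see this effect numerically, one can simulate many replications of $Y = X\beta + \epsilon$ with homoscedastic errors and compare the empirical variance of each residual with $(1 - h_{ii})\sigma^2$. A sketch under made-up data (NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(1)
    n, sigma = 7, 2.0
    X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)

    beta = np.array([1.0, 0.5])                    # hypothetical coefficients
    eps = rng.normal(scale=sigma, size=(100_000, n))
    Y = X @ beta + eps                             # one replication per row
    e = Y - Y @ H.T                                # residuals e = (I - H) Y, row-wise
    print(e.var(axis=0))                           # empirical Var(e_i)
    print((1 - h) * sigma**2)                      # theoretical (1 - h_ii) sigma^2

The two printed vectors should agree to a few decimal places, with the smallest residual variance at the highest-leverage design points (here, the endpoints of the x grid).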

Studentized residuals

The corresponding studentized residual, the residual adjusted for its observation-specific residual variance, is then

$t_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}},$

where $\hat{\sigma}$ is an appropriate estimate of $\sigma$.
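A minimal sketch putting the pieces together (NumPy assumed; the data are invented, and $\sigma$ is estimated from the residual sum of squares with $n - p$ degrees of freedom):

    import numpy as np

    def studentized_residuals(X, y):
        # Internally studentized residuals t_i = e_i / (sigma_hat * sqrt(1 - h_ii)).
        n, p = X.shape
        H = X @ np.linalg.inv(X.T @ X) @ X.T
        h = np.diag(H)
        e = y - H @ y                          # residuals e = (I - H) y
        sigma_hat = np.sqrt(e @ e / (n - p))   # estimate of sigma
        return e / (sigma_hat * np.sqrt(1 - h))

    X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 10.0])])
    y = np.array([1.1, 1.9, 3.2, 3.9, 10.5])
    print(studentized_residuals(X, y))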
