Mean and predicted response - Alchetron, the free social encyclopedia

In linear regression, the mean response and the predicted response are values of the dependent variable calculated from the estimated regression parameters and a given value of the independent variable. The two responses have the same value (the same point estimate), but their variances are different.

Background

In straight-line fitting, the model is

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

where $y_i$ is the response variable, $x_i$ is the explanatory variable, $\varepsilon_i$ is the random error, $\alpha$ and $\beta$ are parameters, and $i = 1, \dots, m$ indexes the $m$ observations. The predicted response value for a given explanatory value $x_d$ is given by

$$\hat{y}_d = \hat\alpha + \hat\beta x_d,$$

while the actual response would be

$$y_d = \alpha + \beta x_d + \varepsilon_d.$$

Expressions for the values and variances of $\hat\alpha$ and $\hat\beta$ are given in the article on linear regression.
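The estimates $\hat\alpha$ and $\hat\beta$ are not derived here, but a minimal sketch of the standard ordinary least-squares fit, followed by the point estimate $\hat{y}_d$, may make the setup concrete. The helper names `fit_line` and `predict` and the sample data are hypothetical; only the Python standard library is used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (alpha_hat, beta_hat) for y = alpha + beta * x."""
    m = len(xs)
    if m != len(ys) or m < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def predict(alpha_hat, beta_hat, x_d):
    """Point estimate y_hat_d = alpha_hat + beta_hat * x_d (shared by mean and predicted response)."""
    return alpha_hat + beta_hat * x_d


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up illustrative data
    a, b = fit_line(xs, ys)
    print(f"alpha_hat={a:.4f} beta_hat={b:.4f} y_hat(6)={predict(a, b, 6.0):.4f}")
```

The same $\hat{y}_d$ serves as both the mean response and the predicted response; only the variances attached to it differ.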

Mean response

Since the data in this context are defined to be $(x, y)$ pairs for every observation, the mean response at a given value of $x$, say $x_d$, is an estimate of the mean of the $y$ values in the population at $x = x_d$, that is, $\hat{E}(y \mid x_d) \equiv \hat{y}_d$. The variance of the mean response is given by

$$\operatorname{Var}(\hat\alpha + \hat\beta x_d) = \operatorname{Var}(\hat\alpha) + \operatorname{Var}(\hat\beta)\, x_d^2 + 2 x_d \operatorname{Cov}(\hat\alpha, \hat\beta).$$

This expression can be simplified to

$$\operatorname{Var}(\hat\alpha + \hat\beta x_d) = \sigma^2 \left( \frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right).$$
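As a numerical check that the expanded and simplified forms agree, the sketch below evaluates both for a known $\sigma^2$. It assumes the standard least-squares results $\operatorname{Var}(\hat\alpha) = \sigma^2 \sum x_i^2 / (m S_{xx})$, $\operatorname{Var}(\hat\beta) = \sigma^2 / S_{xx}$ and $\operatorname{Cov}(\hat\alpha, \hat\beta) = -\sigma^2 \bar{x} / S_{xx}$ with $S_{xx} = \sum (x_i - \bar{x})^2$, which this section does not state; the function names are hypothetical.

```python
def _summaries(xs):
    """Return (m, x_bar, S_xx) for the explanatory values."""
    m = len(xs)
    x_bar = sum(xs) / m
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return m, x_bar, s_xx


def mean_response_var_expanded(xs, x_d, sigma2):
    """Var(alpha_hat) + Var(beta_hat) x_d^2 + 2 x_d Cov(alpha_hat, beta_hat)."""
    m, x_bar, s_xx = _summaries(xs)
    var_alpha = sigma2 * sum(x * x for x in xs) / (m * s_xx)
    var_beta = sigma2 / s_xx
    cov_ab = -sigma2 * x_bar / s_xx
    return var_alpha + var_beta * x_d ** 2 + 2 * x_d * cov_ab


def mean_response_var(xs, x_d, sigma2):
    """Simplified form sigma^2 (1/m + (x_d - x_bar)^2 / S_xx)."""
    m, x_bar, s_xx = _summaries(xs)
    return sigma2 * (1.0 / m + (x_d - x_bar) ** 2 / s_xx)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    for x_d in (0.0, 3.0, 6.0):
        print(x_d, mean_response_var_expanded(xs, x_d, 1.0), mean_response_var(xs, x_d, 1.0))
```

The simplified form also makes visible that the variance is smallest, $\sigma^2/m$, at $x_d = \bar{x}$ and grows quadratically as $x_d$ moves away from $\bar{x}$.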

To demonstrate this simplification, one can make use of the identity

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{m} \left( \sum x_i \right)^2.$$
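One way to carry out the simplification is sketched below. It assumes the standard least-squares expressions for $\operatorname{Var}(\hat\alpha)$, $\operatorname{Var}(\hat\beta)$ and $\operatorname{Cov}(\hat\alpha, \hat\beta)$ (quoted, not derived here), writes $S_{xx} = \sum (x_i - \bar{x})^2$, and uses the identity in the form $\sum x_i^2 = S_{xx} + m\bar{x}^2$.

```latex
\begin{aligned}
&\operatorname{Var}(\hat\alpha) = \frac{\sigma^2 \sum x_i^2}{m\,S_{xx}}, \quad
 \operatorname{Var}(\hat\beta) = \frac{\sigma^2}{S_{xx}}, \quad
 \operatorname{Cov}(\hat\alpha, \hat\beta) = -\frac{\sigma^2 \bar{x}}{S_{xx}} \\
\operatorname{Var}(\hat\alpha + \hat\beta x_d)
 &= \frac{\sigma^2}{S_{xx}} \left( \frac{1}{m}\sum x_i^2 + x_d^2 - 2 x_d \bar{x} \right) \\
 &= \frac{\sigma^2}{S_{xx}} \left( \frac{S_{xx}}{m} + \bar{x}^2 + x_d^2 - 2 x_d \bar{x} \right)
  \qquad \text{since } \tfrac{1}{m}\sum x_i^2 = \tfrac{S_{xx}}{m} + \bar{x}^2 \\
 &= \sigma^2 \left( \frac{1}{m} + \frac{(x_d - \bar{x})^2}{S_{xx}} \right).
\end{aligned}
```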

Predicted response

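The predicted response distribution is the predicted distribution of the residuals at the given point $x_d$. Because the new observation $y_d$ is independent of the data used to compute $\hat\alpha$ and $\hat\beta$, the covariance between the two terms vanishes, and the variance is given by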

$$\operatorname{Var}\left(y_d - [\hat\alpha + \hat\beta x_d]\right) = \operatorname{Var}(y_d) + \operatorname{Var}(\hat\alpha + \hat\beta x_d).$$

The second part of this expression was already calculated for the mean response. Since $\operatorname{Var}(y_d) = \sigma^2$ (a fixed but unknown parameter that can be estimated), the variance of the predicted response is given by

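$$\operatorname{Var}\left(y_d - [\hat\alpha + \hat\beta x_d]\right) = \sigma^2 + \sigma^2 \left( \frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right) = \sigma^2 \left( 1 + \frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right).$$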
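The formula can be checked by simulation. The sketch below, assuming Gaussian errors and hypothetical true values of $\alpha$, $\beta$ and $\sigma$, repeatedly draws a training sample, fits the line, draws an independent new $y_d$, and compares the empirical variance of $y_d - \hat{y}_d$ with the closed form. Function names are illustrative only.

```python
import random
import statistics


def predicted_response_var(xs, x_d, sigma2):
    """sigma^2 (1 + 1/m + (x_d - x_bar)^2 / S_xx)."""
    m = len(xs)
    x_bar = sum(xs) / m
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return sigma2 * (1.0 + 1.0 / m + (x_d - x_bar) ** 2 / s_xx)


def simulate_prediction_error_var(xs, x_d, alpha, beta, sigma, trials=20000, seed=0):
    """Monte Carlo estimate of Var(y_d - (alpha_hat + beta_hat * x_d)), assuming Gaussian errors."""
    rng = random.Random(seed)
    m = len(xs)
    x_bar = sum(xs) / m
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    errors = []
    for _ in range(trials):
        # Fresh training responses from the straight-line model.
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        y_bar = sum(ys) / m
        beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
        alpha_hat = y_bar - beta_hat * x_bar
        # New observation at x_d, independent of the training sample.
        y_d = alpha + beta * x_d + rng.gauss(0.0, sigma)
        errors.append(y_d - (alpha_hat + beta_hat * x_d))
    return statistics.variance(errors)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(predicted_response_var(xs, 6.0, 1.0))
    print(simulate_prediction_error_var(xs, 6.0, alpha=0.5, beta=2.0, sigma=1.0))
```

Drawing $y_d$ separately from the training sample is what makes the covariance term vanish; reusing a training point instead would not reproduce the formula.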

Confidence intervals

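The $100(1-\alpha)\%$ confidence intervals are computed as $\hat{y}_d \pm t_{\alpha/2,\, m-n-1} \sqrt{\operatorname{Var}}$, where $\operatorname{Var}$ is the relevant variance above with $\sigma^2$ replaced by its estimate from the residuals, $n$ is the number of explanatory variables ($n = 1$ for a straight line, giving $m - 2$ degrees of freedom), and $\alpha$ here denotes the significance level rather than the intercept. Because the predicted-response variance contains the extra term $\sigma^2$, the interval for the predicted response (often called a prediction interval) is wider than the interval for the mean response. This is expected intuitively. The variance of the population of $y$ values does not shrink when one samples from it, because the variance of the random error $\varepsilon_i$ does not decrease. The variance of the mean of $y$, however, does shrink with increased sampling, because the variances of $\hat\alpha$ and $\hat\beta$ decrease, so the mean response (the predicted response value) moves closer to $\alpha + \beta x_d$.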
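A minimal sketch of both intervals for the straight-line case follows. It assumes $\sigma^2$ is estimated by $s^2 = \text{SSR}/(m-2)$ and that the caller supplies the critical value $t_{\alpha/2,\,m-2}$ (for example from a t table), since the standard library has no t distribution. The function name and data are hypothetical.

```python
import math


def mean_and_prediction_intervals(xs, ys, x_d, t_crit):
    """Return (y_hat_d, mean-response interval, predicted-response interval).

    t_crit is t_{alpha/2, m-2}, looked up by the caller; sigma^2 is replaced
    by the residual estimate s^2 = SSR / (m - 2).
    """
    m = len(xs)
    if m != len(ys) or m < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    # Residual sum of squares; m - n - 1 = m - 2 degrees of freedom for n = 1.
    ssr = sum((y - (alpha_hat + beta_hat * x)) ** 2 for x, y in zip(xs, ys))
    s2 = ssr / (m - 2)
    y_hat = alpha_hat + beta_hat * x_d
    h = 1.0 / m + (x_d - x_bar) ** 2 / s_xx          # bracketed factor of the mean-response variance
    half_mean = t_crit * math.sqrt(s2 * h)           # mean response
    half_pred = t_crit * math.sqrt(s2 * (1.0 + h))   # predicted response
    return (
        y_hat,
        (y_hat - half_mean, y_hat + half_mean),
        (y_hat - half_pred, y_hat + half_pred),
    )


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up illustrative data
    # t_{0.025, 3} is approximately 3.182 (two-sided 95%, m - 2 = 3 degrees of freedom).
    print(mean_and_prediction_intervals(xs, ys, 6.0, 3.182))
```

Both intervals share the same centre $\hat{y}_d$; only the half-widths differ, through the extra $1$ inside the square root.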

This is analogous to the difference between the variance of a population and the variance of the sample mean: the variance of a population is a parameter and does not change, but the variance of the sample mean decreases as the sample size increases.
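To illustrate the contrast numerically, the sketch below evaluates both variances for a growing number of equally spaced $x$ values on $[0, 1]$ (a hypothetical design chosen only for illustration) at a fixed $x_d$. The mean-response variance shrinks toward zero, while the predicted-response variance levels off at $\sigma^2$.

```python
def variances_for_design(m, x_d, sigma2=1.0):
    """(mean-response variance, predicted-response variance) for m equally spaced x on [0, 1]."""
    if m < 2:
        raise ValueError("need at least two design points")
    xs = [i / (m - 1) for i in range(m)]
    x_bar = sum(xs) / m
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    mean_var = sigma2 * (1.0 / m + (x_d - x_bar) ** 2 / s_xx)
    return mean_var, sigma2 + mean_var


if __name__ == "__main__":
    for m in (5, 50, 500, 5000):
        mean_var, pred_var = variances_for_design(m, x_d=0.8)
        print(f"m={m:5d}  mean-response var={mean_var:.5f}  predicted-response var={pred_var:.5f}")
```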

General linear regression

The general linear model can be written as

$$y_i = \sum_{j=1}^{n} X_{ij} \beta_j + \varepsilon_i$$

For a point with regressor values $X_{d1}, \dots, X_{dn}$, the mean response is $\hat{y}_d = \sum_{j=1}^{n} X_{dj} \hat\beta_j$, so the general expression for the variance of the mean response is

$$\operatorname{Var}\left( \sum_{j=1}^{n} X_{dj} \hat\beta_j \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} X_{di} S_{ij} X_{dj},$$

where $S$ is the covariance matrix of the parameter estimates $\hat\beta_j$, given by

$$S = \sigma^2 (X^{\mathsf T} X)^{-1}.$$
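The general formula is a quadratic form in the covariance matrix $S$. The sketch below computes it directly; to stay dependency-free it uses a small hypothetical Gauss-Jordan helper in place of a linear-algebra library, which is adequate only for small, well-conditioned matrices. With design rows $[1, x_i]$ and $x_d = [1, x_d]$ it reproduces the straight-line mean-response variance.

```python
def mat_inverse(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix given as lists."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mean_response_var_general(X, x_d, sigma2):
    """sum_i sum_j x_d[i] S[i][j] x_d[j] with S = sigma^2 (X^T X)^{-1}."""
    n = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    S = [[sigma2 * v for v in row] for row in mat_inverse(xtx)]
    return sum(x_d[i] * S[i][j] * x_d[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Straight line as a special case: columns are [1, x_i].
    X = [[1.0, x] for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
    print(mean_response_var_general(X, [1.0, 6.0], sigma2=1.0))
```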

References

Mean and predicted response Wikipedia

(Text) CC BY-SA