In econometrics, an endogeneity problem occurs when an explanatory variable is correlated with the error term. Endogeneity can arise as a result of measurement error, autoregression with autocorrelated errors, simultaneous causality (see Instrumental variable) and omitted variables. Two common causes of endogeneity are: 1) an uncontrolled confounder causing both independent and dependent variables of a model; and 2) a loop of causality between the independent and dependent variables of a model.
For example, in a simple supply and demand model, when predicting the quantity demanded in equilibrium, the price is endogenous because producers change their price in response to demand and consumers change their demand in response to price. In this case, the price variable is said to have total endogeneity once the demand and supply curves are known. In contrast, a change in consumer tastes or preferences would be an exogenous change on the demand curve.
Exogeneity versus endogeneity
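In a stochastic model, one can define the usual exogeneity, sequential exogeneity, and strong/strict exogeneity. Exogeneity is always stated relative to a parameter: a variable (or set of variables) is exogenous for a parameter $\alpha$. Even if a variable is exogenous for $\alpha$, it may be endogenous for another parameter $\beta$.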
When the explanatory variables are not stochastic, they are strongly exogenous for all the parameters.
If the independent variable is correlated with the error term in a regression model, then the ordinary least squares (OLS) estimate of the regression coefficient is biased; however, if the correlation is not contemporaneous, the coefficient estimate may still be consistent. There are many methods of correcting the bias, including instrumental variable regression and Heckman selection correction.
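As a rough illustration of the instrumental variable idea, the following Python sketch simulates a regressor that shares a shock with the error term and compares the OLS slope with a simple ratio-of-covariances IV slope. The data-generating process, the instrument `w`, the coefficient values and the helper `cov` are all assumptions made for this example.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 50_000
beta = 2.0               # assumed true slope

w = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical instrument: moves x, unrelated to u
c = [rng.gauss(0, 1) for _ in range(n)]   # shock shared by x and the error term
x = [wi + ci + rng.gauss(0, 1) for wi, ci in zip(w, c)]
u = [ci + rng.gauss(0, 1) for ci in c]    # error term, correlated with x through c
y = [1.0 + beta * xi + ui for xi, ui in zip(x, u)]

# OLS slope converges to beta + Cov(x, u) / Var(x) = 2 + 1/3 in this setup.
ols_slope = cov(x, y) / cov(x, x)
# IV slope converges to beta as long as Cov(w, u) = 0 and Cov(w, x) != 0.
iv_slope = cov(w, y) / cov(w, x)

print(f"OLS slope: {ols_slope:.3f}")
print(f"IV slope:  {iv_slope:.3f}")
```

With a large sample, the OLS slope stays above the assumed true value of 2 while the IV slope settles near it; the IV result depends entirely on the assumption that `w` is uncorrelated with the error term.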
Static models
The following are some common sources of endogeneity.
Omitted variable
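In this case, the endogeneity comes from an uncontrolled confounding variable: a variable that is correlated with both an independent variable in the model and the error term. (Equivalently, the omitted variable both affects the independent variable and separately affects the dependent variable.) Assume that the "true" model to be estimated is
$$y_i = \alpha + \beta x_i + \gamma z_i + u_i$$
but we omit $z_i$ when we run the regression. Then $z_i$ is absorbed into the error term, and the model actually estimated is
$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad \text{where } \varepsilon_i = \gamma z_i + u_i.$$
If the correlation of $x$ and $z$ is not zero and $z$ separately affects $y$ (meaning $\gamma \neq 0$), then $x$ is correlated with the error term $\varepsilon$.
Here, $x$ and 1 are not exogenous for $\alpha$ and $\beta$, since, given $x$ and 1, the distribution of $y$ depends not only on $\alpha$ and $\beta$, but also on $z$ and $\gamma$.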
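The effect of omitting a confounder can be checked numerically. The Python sketch below is only an illustration under assumed values (α = 1, β = 2, γ = 3, and x built as 0.5·z plus noise); the helpers `slope` and `residualize` are hypothetical names introduced for this example. It compares the regression that omits z with one that controls for z by partialling it out (the Frisch-Waugh-Lovell approach).

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(y, x):
    """Residuals from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(1)               # fixed seed for reproducibility
n = 50_000
alpha, beta, gamma = 1.0, 2.0, 3.0   # assumed "true" parameters

z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + rng.gauss(0, 1) for zi in z]   # x is correlated with z
u = [rng.gauss(0, 1) for _ in range(n)]
y = [alpha + beta * xi + gamma * zi + ui for xi, zi, ui in zip(x, z, u)]

# Short regression (z omitted): the slope converges to
# beta + gamma * Cov(x, z) / Var(x) = 2 + 3 * 0.5 / 1.25 = 3.2
short_slope = slope(x, y)

# Long regression (z controlled for) via partialling z out of x and y:
# the slope converges to beta = 2
long_slope = slope(residualize(x, z), residualize(y, z))

print(f"slope with z omitted:  {short_slope:.3f}")
print(f"slope with z included: {long_slope:.3f}")
```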
Measurement error
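Suppose that we do not get a perfect measure of one of our independent variables. Imagine that instead of observing $x_i^*$, we observe $x_i = x_i^* + \nu_i$, where $\nu_i$ is the measurement error. In this case, a model given by
$$y_i = \alpha + \beta x_i^* + \varepsilon_i$$
is written in terms of observables and error terms as
$$y_i = \alpha + \beta (x_i - \nu_i) + \varepsilon_i = \alpha + \beta x_i + u_i, \quad \text{where } u_i = \varepsilon_i - \beta \nu_i.$$
Since both $x_i$ and $u_i$ depend on $\nu_i$, they are correlated, so the OLS estimate of $\beta$ is biased. When $\nu_i$ is uncorrelated with $x_i^*$ and $\varepsilon_i$, the bias is toward zero.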
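A small simulation can make the direction of this bias concrete. The sketch below assumes classical measurement error: a true regressor `x_star` with variance 1, independent noise `nu` with variance 1, and an assumed true slope of 2. Under those assumptions the slope on the noisy regressor should shrink by the factor Var(x*) / (Var(x*) + Var(ν)) = 0.5. All names and values are illustrative.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)      # fixed seed for reproducibility
n = 50_000
alpha, beta = 1.0, 2.0      # assumed true parameters

x_star = [rng.gauss(0, 1) for _ in range(n)]      # true regressor (unobserved)
nu = [rng.gauss(0, 1) for _ in range(n)]          # measurement noise
x_obs = [xs + v for xs, v in zip(x_star, nu)]     # what the analyst observes
eps = [rng.gauss(0, 1) for _ in range(n)]
y = [alpha + beta * xs + e for xs, e in zip(x_star, eps)]

slope_true = slope(x_star, y)   # converges to beta = 2
slope_obs = slope(x_obs, y)     # converges to beta * 0.5 = 1

print(f"slope using true regressor:  {slope_true:.3f}")
print(f"slope using noisy regressor: {slope_obs:.3f}")
```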
Simultaneity
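Suppose that two variables are codetermined, with each affecting the other. Suppose that there are two "structural" equations,
$$y_i = \beta_1 x_i + \gamma_1 z_i + u_i$$
$$z_i = \beta_2 x_i + \gamma_2 y_i + v_i$$
Estimating either equation by itself results in endogeneity. In the case of the first structural equation, $E(z_i u_i) \neq 0$. Solving for $z_i$ while assuming that $1 - \gamma_1 \gamma_2 \neq 0$ gives
$$z_i = \frac{\beta_2 + \gamma_2 \beta_1}{1 - \gamma_1 \gamma_2} x_i + \frac{1}{1 - \gamma_1 \gamma_2} v_i + \frac{\gamma_2}{1 - \gamma_1 \gamma_2} u_i$$
Assuming that $x_i$ and $v_i$ are uncorrelated with $u_i$,
$$E(z_i u_i) = \frac{\gamma_2}{1 - \gamma_1 \gamma_2} E(u_i u_i) \neq 0$$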
Therefore, attempts at estimating either structural equation will be hampered by endogeneity.
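The correlation between a codetermined regressor and the error term can be seen in a simulation. The sketch below assumes a two-equation system of the form y = b1·x + g1·z + u and z = b2·x + g2·y + v, with independent standard normal x, u and v; the coefficient values are arbitrary choices for illustration. It solves the system for y and z and compares the sample average of z·u with the value g2 / (1 − g1·g2) · Var(u) implied by that solution.

```python
import random

rng = random.Random(3)   # fixed seed for reproducibility
n = 50_000
b1, g1 = 1.0, 0.5        # assumed coefficients of the y equation
b2, g2 = 1.0, 0.5        # assumed coefficients of the z equation
d = 1.0 - g1 * g2        # must be nonzero for the system to have a unique solution

x = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]   # error of the y equation, Var(u) = 1
v = [rng.gauss(0, 1) for _ in range(n)]   # error of the z equation

# Reduced form: both structural equations solved jointly for y and z.
y = [((b1 + g1 * b2) * xi + ui + g1 * vi) / d for xi, ui, vi in zip(x, u, v)]
z = [((b2 + g2 * b1) * xi + vi + g2 * ui) / d for xi, ui, vi in zip(x, u, v)]

# Sample analogue of E(z u); u has mean zero by construction.
moment_zu = sum(zi * ui for zi, ui in zip(z, u)) / n
predicted = g2 / d * 1.0   # g2 / (1 - g1*g2) * Var(u)

print(f"sample E(z u):    {moment_zu:.3f}")
print(f"predicted E(z u): {predicted:.3f}")
```

Because the sample moment is clearly nonzero, regressing y on x and z by OLS in this setup would not recover the assumed structural coefficients.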
Dynamic models
The endogeneity problem is particularly relevant in the time series analysis of causal processes. It is common for the value of some factors in a causal system in period t to depend on the values of other factors in the system in period t − 1. Suppose that the level of pest infestation is independent of all other factors within a given period, but is influenced by the levels of rainfall and fertilizer in the preceding period. In this instance it would be correct to say that infestation is exogenous within the period, but endogenous over time.
Let the model be y = f(x, z) + u. If the variable x is sequentially exogenous for parameter α, and y does not cause x in the Granger sense, then x is strongly/strictly exogenous for α.
Simultaneity
Generally speaking, simultaneity arises in dynamic models just as in the static simultaneity example above.