In econometrics, when endogeneity is a concern in a dynamic panel data framework, the panel structure of the data can be exploited to deal with it. Applications include (but are not limited to) capital investment data over time and wage equations. The first-differencing approach to instrumental variables, also referred to as first-difference two-stage least squares (FD2SLS), was first proposed by Anderson and Hsiao (1982) and later extended by Arellano and Bond (1991). The key problem these approaches address is that of predetermined regressors.
Predetermined variables cause static fixed effects estimators, such as the First Differencing and Within-Group Fixed Effects estimators, to be inconsistent, because the strict exogeneity assumption fails in dynamic models. Let eit denote the idiosyncratic part of the error term (i.e. it does not include the fixed effect ai). Consider the following Fixed Effects model:
yit = xit b + ai + eit (1)
When dealing with dynamic models, the exogeneity assumption has to be weakened as follows:
E[ eit xis ] = 0 for all t > s (2)
Static fixed effects models typically transform the data to obtain consistent estimates of the coefficient vector b. Within-Group Fixed Effects models transform equation (1) into equation (3) below, where for any variable c, ĉit = cit − (1/T) ∑t cit denotes the deviation from its time average over t = 1, ..., T. A standard OLS-type regression can then estimate the coefficient vector b consistently under strict exogeneity.
ŷit = x̂it b + êit (3)
Weak exogeneity alone, however, is not sufficient once the data are averaged over all time periods: the transformed error êit contains the errors of every period, including those that condition (2) leaves free to be correlated with the regressors, so the estimator is not consistent. First Differencing similarly provides inconsistent estimates.
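The inconsistency can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not part of the original argument: it takes the simplest dynamic case, in which the single regressor is the lagged dependent variable (xit = yi,t−1) and b is a scalar called `rho`, and it draws ai and eit as independent standard normals. The function names, sample sizes and parameter values are hypothetical choices made for the illustration.

```python
import numpy as np

def simulate_ar1_panel(n_units, n_periods, rho, seed=0, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + a_i + e_it (assumed N(0, 1) a_i and e_it)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_units)            # fixed effects a_i
    y = a + rng.normal(size=n_units)        # arbitrary starting values
    path = []
    for _ in range(burn_in + n_periods):
        y = rho * y + a + rng.normal(size=n_units)
        path.append(y)
    # Drop the burn-in periods; result has shape (n_units, n_periods).
    return np.column_stack(path[burn_in:])

def within_estimate(y):
    """Within-Group estimate: OLS of demeaned y_it on demeaned y_i,t-1."""
    x, target = y[:, :-1], y[:, 1:]
    x_dm = x - x.mean(axis=1, keepdims=True)
    target_dm = target - target.mean(axis=1, keepdims=True)
    return (x_dm * target_dm).sum() / (x_dm ** 2).sum()

def first_difference_ols(y):
    """First Differencing estimate: OLS of delta y_it on delta y_i,t-1."""
    d = np.diff(y, axis=1)
    dx, dy = d[:, :-1], d[:, 1:]
    return (dx * dy).sum() / (dx ** 2).sum()

true_rho = 0.5
panel = simulate_ar1_panel(n_units=10_000, n_periods=6, rho=true_rho)
print("true coefficient:       ", true_rho)
print("Within-Group estimate:  ", within_estimate(panel))
print("First Differencing OLS: ", first_difference_ols(panel))
```

With many units but only a few time periods, both estimates should stay visibly below the true coefficient even as the number of units grows, which is what inconsistency means here.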
Anderson and Hsiao were the first to propose an estimation technique for dynamic panel data models that exploits the lag structure to choose appropriate instrumental variables within a GMM framework. More specifically, they propose first differencing the model, Δyit = Δxit b + Δeit, after which the lagged levels xi,t−1, xi,t−2, ... all provide valid instruments zit: they satisfy both the relevance and the exclusion condition. One can then apply a two-stage least squares procedure for instrumental variables.
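A minimal sketch of this first-difference instrumental variables procedure follows. It is an illustration under assumptions rather than a reference implementation: the regressor is taken to be the lagged dependent variable (xit = yi,t−1), so the instrument xi,t−1 is the level yi,t−2; only this single instrument is used; and the data are simulated with standard normal ai and eit. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_ar1_panel(n_units, n_periods, rho, seed=0, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + a_i + e_it (assumed N(0, 1) a_i and e_it)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_units)
    y = a + rng.normal(size=n_units)
    path = []
    for _ in range(burn_in + n_periods):
        y = rho * y + a + rng.normal(size=n_units)
        path.append(y)
    return np.column_stack(path[burn_in:])

def fd2sls_single_instrument(y):
    """First-difference 2SLS using the level y_i,t-2 as the only instrument."""
    d = np.diff(y, axis=1)       # d[:, j] = y_{j+1} - y_j
    dy = d[:, 1:].ravel()        # delta y_it
    dx = d[:, :-1].ravel()       # delta y_i,t-1, the differenced regressor
    z = y[:, :-2].ravel()        # y_i,t-2, the instrument
    # Stage 1: project the differenced regressor on the instrument.
    pi_hat = (z @ dx) / (z @ z)
    dx_fitted = pi_hat * z
    # Stage 2: regress the differenced outcome on the fitted regressor.
    return (dx_fitted @ dy) / (dx_fitted @ dx_fitted)

true_rho = 0.5
panel = simulate_ar1_panel(n_units=10_000, n_periods=6, rho=true_rho)
print("true coefficient:", true_rho)
print("FD2SLS estimate: ", fd2sls_single_instrument(panel))
```

Because there is one instrument for one regressor, the two stages collapse algebraically to the ratio of sum(z·Δy) to sum(z·Δx); the stages are written out only to mirror the description in the text.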
Arellano and Bond noted that this FD2SLS estimator is not efficient: it does not take into account the possible correlation across t of the differenced errors Δeit, and it does not use all the available information from all time periods. For these reasons, they propose a two-step efficient GMM (generalized method of moments) estimation procedure based on the moment conditions
E[(Δyit − Δxit b) zit ] = 0 for t = 2, ..., T, where zit = (xi,t−1, xi,t−2, ...).
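The moment conditions above can be turned into a two-step GMM estimate along the following lines. This is a hedged sketch, not the authors' code: it assumes the scalar case xit = yi,t−1, so the instruments for each differenced equation are all available lagged levels yi,t−2, yi,t−3, ...; it uses simulated data with standard normal ai and eit; and it assumes a first-step weighting matrix built from the covariance pattern of differenced i.i.d. errors (2 on the diagonal, −1 next to it). Function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_ar1_panel(n_units, n_periods, rho, seed=0, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + a_i + e_it (assumed N(0, 1) a_i and e_it)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_units)
    y = a + rng.normal(size=n_units)
    path = []
    for _ in range(burn_in + n_periods):
        y = rho * y + a + rng.normal(size=n_units)
        path.append(y)
    return np.column_stack(path[burn_in:])

def instrument_blocks(y):
    """Block-diagonal instruments: the equation for period t uses y_0 .. y_{t-2}."""
    n_units, n_periods = y.shape
    n_moments = (n_periods - 2) * (n_periods - 1) // 2
    z = np.zeros((n_units, n_periods - 2, n_moments))
    col = 0
    for row, period in enumerate(range(2, n_periods)):
        width = period - 1
        z[:, row, col:col + width] = y[:, :width]
        col += width
    return z

def gmm_ratio(z, dx, dy, weight):
    """Closed-form GMM estimate of a scalar coefficient for a given weight matrix."""
    zx = np.einsum("itm,it->m", z, dx)
    zy = np.einsum("itm,it->m", z, dy)
    return (zx @ weight @ zy) / (zx @ weight @ zx)

def two_step_gmm(y):
    """Return (first-step, second-step) estimates from the differenced equations."""
    d = np.diff(y, axis=1)
    dy, dx = d[:, 1:], d[:, :-1]
    z = instrument_blocks(y)
    k = y.shape[1] - 2
    # Step 1: weight based on the covariance pattern of differenced i.i.d. errors.
    h = 2 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)
    w1 = np.linalg.inv(np.einsum("itm,ts,isk->mk", z, h, z))
    rho1 = gmm_ratio(z, dx, dy, w1)
    # Step 2: weight based on the first-step residuals, unit by unit.
    g = np.einsum("itm,it->im", z, dy - rho1 * dx)
    w2 = np.linalg.inv(g.T @ g)
    return rho1, gmm_ratio(z, dx, dy, w2)

true_rho = 0.5
panel = simulate_ar1_panel(n_units=5_000, n_periods=6, rho=true_rho)
step1, step2 = two_step_gmm(panel)
print("true coefficient:", true_rho)
print("one-step GMM:    ", step1)
print("two-step GMM:    ", step2)
```

The second-step weight is computed from per-unit sums of instrument-residual products, so it allows for correlation of the differenced errors across t within a unit, which is the source of inefficiency noted in the text.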