In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model when the outcomes may be correlated and the form of that correlation is unknown.
Parameter estimates from the GEE are consistent even when the covariance structure is misspecified, under mild regularity conditions. The focus of the GEE is on estimating the average response over the population ("population-averaged" effects) rather than the regression parameters that would enable prediction of the effect of changing one or more covariates on a given individual. GEEs are usually used in conjunction with Huber–White standard error estimates, also known as "robust standard error" or "sandwich variance" estimates. In the case of a linear model with a working independence variance structure, these are known as "heteroscedasticity consistent standard error" estimators. Indeed, the GEE unified several independent formulations of these standard error estimators in a general framework.
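As an illustration of the linear, working-independence case mentioned above, the following is a minimal sketch of a cluster-robust sandwich variance estimate for a straight-line mean model E[y] = b0 + b1*x. The function names and the toy data are hypothetical and chosen only for this example; the code is restricted to two coefficients so that it needs nothing beyond the standard library, and it omits the small-sample corrections that statistical packages often apply.

```python
from math import sqrt


def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def gee_linear_independence(clusters):
    """Fit E[y] = b0 + b1*x under working independence.

    clusters: list of clusters, each a list of (x, y) pairs.
    Returns (beta, robust_cov), where robust_cov is the sandwich
    estimate bread * meat * bread.
    """
    # Under working independence the point estimate is ordinary least squares.
    xtx = [[0.0, 0.0], [0.0, 0.0]]
    xty = [0.0, 0.0]
    for cluster in clusters:
        for x, y in cluster:
            v = (1.0, x)
            for i in range(2):
                xty[i] += v[i] * y
                for j in range(2):
                    xtx[i][j] += v[i] * v[j]
    bread = inv2(xtx)
    beta = [sum(bread[i][j] * xty[j] for j in range(2)) for i in range(2)]

    # Meat: sum over clusters of s_i s_i^T, with s_i = X_i^T (y_i - X_i beta).
    # Summing residuals within a cluster first is what lets the estimate
    # reflect within-cluster correlation.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for cluster in clusters:
        s = [0.0, 0.0]
        for x, y in cluster:
            r = y - (beta[0] + beta[1] * x)
            s[0] += r
            s[1] += x * r
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    return beta, matmul2(matmul2(bread, meat), bread)


# Toy data (made up for illustration): three clusters of two observations.
example_clusters = [
    [(0.0, 1.2), (1.0, 2.9)],
    [(2.0, 5.3), (3.0, 6.8)],
    [(4.0, 9.1), (5.0, 10.7)],
]
beta_hat, cov_hat = gee_linear_independence(example_clusters)
robust_se = [sqrt(cov_hat[0][0]), sqrt(cov_hat[1][1])]
print("coefficients:", beta_hat)
print("robust standard errors:", robust_se)
```

If every cluster contains a single observation, the meat term reduces to a sum of squared residuals weighted by the covariates, which is the basic heteroscedasticity-consistent form.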
GEEs belong to a class of regression techniques referred to as semiparametric, because they rely on specification of only the first two moments of the outcome distribution. Provided the mean model is correctly specified, and under mild regularity conditions, parameter estimates from GEEs are consistent. They are a popular alternative to the likelihood-based generalized linear mixed model, which is more sensitive to misspecification of the variance structure. They are commonly used in large epidemiological studies, especially multi-site cohort studies, because they can handle many types of unmeasured dependence between outcomes.
Formulation
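Given a mean model \(\mu_{ij}\) for subject \(i\) and time \(j\) that depends on regression parameters \(\beta\), and a working variance structure \(V_i\), the estimating equation takes the standard form

\[ U(\beta) = \sum_{i=1}^{N} \left( \frac{\partial \mu_i}{\partial \beta} \right)^{\mathsf{T}} V_i^{-1} \left\{ Y_i - \mu_i(\beta) \right\} = 0 , \]

where \(Y_i\) and \(\mu_i\) are the vectors of outcomes and means for subject \(i\).
The parameters \(\beta\) are estimated by solving \(U(\beta) = 0\), typically with an iterative method such as Newton-Raphson.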
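The following sketch shows one way such an estimating equation might be solved numerically. It assumes the standard form U(beta) = sum_i D_i^T V_i^{-1} (Y_i - mu_i) = 0 with D_i the derivative of mu_i with respect to beta, a logistic mean model with an intercept and one covariate, a scale parameter fixed at 1, and an exchangeable working correlation whose parameter alpha is supplied by the caller rather than estimated. All function names are hypothetical. In practice alpha is usually re-estimated from the residuals between updates and robust standard errors are reported alongside the estimates; both steps are left out here to keep the example short.

```python
from math import exp, sqrt


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + exp(-t))


def apply_exch_inverse(z, alpha):
    """Multiply vector z by the inverse of an exchangeable correlation matrix.

    Uses the closed form R^{-1} = (I - c J) / (1 - alpha) with
    c = alpha / (1 + (n - 1) alpha); requires -1/(n-1) < alpha < 1.
    """
    n = len(z)
    c = alpha / (1.0 + (n - 1) * alpha)
    total = sum(z)
    return [(zi - c * total) / (1.0 - alpha) for zi in z]


def gee_score(beta, clusters, alpha):
    """Return (U, H): the estimating function and its expected derivative.

    With the canonical logit link, D_i^T V_i^{-1} = X_i^T A^{1/2} R^{-1} A^{-1/2},
    where A = diag(mu (1 - mu)).
    """
    u = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for cluster in clusters:
        xs = [x for x, _ in cluster]
        mu = [expit(beta[0] + beta[1] * x) for x in xs]
        sd = [sqrt(m * (1.0 - m)) for m in mu]
        # R^{-1} applied to the standardized residuals A^{-1/2} (y - mu).
        resid = [(y - m) / s for (_, y), m, s in zip(cluster, mu, sd)]
        w = apply_exch_inverse(resid, alpha)
        # Columns of A^{1/2} X for the intercept and the covariate.
        cols = [sd, [s * x for s, x in zip(sd, xs)]]
        rinv_cols = [apply_exch_inverse(col, alpha) for col in cols]
        for j in range(2):
            u[j] += sum(a * b for a, b in zip(cols[j], w))
            for k in range(2):
                h[j][k] += sum(a * b for a, b in zip(cols[j], rinv_cols[k]))
    return u, h


def solve_gee_logistic(clusters, alpha=0.0, tol=1e-10, max_iter=50):
    """Fisher-scoring iteration beta <- beta + H^{-1} U, starting from zero."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        u, h = gee_score(beta, clusters, alpha)
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        step = [(h[1][1] * u[0] - h[0][1] * u[1]) / det,
                (h[0][0] * u[1] - h[1][0] * u[0]) / det]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy data (made up): four clusters of three (x, y) pairs, x constant per cluster.
toy_clusters = [
    [(0.0, 1), (0.0, 0), (0.0, 0)],
    [(0.0, 0), (0.0, 1), (0.0, 0)],
    [(1.0, 1), (1.0, 1), (1.0, 0)],
    [(1.0, 0), (1.0, 1), (1.0, 1)],
]
print("independence:", solve_gee_logistic(toy_clusters, alpha=0.0))
print("exchangeable, alpha=0.3:", solve_gee_logistic(toy_clusters, alpha=0.3))
```

With alpha set to 0 the working correlation is the identity and the iteration reduces to ordinary logistic regression. In the toy data the covariate is constant within equally sized clusters, so the choice of alpha rescales the estimating function without moving its root; with covariates that vary within clusters the point estimates generally depend on the working correlation.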
Computation
Software for solving generalized estimating equations is available in MATLAB, SAS (proc gee), SPSS (the gee procedure), Stata (the xtgee command) and R (packages gee, geepack and multgee).
Published comparisons of software packages for GEE analysis of correlated binary data and of correlated ordinal data are available.