In statistics, the g-prior is an objective prior for the regression coefficients of a multiple regression. It was introduced by Arnold Zellner. It is a key tool in Bayes and empirical Bayes variable selection.
Consider a data set $(x_1, y_1), \ldots, (x_n, y_n)$, where the $x_i$ are Euclidean vectors and the $y_i$ are scalars. The multiple regression model is formulated as

$$y_i = x_i^\top \beta + \varepsilon_i,$$

where the $\varepsilon_i$ are random errors. Zellner's g-prior for $\beta$ is a multivariate normal distribution with covariance matrix proportional to the inverse of the Fisher information matrix for $\beta$.
Assume the $\varepsilon_i$ are i.i.d. normal with zero mean and variance $\psi^{-1}$. Let $X$ be the matrix with $i$th row equal to $x_i^\top$. Then the g-prior for $\beta$ is the multivariate normal distribution with prior mean a hyperparameter $\beta_0$ and covariance matrix proportional to $\psi^{-1}(X^\top X)^{-1}$, i.e.,
$$\beta \mid \psi \sim \mathrm{MVN}\!\left[\beta_0, \tfrac{g}{\psi}(X^\top X)^{-1}\right],$$

where $g$ is a positive scalar hyperparameter.
The posterior distribution of $\beta$ is given by

$$\beta \mid \psi, x, y \sim \mathrm{MVN}\!\left[q\hat\beta + (1-q)\beta_0, \tfrac{q}{\psi}(X^\top X)^{-1}\right],$$

where $q = g/(1+g)$ and
$$\hat\beta = (X^\top X)^{-1} X^\top y$$

is the maximum likelihood (least squares) estimator of $\beta$. The vector of regression coefficients $\beta$ can be estimated by its posterior mean under the g-prior, i.e., as the weighted average of the maximum likelihood estimator and $\beta_0$:
$$\tilde\beta = q\hat\beta + (1-q)\beta_0.$$

Since $q \to 1$ as $g \to \infty$, the posterior mean converges to the maximum likelihood estimator as the prior becomes diffuse.
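The shrinkage formula above can be illustrated numerically. The following is a minimal sketch with simulated data; the data-generating process, the prior mean $\beta_0 = 0$, and the choice $g = 10$ are all hypothetical, chosen only to show the weighted average of $\hat\beta$ and $\beta_0$:

```python
import numpy as np

# Hypothetical toy data: n = 50 observations, p = 2 predictors.
rng = np.random.default_rng(0)
n, p = 50, 2
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.normal(size=n)

# Maximum likelihood (least squares) estimator beta_hat.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Posterior mean under the g-prior: a weighted average of
# beta_hat and the prior mean beta_0, with weight q = g/(1+g).
beta_0 = np.zeros(p)   # prior mean (hyperparameter, here 0)
g = 10.0               # prior scale (hyperparameter)
q = g / (1.0 + g)
beta_tilde = q * beta_hat + (1.0 - q) * beta_0

print(beta_hat)
print(beta_tilde)
```

With $\beta_0 = 0$ the posterior mean is simply $\hat\beta$ scaled by $q = 10/11$, so each coefficient is shrunk toward zero; larger $g$ moves the estimate closer to $\hat\beta$.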
Estimation of g is slightly less straightforward than estimation of β . A variety of methods have been proposed, including Bayes and empirical Bayes estimators.
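One empirical Bayes approach is to choose $g$ to maximize the marginal likelihood of the data. Under the g-prior with prior mean $\beta_0$ and known error variance $\sigma^2 = \psi^{-1}$, integrating out $\beta$ gives $y \mid g \sim \mathrm{N}\!\left(X\beta_0, \sigma^2 (I + gH)\right)$, where $H = X(X^\top X)^{-1}X^\top$ is the hat matrix. The sketch below maximizes this marginal likelihood over a grid; the simulated data, the grid, and the known-variance assumption are illustrative only:

```python
import numpy as np

# Hypothetical simulated data (same setup as a toy regression).
rng = np.random.default_rng(1)
n, p = 50, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)
beta_0 = np.zeros(p)
sigma2 = 1.0  # error variance, assumed known here for simplicity

# Hat (projection) matrix H = X (X^T X)^{-1} X^T and residual from the prior mean.
H = X @ np.linalg.solve(X.T @ X, X.T)
r = y - X @ beta_0

def log_marginal(g):
    # y | g ~ N(X beta_0, sigma2 * (I + g H)).  Because H is a
    # rank-p projection, det(I + gH) = (1+g)^p and
    # (I + gH)^{-1} = I - (g/(1+g)) H, so the log-likelihood
    # (up to an additive constant) simplifies to:
    q = g / (1.0 + g)
    quad = (r @ r - q * (r @ H @ r)) / sigma2
    return -0.5 * (p * np.log1p(g) + quad)

# Empirical Bayes estimate: maximize the marginal likelihood over a grid.
grid = np.linspace(0.1, 200.0, 2000)
g_hat = grid[np.argmax([log_marginal(g) for g in grid])]
print(g_hat)
```

The grid search is only for transparency; in practice the maximizer has a closed form in this known-variance setting, and fully Bayes treatments instead place a prior on $g$.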