In statistics, the g-prior is an objective prior for the regression coefficients of a multiple regression. It was introduced by Arnold Zellner. It is a key tool in Bayes and empirical Bayes variable selection.
Consider a data set
(
x
1
,
y
1
)
,
…
,
(
x
n
,
y
n
)
, where the
x
i
are Euclidean vectors and the
y
i
are scalars. The multiple regression model is formulated as
y
i
=
x
i
⊤
β
+
ε
i
.
where the
ε
i
are random errors. Zellner's g-prior for
β
is a multivariate normal distribution with covariance matrix proportional to the inverse Fisher information matrix for
β
.
Assume the
ε
i
are iid normal with zero mean and variance
ψ
−
1
. Let
X
be the matrix with
i
th row equal to
x
i
⊤
. Then the g-prior for
β
is the multivariate normal distribution with prior mean a hyperparameter
β
0
and covariance matrix proportional to
ψ
(
X
⊤
X
)
−
1
, i.e.,
β
|
ψ
∼
MVN
[
β
0
,
g
ψ
(
X
⊤
X
)
−
1
]
.
where g is a positive scalar parameter.
The posterior distribution of
β
is given as
β
|
ψ
,
x
,
y
∼
MVN
[
q
β
^
+
(
1
−
q
)
β
0
,
q
ψ
(
X
⊤
X
)
−
1
]
.
where
q
=
g
/
(
1
+
g
)
and
β
^
=
(
X
⊤
X
)
−
1
X
⊤
y
.
is the maximum likelihood (least squares) estimator of
β
. The vector of regression coefficients
β
can be estimated by its posterior mean under the g-prior, i.e., as the weighted average of the maximum likelihood estimator and
β
0
,
β
~
=
q
β
^
+
(
1
−
q
)
β
0
.
Clearly, as g →∞, the posterior mean converges to the maximum likelihood estimator.
Estimation of g is slightly less straightforward than estimation of
β
. A variety of methods have been proposed, including Bayes and empirical Bayes estimators.