In statistics, the g-prior is an objective prior for the regression coefficients of a multiple regression. It was introduced by Arnold Zellner. It is a key tool in Bayes and empirical Bayes variable selection.
Consider a data set                     (                  x                      1                          ,                  y                      1                          )        ,        …        ,        (                  x                      n                          ,                  y                      n                          )                , where the                               x                      i                                   are Euclidean vectors and the                               y                      i                                   are scalars. The multiple regression model is formulated as
                              y                      i                          =                  x                      i                                ⊤                          β        +                  ε                      i                          .                where the                               ε                      i                                   are random errors. Zellner's g-prior for                     β                 is a multivariate normal distribution with covariance matrix proportional to the inverse Fisher information matrix for                     β                .
Assume the                               ε                      i                                   are iid normal with zero mean and variance                               ψ                      −            1                                  . Let                     X                 be the matrix with                     i                th row equal to                               x                      i                                ⊤                                  . Then the g-prior for                     β                 is the multivariate normal distribution with prior mean a hyperparameter                               β                      0                                   and covariance matrix proportional to                     ψ        (                  X                      ⊤                          X                  )                      −            1                                  , i.e.,
                    β                  |                ψ        ∼                  MVN                [                  β                      0                          ,        g        ψ        (                  X                      ⊤                          X                  )                      −            1                          ]        .                where g is a positive scalar parameter.
The posterior distribution of                     β                 is given as
                    β                  |                ψ        ,        x        ,        y        ∼                  MVN                                      [                          q                                            β              ^                                      +        (        1        −        q        )                  β                      0                          ,                              q            ψ                          (                  X                      ⊤                          X                  )                      −            1                                                ]                          .                where                     q        =        g                  /                (        1        +        g        )                 and
                                                        β              ^                                      =        (                  X                      ⊤                          X                  )                      −            1                                    X                      ⊤                          y        .                is the maximum likelihood (least squares) estimator of                     β                . The vector of regression coefficients                     β                 can be estimated by its posterior mean under the g-prior, i.e., as the weighted average of the maximum likelihood estimator and                               β                      0                                  ,
                                                        β              ~                                      =        q                                            β              ^                                      +        (        1        −        q        )                  β                      0                          .                Clearly, as g →∞, the posterior mean converges to the maximum likelihood estimator.
Estimation of g is slightly less straightforward than estimation of                     β                . A variety of methods have been proposed, including Bayes and empirical Bayes estimators.