In Bayesian probability, the Jeffreys prior, named after Sir Harold Jeffreys, is a non-informative (objective) prior distribution for a parameter space; it is proportional to the square root of the determinant of the Fisher information:
$$p(\vec\theta) \propto \sqrt{\det I(\vec\theta)}.$$
It has the key feature that it is invariant under reparameterization of the parameter vector $\vec\theta$.
One-parameter case
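For an alternative parameterization $\varphi$ we can derive
$$p(\varphi) \propto \sqrt{I(\varphi)}$$
from
$$p(\theta) \propto \sqrt{I(\theta)}$$
using the change of variables theorem for transformations and the definition of Fisher information, where $L$ denotes the likelihood:
$$
\begin{aligned}
p(\varphi) &= p(\theta) \left| \frac{d\theta}{d\varphi} \right|
\propto \sqrt{ I(\theta) \left( \frac{d\theta}{d\varphi} \right)^2 }
= \sqrt{ \operatorname{E}\!\left[ \left( \frac{d \ln L}{d\theta} \right)^2 \right] \left( \frac{d\theta}{d\varphi} \right)^2 } \\
&= \sqrt{ \operatorname{E}\!\left[ \left( \frac{d \ln L}{d\theta} \frac{d\theta}{d\varphi} \right)^2 \right] }
= \sqrt{ \operatorname{E}\!\left[ \left( \frac{d \ln L}{d\varphi} \right)^2 \right] }
= \sqrt{I(\varphi)}.
\end{aligned}
$$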
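The one-parameter invariance can be checked numerically. The sketch below is only an illustration under stated assumptions: it picks a Bernoulli model and the logit reparameterization $\varphi = \ln(\gamma / (1 - \gamma))$, neither of which is prescribed by the text, and all function names are hypothetical. It compares $\sqrt{I(\varphi)}$ computed directly with $\sqrt{I(\gamma)}\,|d\gamma/d\varphi|$.

```python
import math

def fisher_information(log_pmf, outcomes, param, eps=1e-6):
    """Expected squared score over a finite outcome set.

    The score is approximated by a central finite difference, so the
    result is accurate only up to the finite-difference error.
    """
    total = 0.0
    for x in outcomes:
        score = (log_pmf(x, param + eps) - log_pmf(x, param - eps)) / (2 * eps)
        total += math.exp(log_pmf(x, param)) * score ** 2
    return total

def log_pmf_gamma(x, gamma):
    # Bernoulli log-probability in the original parameter gamma (assumed model).
    return math.log(gamma) if x == 1 else math.log(1.0 - gamma)

def sigmoid(phi):
    return 1.0 / (1.0 + math.exp(-phi))

def log_pmf_phi(x, phi):
    # Same model, written in the logit parameter phi (assumed reparameterization).
    return log_pmf_gamma(x, sigmoid(phi))

def jeffreys_density_two_ways(phi):
    """Return (sqrt(I(phi)), sqrt(I(gamma)) * |d gamma / d phi|)."""
    gamma = sigmoid(phi)
    direct = math.sqrt(fisher_information(log_pmf_phi, (0, 1), phi))
    jacobian = gamma * (1.0 - gamma)  # d gamma / d phi for the sigmoid
    transformed = math.sqrt(fisher_information(log_pmf_gamma, (0, 1), gamma)) * jacobian
    return direct, transformed

if __name__ == "__main__":
    for phi in (-2.0, 0.0, 1.5):
        print(phi, jeffreys_density_two_ways(phi))
```

The two numbers in each pair should agree up to finite-difference error, which is what invariance under reparameterization requires.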
Multiple-parameter case
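For an alternative parameterization $\vec\varphi$ we can derive
$$p(\vec\varphi) \propto \sqrt{\det I(\vec\varphi)}$$
from
$$p(\vec\theta) \propto \sqrt{\det I(\vec\theta)}$$
using the change of variables theorem for transformations, the definition of Fisher information, and the fact that the product of determinants is the determinant of the matrix product:
$$
\begin{aligned}
p(\vec\varphi) &= p(\vec\theta) \left| \det \frac{\partial \theta_i}{\partial \varphi_j} \right| \\
&\propto \sqrt{ \det I(\vec\theta) \, \det{}^2 \frac{\partial \theta_i}{\partial \varphi_j} } \\
&= \sqrt{ \det \frac{\partial \theta_k}{\partial \varphi_i} \, \det \operatorname{E}\!\left[ \frac{\partial \ln L}{\partial \theta_k} \frac{\partial \ln L}{\partial \theta_l} \right] \det \frac{\partial \theta_l}{\partial \varphi_j} } \\
&= \sqrt{ \det \operatorname{E}\!\left[ \sum_{k,l} \frac{\partial \theta_k}{\partial \varphi_i} \frac{\partial \ln L}{\partial \theta_k} \frac{\partial \ln L}{\partial \theta_l} \frac{\partial \theta_l}{\partial \varphi_j} \right] } \\
&= \sqrt{ \det \operatorname{E}\!\left[ \frac{\partial \ln L}{\partial \varphi_i} \frac{\partial \ln L}{\partial \varphi_j} \right] }
= \sqrt{ \det I(\vec\varphi) }.
\end{aligned}
$$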
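The determinant step can be illustrated with a small two-parameter sketch. It assumes a Gaussian model with parameters $(\mu, \sigma)$, whose Fisher information matrix is the standard closed form $\operatorname{diag}(1/\sigma^2, 2/\sigma^2)$, and a hypothetical reparameterization $(\mu, \tau)$ with $\tau = \ln \sigma$; neither choice comes from the text, and the helper names are illustrative only.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def fisher_mu_sigma(sigma):
    # Assumed closed form for a Gaussian with parameters (mu, sigma).
    return [[1.0 / sigma ** 2, 0.0], [0.0, 2.0 / sigma ** 2]]

def jacobian(tau):
    # d(mu, sigma) / d(mu, tau) with sigma = exp(tau).
    return [[1.0, 0.0], [0.0, math.exp(tau)]]

def fisher_mu_tau(tau):
    # Fisher information in the new parameters: J^T I(theta) J.
    j = jacobian(tau)
    return matmul2(transpose2(j), matmul2(fisher_mu_sigma(math.exp(tau)), j))

def jeffreys_density_two_ways(tau):
    """Return (sqrt(det I(phi)), sqrt(det I(theta)) * |det J|)."""
    sigma = math.exp(tau)
    direct = math.sqrt(det2(fisher_mu_tau(tau)))
    transformed = math.sqrt(det2(fisher_mu_sigma(sigma))) * abs(det2(jacobian(tau)))
    return direct, transformed

if __name__ == "__main__":
    for tau in (-1.0, 0.0, 0.7):
        print(tau, jeffreys_density_two_ways(tau))
```

Both entries of each pair agree because $\det(J^{\mathsf T} I J) = \det I \cdot \det^2 J$, which is the product-of-determinants step used in the derivation.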
Attributes
From a practical and mathematical standpoint, a valid reason to use this non-informative prior instead of others, such as those obtained through a limit in conjugate families of distributions, is that it does not depend on the set of parameter variables chosen to describe the parameter space.
Sometimes the Jeffreys prior cannot be normalized, and is thus an improper prior. For example, the Jeffreys prior for the distribution mean is uniform over the entire real line in the case of a Gaussian distribution of known variance.
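Use of the Jeffreys prior violates the strong version of the likelihood principle, which is accepted by many, but by no means all, statisticians. When using the Jeffreys prior, inferences about $\vec\theta$ depend not only on the probability of the observed data as a function of $\vec\theta$, but also on the universe of all possible experimental outcomes, as determined by the experimental design, because the Fisher information is computed from an expectation over that universe.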
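A commonly cited illustration, not taken from the text above, compares two sampling designs for the same success probability $\gamma$: a fixed number of trials (binomial) and sampling until a fixed number of successes (negative binomial). Observing 3 successes in 12 trials gives a likelihood proportional to $\gamma^3 (1-\gamma)^9$ under either design, yet the Fisher information differs. The sketch below, with hypothetical function names and an assumed truncation of the infinite sum, computes both.

```python
import math

def binomial_fisher(gamma, n):
    """Fisher information about gamma when the number of trials n is fixed."""
    total = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * gamma ** k * (1 - gamma) ** (n - k)
        score = k / gamma - (n - k) / (1 - gamma)
        total += pmf * score ** 2
    return total

def negative_binomial_fisher(gamma, r, max_trials=2000):
    """Fisher information when sampling stops at the r-th success.

    The infinite sum over the trial count is truncated at max_trials
    (an assumption that is adequate when gamma is not close to 0).
    """
    total = 0.0
    for n in range(r, max_trials + 1):
        pmf = math.comb(n - 1, r - 1) * gamma ** r * (1 - gamma) ** (n - r)
        score = r / gamma - (n - r) / (1 - gamma)
        total += pmf * score ** 2
    return total

def jeffreys_ratio(gamma, n=12, r=3):
    """Ratio of the two unnormalized Jeffreys densities at gamma."""
    return math.sqrt(negative_binomial_fisher(gamma, r) / binomial_fisher(gamma, n))

if __name__ == "__main__":
    for gamma in (0.2, 0.5, 0.8):
        print(gamma, jeffreys_ratio(gamma))
```

If the two priors were the same up to a constant, the ratio would not change with $\gamma$. It does change, so the two designs lead to different Jeffreys priors despite sharing a likelihood function.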
Minimum description length
In the minimum description length approach to statistics, the goal is to describe data as compactly as possible, where the length of a description is measured in bits of the code used. For a parametric family of distributions, one compares a code with the best code based on one of the distributions in the parameterized family. The main result is that in exponential families, asymptotically for large sample size, the code based on the distribution that is a mixture of the elements in the exponential family with the Jeffreys prior is optimal. This result holds if one restricts the parameter set to a compact subset in the interior of the full parameter space. If the full parameter space is used, a modified version of the result should be used.
Examples
The Jeffreys prior for a parameter (or a set of parameters) depends upon the statistical model.
Gaussian distribution with mean parameter
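For the Gaussian distribution of the real value $x$
$$f(x \mid \mu) = \frac{e^{-(x-\mu)^2 / 2\sigma^2}}{\sqrt{2\pi\sigma^2}}$$
with $\sigma$ fixed, the Jeffreys prior for the mean $\mu$ is
$$
\begin{aligned}
p(\mu) &\propto \sqrt{I(\mu)}
= \sqrt{ \operatorname{E}\!\left[ \left( \frac{d}{d\mu} \ln f(x \mid \mu) \right)^2 \right] }
= \sqrt{ \operatorname{E}\!\left[ \left( \frac{x-\mu}{\sigma^2} \right)^2 \right] } \\
&= \sqrt{ \int_{-\infty}^{+\infty} f(x \mid \mu) \left( \frac{x-\mu}{\sigma^2} \right)^2 dx }
= \sqrt{ \frac{\sigma^2}{\sigma^4} } \propto 1.
\end{aligned}
$$
That is, the Jeffreys prior for $\mu$ does not depend upon $\mu$; it is the unnormalized uniform distribution on the real line, and is therefore an improper prior.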
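As a rough numerical check of this example, the sketch below approximates the Fisher information for the mean with a midpoint rule and confirms that it equals $1/\sigma^2$ regardless of $\mu$. The integration window of 12 standard deviations and the step count are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def fisher_information_mean(mu, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of E[((x - mu) / sigma^2)^2].

    The integral is truncated to mu +/- half_width * sigma (an assumption;
    the Gaussian tails beyond that range are negligible).
    """
    lo = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        score = (x - mu) / sigma ** 2
        total += gaussian_pdf(x, mu, sigma) * score ** 2 * dx
    return total

if __name__ == "__main__":
    sigma = 2.0
    for mu in (-3.0, 0.0, 10.0):
        # Each value should be close to 1 / sigma^2 = 0.25, whatever mu is.
        print(mu, fisher_information_mean(mu, sigma))
```

Because the Fisher information is constant in $\mu$, its square root is a constant, which is the flat prior described above.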
Gaussian distribution with standard deviation parameter
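For the Gaussian distribution of the real value $x$
$$f(x \mid \sigma) = \frac{e^{-(x-\mu)^2 / 2\sigma^2}}{\sqrt{2\pi\sigma^2}}$$
with $\mu$ fixed, the Jeffreys prior for the standard deviation $\sigma > 0$ is
$$
\begin{aligned}
p(\sigma) &\propto \sqrt{I(\sigma)}
= \sqrt{ \operatorname{E}\!\left[ \left( \frac{d}{d\sigma} \ln f(x \mid \sigma) \right)^2 \right] }
= \sqrt{ \operatorname{E}\!\left[ \left( \frac{(x-\mu)^2 - \sigma^2}{\sigma^3} \right)^2 \right] } \\
&= \sqrt{ \int_{-\infty}^{+\infty} f(x \mid \sigma) \left( \frac{(x-\mu)^2 - \sigma^2}{\sigma^3} \right)^2 dx }
= \sqrt{ \frac{2}{\sigma^2} } \propto \frac{1}{\sigma}.
\end{aligned}
$$
Equivalently, the Jeffreys prior for $\log \sigma$ is the unnormalized uniform distribution on the real line, and thus this distribution is also known as the logarithmic prior.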
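The $1/\sigma$ form can be checked numerically in the same spirit. The following sketch, with hypothetical names and an assumed integration window of 12 standard deviations, evaluates $\sqrt{I(\sigma)}$ by a midpoint rule and shows that $\sigma \sqrt{I(\sigma)}$ stays constant.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def jeffreys_density_sigma(sigma, mu=0.0, half_width=12.0, steps=20000):
    """Unnormalized Jeffreys density sqrt(I(sigma)) for a Gaussian with known mean.

    I(sigma) is approximated by a midpoint rule on mu +/- half_width * sigma
    (the truncation is an assumption; the neglected tails are tiny).
    """
    lo = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    info = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        score = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
        info += gaussian_pdf(x, mu, sigma) * score ** 2 * dx
    return math.sqrt(info)

if __name__ == "__main__":
    for sigma in (0.5, 1.0, 3.0):
        # sigma * sqrt(I(sigma)) should be the same constant, sqrt(2), each time.
        print(sigma, sigma * jeffreys_density_sigma(sigma))
```

A constant product $\sigma\,p(\sigma)$ is the same statement as a flat density in $\log \sigma$, since $d(\log \sigma) = d\sigma / \sigma$.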
Poisson distribution with rate parameter
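For the Poisson distribution of the non-negative integer $n$,
$$f(n \mid \lambda) = e^{-\lambda} \frac{\lambda^n}{n!},$$
the Jeffreys prior for the rate parameter $\lambda \ge 0$ is
$$
\begin{aligned}
p(\lambda) &\propto \sqrt{I(\lambda)}
= \sqrt{ \operatorname{E}\!\left[ \left( \frac{d}{d\lambda} \ln f(n \mid \lambda) \right)^2 \right] }
= \sqrt{ \operatorname{E}\!\left[ \left( \frac{n-\lambda}{\lambda} \right)^2 \right] } \\
&= \sqrt{ \sum_{n=0}^{+\infty} f(n \mid \lambda) \left( \frac{n-\lambda}{\lambda} \right)^2 }
= \sqrt{ \frac{1}{\lambda} }.
\end{aligned}
$$
Equivalently, the Jeffreys prior for $\sqrt{\lambda}$ is the unnormalized uniform distribution on the non-negative real line.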
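The Poisson result can be verified with a truncated sum. This is a minimal sketch under the assumption that counts above `max_count` contribute negligibly for the rates tried; the function names are illustrative.

```python
import math

def poisson_log_pmf(n, lam):
    # log of e^{-lam} lam^n / n!, using lgamma(n + 1) = log(n!).
    return -lam + n * math.log(lam) - math.lgamma(n + 1)

def fisher_information_rate(lam, max_count=500):
    """Truncated-sum approximation of E[((n - lam) / lam)^2].

    The cut-off max_count is an assumption; it must sit far above lam.
    """
    total = 0.0
    for n in range(max_count + 1):
        score = (n - lam) / lam
        total += math.exp(poisson_log_pmf(n, lam)) * score ** 2
    return total

if __name__ == "__main__":
    for lam in (0.5, 4.0, 30.0):
        # lam * I(lam) should be close to 1, i.e. I(lam) = 1 / lam.
        print(lam, lam * fisher_information_rate(lam))
```

Since $I(\lambda) = 1/\lambda$, the unnormalized prior is $\lambda^{-1/2}$, and the substitution $u = \sqrt{\lambda}$ gives $du \propto \lambda^{-1/2}\,d\lambda$, a flat density in $u$.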
Bernoulli trial
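For a coin that is "heads" with probability $\gamma \in [0,1]$ and is "tails" with probability $1-\gamma$, for a given $(H,T) \in \{(0,1), (1,0)\}$ the probability is $f(H,T \mid \gamma) = \gamma^H (1-\gamma)^T$. The Jeffreys prior for the parameter $\gamma$ is
$$
\begin{aligned}
p(\gamma) &\propto \sqrt{I(\gamma)}
= \sqrt{ \operatorname{E}\!\left[ \left( \frac{d}{d\gamma} \ln f(H,T \mid \gamma) \right)^2 \right] }
= \sqrt{ \operatorname{E}\!\left[ \left( \frac{H}{\gamma} - \frac{T}{1-\gamma} \right)^2 \right] } \\
&= \sqrt{ \gamma \left( \frac{1}{\gamma} - \frac{0}{1-\gamma} \right)^2 + (1-\gamma) \left( \frac{0}{\gamma} - \frac{1}{1-\gamma} \right)^2 }
= \frac{1}{\sqrt{\gamma(1-\gamma)}}.
\end{aligned}
$$
This is the arcsine distribution and is a beta distribution with $\alpha = \beta = 1/2$.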
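The identification with the arcsine distribution can be sanity-checked in a few lines. The sketch below assumes the standard normalized forms, density $1 / (\pi \sqrt{\gamma(1-\gamma)})$ and cumulative distribution function $(2/\pi) \arcsin \sqrt{\gamma}$, which are not stated in the text above; the function names are hypothetical.

```python
import math

def jeffreys_bernoulli_pdf(gamma):
    """Normalized Jeffreys prior for a Bernoulli parameter, i.e. Beta(1/2, 1/2)."""
    return 1.0 / (math.pi * math.sqrt(gamma * (1.0 - gamma)))

def arcsine_cdf(gamma):
    """Cumulative distribution function of the arcsine distribution on [0, 1]."""
    return 2.0 / math.pi * math.asin(math.sqrt(gamma))

def beta_half_half_normalizer():
    """Beta function B(1/2, 1/2) = Gamma(1/2)^2 / Gamma(1), which equals pi."""
    return math.gamma(0.5) ** 2 / math.gamma(1.0)

def cdf_slope(gamma, h=1e-6):
    """Central finite-difference derivative of the arcsine CDF."""
    return (arcsine_cdf(gamma + h) - arcsine_cdf(gamma - h)) / (2 * h)

if __name__ == "__main__":
    print(beta_half_half_normalizer(), math.pi)
    for gamma in (0.1, 0.5, 0.9):
        # The CDF slope should match the density at each point.
        print(gamma, cdf_slope(gamma), jeffreys_bernoulli_pdf(gamma))
```

The normalizer matching $\pi$ confirms the Beta$(1/2, 1/2)$ form, and the matching slope confirms that the arcsine CDF has the Jeffreys density as its derivative.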
N-sided die with biased probabilities
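Similarly, for a throw of an $N$-sided die with outcome probabilities $\vec\gamma = (\gamma_1, \ldots, \gamma_N)$, each non-negative and satisfying $\sum_{i=1}^N \gamma_i = 1$, the Jeffreys prior for $\vec\gamma$ is the Dirichlet distribution with all (alpha) parameters set to one half.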
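One practical reading of the die example is that the Dirichlet prior with all parameters equal to one half acts like a pseudocount of one half per face. The sketch below relies on the standard Dirichlet-multinomial conjugacy result for the posterior mean, which is an addition for illustration rather than something stated above; the function name and the example counts are hypothetical.

```python
def jeffreys_posterior_mean(counts):
    """Posterior mean of the face probabilities under a Dirichlet(1/2, ..., 1/2) prior.

    With observed counts n_i, the posterior is Dirichlet(n_i + 1/2), whose
    mean is (n_i + 1/2) / (sum of counts + N / 2).
    """
    total = sum(counts) + 0.5 * len(counts)
    return [(c + 0.5) / total for c in counts]

if __name__ == "__main__":
    # Made-up counts for a six-sided die; faces never observed still get
    # a small non-zero probability because of the half pseudocount.
    print(jeffreys_posterior_mean([3, 0, 1, 0, 2, 0]))
```

With $N = 2$ this reduces to the Bernoulli case, where the prior is the Beta distribution with both parameters equal to one half.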