The partition function or configuration integral, as used in probability theory, information theory and dynamical systems, is a generalization of the definition of a partition function in statistical mechanics. It is a special case of a normalizing constant in probability theory, for the Boltzmann distribution. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated probability measure, the Gibbs measure, has the Markov property. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks (the Hopfield network), and applications such as genomics, corpus linguistics and artificial intelligence, which employ Markov networks, and Markov logic networks. The Gibbs measure is also the unique measure that has the property of maximizing the entropy for a fixed expectation value of the energy; this underlies the appearance of the partition function in maximum entropy methods and the algorithms derived therefrom.
The partition function ties together many different concepts, and thus offers a general framework in which many different kinds of quantities may be calculated. In particular, it shows how to calculate expectation values and Green's functions, forming a bridge to Fredholm theory. It also provides a natural setting for the information geometry approach to information theory, where the Fisher information metric can be understood to be a correlation function derived from the partition function; it happens to define a Riemannian manifold.
When the setting for random variables is complex projective space or projective Hilbert space, geometrized with the Fubini–Study metric, the result is the theory of quantum mechanics and, more generally, quantum field theory. In these theories, the partition function is heavily exploited in the path integral formulation, with great success, leading to many formulas nearly identical to those reviewed here. However, because the underlying measure space is complex-valued, as opposed to the real-valued simplex of probability theory, an extra factor of i appears in many formulas. Tracking this factor is troublesome, and is not done here. This article focuses primarily on classical probability theory, where the probabilities sum to one.
Definition
Given a set of random variables $X_i$ taking on values $x_i$, and some sort of potential function or Hamiltonian $H(x_1, x_2, \dots)$, the partition function is defined as

$$Z(\beta) = \sum_{x_i} \exp\left(-\beta H(x_1, x_2, \dots)\right).$$

The function $H$ is understood to be a real-valued function on the space of states $\{X_1, X_2, \dots\}$, while $\beta$ is a real-valued free parameter (conventionally, the inverse temperature). The sum over the $x_i$ is understood to be a sum over all possible values that each of the random variables $X_i$ may take; when the $X_i$ vary continuously, the sum is replaced by an integral,

$$Z(\beta) = \int \exp\left(-\beta H(x_1, x_2, \dots)\right)\, dx_1\, dx_2 \cdots$$

for the case of continuously-varying $X_i$.

When $H$ is an observable, such as a finite-dimensional matrix or an infinite-dimensional Hilbert space operator or element of a C-star algebra, it is common to express the summation as a trace, so that

$$Z(\beta) = \operatorname{tr}\left(e^{-\beta H}\right).$$
When H is infinite-dimensional, then, for the above notation to be valid, the argument must be trace class, that is, of a form such that the summation exists and is bounded.
The number of variables $X_i$ need not be countable; in that case, the sums are to be replaced by functional integrals.
Such is the case for the partition function in quantum field theory.
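For illustration, the defining sum can be evaluated by brute force for a small discrete system. In the sketch below, the nearest-neighbour Hamiltonian and the chosen value of β are assumptions made only for the example.

```python
import itertools
import math

# Illustrative only: a 3-spin nearest-neighbour Hamiltonian; any real-valued
# function of the configuration would serve as H.
def hamiltonian(config):
    return -sum(s1 * s2 for s1, s2 in zip(config, config[1:]))

# Z(beta) = sum over all configurations of exp(-beta * H(x1, ..., xn)).
def partition_function(beta, n_spins=3):
    return sum(
        math.exp(-beta * hamiltonian(config))
        for config in itertools.product([-1, +1], repeat=n_spins)
    )

print(partition_function(beta=1.0))  # the normalizing constant of the Gibbs measure
```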
A common, useful modification to the partition function is to introduce auxiliary functions. This allows, for example, the partition function to be used as a generating function for correlation functions. This is discussed in greater detail below.
The parameter β
The role or meaning of the parameter $\beta$ can be understood in a variety of ways. In classical thermodynamics, it is an inverse temperature. More generally, it is the variable conjugate to some (arbitrary) function of the random variables $X$; properly speaking, it is a Lagrange multiplier. The common idea is that one quantity is to be held fixed, here the expectation value of $H$, even though many different probability distributions can give rise to the same fixed value.
For the general case, one considers a set of functions $\{H_k(x_1, x_2, \dots)\}$, each depending on the random variables $X_i$, whose expectation values one wishes to hold constant. To constrain the expectation values in this way, one applies the method of Lagrange multipliers, introducing one multiplier $\beta_k$ for each constraint; maximum entropy methods illustrate the manner in which this is done.
Some specific examples are in order. In basic thermodynamics problems, when using the canonical ensemble, a single parameter $\beta$ suffices because only one expectation value, that of the energy, must be held constant. For problems involving chemical reactions, the grand canonical ensemble is the appropriate setting, and there are two Lagrange multipliers: one holds the energy fixed, and the other, related to the chemical potential, holds the particle count fixed.
For the general case, one then has

$$Z(\beta_1, \beta_2, \dots) = \sum_{x_i} \exp\left(-\sum_k \beta_k H_k(x_1, x_2, \dots)\right)$$

with $\beta = (\beta_1, \beta_2, \dots)$ understood as a point in the space of parameters.
For a collection of observables $H_k$, one would instead write

$$Z(\beta_1, \beta_2, \dots) = \operatorname{tr}\left[\exp\left(-\sum_k \beta_k H_k\right)\right].$$
As before, it is presumed that the argument of tr is trace class.
The corresponding Gibbs measure then provides a probability distribution such that the expectation value of each $H_k$ is a fixed value; indeed, differentiating the logarithm of the partition function gives

$$\frac{\partial}{\partial \beta_k}\left(-\log Z(\beta_1, \beta_2, \dots)\right) = \langle H_k \rangle = \mathrm{E}\left[H_k\right],$$

with the angle brackets $\langle H_k \rangle$ denoting the expected value of $H_k$, and $\mathrm{E}[\,\cdot\,]$ being the conventional alternative notation for the same.
Although the value of $\beta$ is commonly taken to be real, it need not be in general; this is discussed in the section Normalization below.
Symmetry
The potential function itself commonly takes the form of a sum

$$H(x_1, x_2, \dots) = \sum_s V(s),$$

where the sum over $s$ is a sum over some subset of the power set $P(X)$ of the set $X = \{x_1, x_2, \dots\}$. In statistical mechanics, for example in the Ising model, the sum runs over pairs of nearest neighbours; in probability theory, for example in Markov networks, it runs over the cliques of a graph.
The fact that the potential function can be written as a sum usually reflects the fact that it is invariant under the action of a group symmetry, such as translational invariance. Such symmetries can be discrete or continuous; they materialize in the correlation functions for the random variables (discussed below). Thus a symmetry in the Hamiltonian becomes a symmetry of the correlation function (and vice versa).
This symmetry has a critically important interpretation in probability theory: it implies that the Gibbs measure has the Markov property; that is, it is independent of the random variables in a certain way, or, equivalently, the measure is identical on the equivalence classes of the symmetry. This leads to the widespread appearance of the partition function in problems with the Markov property, such as Hopfield networks.
As a measure
The value of the expression

$$\exp\left(-\beta H(x_1, x_2, \dots)\right)$$

can be interpreted as a likelihood that a specific configuration of values $(x_1, x_2, \dots)$ occurs in the system. Thus, given a specific configuration $(x_1, x_2, \dots)$,

$$P(x_1, x_2, \dots) = \frac{1}{Z(\beta)} \exp\left(-\beta H(x_1, x_2, \dots)\right)$$

is the probability of the configuration $(x_1, x_2, \dots)$ occurring in the system, now properly normalized so that $0 \le P(x_1, x_2, \dots) \le 1$ and the probabilities over all configurations sum to one. Viewed this way, the partition function provides a probability measure on the space of states; formally, it is the Gibbs measure.
There exists at least one configuration $(x_1, x_2, \dots)$ for which the probability is maximized; such a configuration is conventionally called a ground state. If the ground state is unique, it is said to be non-degenerate; otherwise it is degenerate.
Conditions under which a ground state exists and is unique are given by the Karush–Kuhn–Tucker conditions; these conditions are commonly used to justify the use of the Gibbs measure in maximum-entropy problems.
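A small sketch of the Gibbs measure in this sense: the probabilities below are the normalized Boltzmann weights, they total to one, and a ground state is read off as a configuration of maximal probability. The Hamiltonian is again only an illustrative assumption.

```python
import itertools
import math

def hamiltonian(config):
    # Illustrative nearest-neighbour energy; assumption made for the example.
    return -sum(s1 * s2 for s1, s2 in zip(config, config[1:]))

def gibbs_measure(beta, n_spins=3):
    configs = list(itertools.product([-1, +1], repeat=n_spins))
    weights = [math.exp(-beta * hamiltonian(c)) for c in configs]
    Z = sum(weights)
    return {c: w / Z for c, w in zip(configs, weights)}

P = gibbs_measure(beta=1.0)
assert abs(sum(P.values()) - 1.0) < 1e-12   # the probabilities total to one
ground_state = max(P, key=P.get)            # a maximal-probability (minimal-energy) configuration
print(ground_state, P[ground_state])        # degenerate here: all-up and all-down tie
```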
Normalization
The values taken by $\beta$ depend on the mathematical space over which the random field varies. Real-valued random fields take values on a simplex: this is the geometric way of saying that the probabilities must sum to one. In quantum mechanics, the random variables range over complex projective space (projective Hilbert space), where they are interpreted as probability amplitudes; the emphasis is on the word projective, since the amplitudes are still normalized to one. It is this difference in the underlying space that gives rise to the extra factor of $i$ noted in the introduction.
Expectation values
The partition function is commonly used as a generating function for expectation values of various functions of the random variables. So, for example, taking $\beta$ as an adjustable parameter, the derivative of $\log Z(\beta)$ with respect to $\beta$,

$$\mathbf{E}[H] = \langle H \rangle = -\frac{\partial \log Z(\beta)}{\partial \beta},$$
gives the average (expectation value) of H. In physics, this would be called the average energy of the system.
Given the definition of the probability measure above, the expectation value of any function $f$ of the random variables $X$ may now be written as expected: so, for discrete-valued $X$, one writes

$$\langle f \rangle = \sum_{x_i} f(x_1, x_2, \dots)\, P(x_1, x_2, \dots) = \frac{1}{Z(\beta)} \sum_{x_i} f(x_1, x_2, \dots)\, \exp\left(-\beta H(x_1, x_2, \dots)\right).$$
The above notation is strictly correct for a finite number of discrete random variables, but should be seen to be somewhat 'informal' for continuous variables; properly, the summations above should be replaced with the notations of the underlying sigma algebra used to define a probability space. That said, the identities continue to hold, when properly formulated on a measure space.
Thus, for example, the entropy is given by

$$S = -\langle \ln P \rangle = -\sum_{x_i} P(x_1, x_2, \dots) \ln P(x_1, x_2, \dots) = \beta \langle H \rangle + \log Z(\beta)$$

(in units where Boltzmann's constant is set to one).
The Gibbs measure is the unique statistical distribution that maximizes the entropy for a fixed expectation value of the energy; this underlies its use in maximum entropy methods.
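These identities lend themselves to a quick numerical check. The sketch below, again on an illustrative spin system, compares ⟨H⟩ computed directly under the Gibbs measure with the finite-difference derivative −∂ log Z/∂β, and evaluates the entropy through the closed form β⟨H⟩ + log Z; the Hamiltonian and parameter values are assumptions for the example.

```python
import itertools
import math

def hamiltonian(config):
    # Illustrative nearest-neighbour energy; assumption made for the example.
    return -sum(s1 * s2 for s1, s2 in zip(config, config[1:]))

CONFIGS = list(itertools.product([-1, +1], repeat=4))

def log_Z(beta):
    return math.log(sum(math.exp(-beta * hamiltonian(c)) for c in CONFIGS))

def gibbs_average(f, beta):
    Z = math.exp(log_Z(beta))
    return sum(f(c) * math.exp(-beta * hamiltonian(c)) for c in CONFIGS) / Z

beta, eps = 0.7, 1e-6
avg_H = gibbs_average(hamiltonian, beta)                            # <H> computed directly
finite_diff = -(log_Z(beta + eps) - log_Z(beta - eps)) / (2 * eps)  # -d(log Z)/d(beta)
entropy = beta * avg_H + log_Z(beta)                                # equals -sum P log P
print(avg_H, finite_diff, entropy)                                  # first two values agree
```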
Information geometry
The points $\beta = (\beta_1, \beta_2, \dots)$ can be understood to form a space, specifically, a manifold. It is then natural to ask about the structure of this manifold; this is the task of information geometry.
Multiple derivatives with regard to the Lagrange multipliers give rise to a positive semi-definite covariance matrix,

$$g_{ij}(\beta) = \frac{\partial^2 \log Z(\beta)}{\partial \beta_i\, \partial \beta_j} = \langle H_i H_j \rangle - \langle H_i \rangle \langle H_j \rangle = \left\langle \left(H_i - \langle H_i\rangle\right)\left(H_j - \langle H_j\rangle\right) \right\rangle.$$
This matrix is positive semi-definite, and may be interpreted as a metric tensor, specifically, a Riemannian metric. Equipping the space of Lagrange multipliers with a metric in this way turns it into a Riemannian manifold. The study of such manifolds is referred to as information geometry; the metric above is the Fisher information metric. Here, $\beta$ serves as a coordinate on the manifold.
That the above defines the Fisher information metric can be readily seen by explicitly substituting for the expectation value:

$$g_{ij}(\beta) = \sum_x P(x)\, \frac{\partial \log P(x)}{\partial \beta_i}\, \frac{\partial \log P(x)}{\partial \beta_j},$$

where we've written $P(x)$ for $P(x_1, x_2, \dots)$ and the summation is understood to be over all values of all the random variables $X_k$. For continuous-valued random variables, the summation is replaced by an integral, of course.
Curiously, the Fisher information metric can also be understood as the flat-space Euclidean metric, after an appropriate change of variables, as described in the main article on it. When the $\beta$ are complex-valued, the resulting metric is the Fubini–Study metric; when written in terms of mixed states rather than pure states, it is known as the Bures metric.
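The relation between the second derivatives of log Z and the covariance of the observables can likewise be checked numerically. In the sketch below, two illustrative observables play the role of the H_k, and the mixed second derivative of log Z is compared against the covariance, i.e. the metric component g_12; all functions and parameter values are assumptions for the example.

```python
import itertools
import math

def observables(config):
    # Two illustrative observables H_1 (coupling term) and H_2 (field term).
    return (-sum(a * b for a, b in zip(config, config[1:])), -sum(config))

CONFIGS = list(itertools.product([-1, +1], repeat=4))

def log_Z(beta):
    return math.log(sum(
        math.exp(-sum(b * h for b, h in zip(beta, observables(c)))) for c in CONFIGS))

def covariance(beta, i, j):
    weights = [math.exp(-sum(b * h for b, h in zip(beta, observables(c)))) for c in CONFIGS]
    Z = sum(weights)
    mean = lambda k: sum(w / Z * observables(c)[k] for w, c in zip(weights, CONFIGS))
    mi, mj = mean(i), mean(j)
    return sum(w / Z * (observables(c)[i] - mi) * (observables(c)[j] - mj)
               for w, c in zip(weights, CONFIGS))

beta, eps = (0.4, 0.1), 1e-4

def lz(d0, d1):
    return log_Z((beta[0] + d0, beta[1] + d1))

# Mixed second derivative of log Z with respect to beta_1 and beta_2 (finite differences):
g_12 = (lz(eps, eps) - lz(eps, -eps) - lz(-eps, eps) + lz(-eps, -eps)) / (4 * eps * eps)
print(g_12, covariance(beta, 0, 1))   # the two values should closely agree
```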
Correlation functions
By introducing artificial auxiliary functions $J_k$ into the partition function, it can be used as a generating function for the expectation values of the random variables themselves. Thus, writing

$$Z(\beta, J) = \sum_{x_i} \exp\left(-\beta H(x_1, x_2, \dots) + \sum_n J_n x_n\right),$$

one then has

$$\mathbf{E}[x_k] = \langle x_k \rangle = \left. \frac{\partial}{\partial J_k} \log Z(\beta, J) \right|_{J=0}$$

as the expectation value of $x_k$. In the path integral formulation of quantum field theory, these auxiliary functions are commonly referred to as source fields.
Multiple differentiations lead to the connected correlation functions of the random variables. Thus the connected correlation function $C(x_j, x_k)$ between variables $x_j$ and $x_k$ is given by

$$C(x_j, x_k) = \left. \frac{\partial}{\partial J_j} \frac{\partial}{\partial J_k} \log Z(\beta, J) \right|_{J=0} = \langle x_j x_k \rangle - \langle x_j \rangle \langle x_k \rangle.$$
For the case where $H$ can be written as a quadratic form involving a differential operator $D$, that is, as

$$H = \frac{1}{2} \sum_n x_n D x_n,$$

then the correlation function $C(x_j, x_k)$ can be understood to be the Green's function for that differential operator; this is the bridge to Fredholm theory mentioned in the introduction. In the quantum field theory setting, such correlation functions are referred to as propagators.
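A numerical sketch of the source-field technique: linear terms Σ J_n x_n are added to the exponent, and derivatives of log Z with respect to the J_n, taken at J = 0 by finite differences, recover the expectation value ⟨x_k⟩ and the connected correlation C(x_j, x_k). The Hamiltonian and parameter values are illustrative assumptions.

```python
import itertools
import math

N = 4
CONFIGS = list(itertools.product([-1, +1], repeat=N))

def hamiltonian(x):
    # Illustrative nearest-neighbour energy; assumption made for the example.
    return -sum(a * b for a, b in zip(x, x[1:]))

def log_Z(beta, J):
    # Partition function with linear source terms sum_n J_n * x_n added to the exponent.
    return math.log(sum(
        math.exp(-beta * hamiltonian(x) + sum(j * xi for j, xi in zip(J, x)))
        for x in CONFIGS))

def sources(shifts):
    # Build a source vector that is zero except for the listed (index, value) shifts.
    J = [0.0] * N
    for k, d in shifts:
        J[k] += d
    return J

beta, eps = 0.5, 1e-4

# <x_1> = d(log Z)/dJ_1 at J = 0, via a central finite difference:
mean_x1 = (log_Z(beta, sources([(1, eps)])) - log_Z(beta, sources([(1, -eps)]))) / (2 * eps)

# Connected correlation C(x_1, x_2) = d^2(log Z)/dJ_1 dJ_2 at J = 0:
C_12 = (log_Z(beta, sources([(1, eps), (2, eps)])) - log_Z(beta, sources([(1, eps), (2, -eps)]))
        - log_Z(beta, sources([(1, -eps), (2, eps)])) + log_Z(beta, sources([(1, -eps), (2, -eps)]))) / (4 * eps * eps)

print(mean_x1, C_12)   # <x_1> vanishes by spin-flip symmetry here; C_12 = <x_1 x_2> - <x_1><x_2>
```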
General properties
Partition functions are used to discuss critical scaling and universality, and are subject to the renormalization group.