Puneet Varma (Editor)

Bregman divergence


In mathematics, a Bregman divergence or Bregman distance is similar to a metric, but satisfies neither the triangle inequality nor symmetry.


Bregman divergences are named after Lev M. Bregman, who introduced the concept in 1967.

Definition

Let $F\colon \Omega \to \mathbb{R}$ be a continuously differentiable, strictly convex real-valued function defined on a closed convex set $\Omega$.

The Bregman distance associated with $F$ for points $p, q \in \Omega$ is the difference between the value of $F$ at point $p$ and the value of the first-order Taylor expansion of $F$ around point $q$, evaluated at point $p$:

$$D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle.$$
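The definition translates almost directly into code. The following is a minimal sketch, assuming points are plain sequences of floats and that the caller supplies both the generator and its gradient; the names `bregman_divergence`, `sq_norm` and `sq_norm_grad` are illustrative and do not come from any library.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman_divergence(
    F: Callable[[Vector], float],
    grad_F: Callable[[Vector], Vector],
    p: Vector,
    q: Vector,
) -> float:
    """Return D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner


# Illustrative generator (an assumption for this sketch): F(x) = ||x||^2,
# whose gradient is 2x.
def sq_norm(x: Vector) -> float:
    return sum(xi * xi for xi in x)


def sq_norm_grad(x: Vector) -> List[float]:
    return [2.0 * xi for xi in x]


p, q = [1.0, 2.0], [3.0, 0.0]
d_pq = bregman_divergence(sq_norm, sq_norm_grad, p, q)
```

With this generator the result coincides with the squared Euclidean distance between `p` and `q`, so `d_pq` can be checked by hand against $(1-3)^2 + (2-0)^2$.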

Properties

  • Non-negativity: $D_F(p, q) \ge 0$ for all $p, q$. This is a consequence of the convexity of $F$.
  • Convexity: $D_F(p, q)$ is convex in its first argument, but not necessarily in its second argument.
  • Linearity: If we think of the Bregman distance as an operator on the function $F$, then it is linear with respect to non-negative coefficients. In other words, for $F_1, F_2$ strictly convex and differentiable, and $\lambda \ge 0$, $D_{F_1 + \lambda F_2}(p, q) = D_{F_1}(p, q) + \lambda D_{F_2}(p, q)$.
  • Duality: The function $F$ has a convex conjugate $F^*$. The Bregman distance defined with respect to $F^*$ is related to $D_F(p, q)$ by $D_{F^*}(p^*, q^*) = D_F(q, p)$. Here, $p^* = \nabla F(p)$ and $q^* = \nabla F(q)$ are the dual points corresponding to $p$ and $q$.
  • Mean as minimizer: A key result about Bregman divergences is that, given a random vector, the mean vector minimizes the expected Bregman divergence from the random vector. This generalizes the textbook result that the mean of a set minimizes the total squared error to the elements of the set. The result was proved for the vector case by Banerjee et al. (2005) and extended to the case of functions/distributions by Frigyik et al. (2008). It is important because it further justifies using a mean as a representative of a random set, particularly in Bayesian estimation.
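The mean-as-minimizer property can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the generalized Kullback–Leibler divergence in its standard form, draws synthetic positive vectors with a fixed seed, and compares the average divergence from the samples to the sample mean against the average divergence to randomly perturbed candidates. All names are hypothetical.

```python
import math
import random


def generalized_kl(p, q):
    """Generalized KL divergence between positive vectors p and q."""
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))


def average_divergence(samples, s):
    """Empirical mean of D_F(x, s) over the samples (s is the second argument)."""
    return sum(generalized_kl(x, s) for x in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [[rng.uniform(0.5, 2.0) for _ in range(3)] for _ in range(200)]

# Coordinate-wise sample mean.
mean = [sum(x[i] for x in samples) / len(samples) for i in range(3)]
loss_at_mean = average_divergence(samples, mean)

# Randomly perturbed candidates; none should beat the mean.
candidates = [[m * rng.uniform(0.8, 1.25) for m in mean] for _ in range(50)]
smallest_gap = min(average_divergence(samples, c) - loss_at_mean for c in candidates)
```

Note that the representative sits in the second argument of the divergence. For each candidate `c`, the excess loss over the mean equals `generalized_kl(mean, c)`, which is why `smallest_gap` should not be negative.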
Examples

  • Squared Euclidean distance, $D_F(x, y) = \|x - y\|^2$, is the canonical example of a Bregman distance, generated by the convex function $F(x) = \|x\|^2$.
  • The squared Mahalanobis distance, $D_F(x, y) = \tfrac{1}{2}(x - y)^T Q (x - y)$, is generated by the convex function $F(x) = \tfrac{1}{2} x^T Q x$. This can be thought of as a generalization of the squared Euclidean distance above.
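  • The generalized Kullback–Leibler divergence, $D_F(p, q) = \sum_i p(i) \log \frac{p(i)}{q(i)} - \sum_i p(i) + \sum_i q(i)$, is generated by the convex function $F(p) = \sum_i p(i) \log p(i)$ (the negative entropy).
  • The Itakura–Saito distance, $D_F(p, q) = \sum_i \left( \frac{p(i)}{q(i)} - \log \frac{p(i)}{q(i)} - 1 \right)$, is generated by the convex function $F(p) = -\sum_i \log p(i)$.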
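The link between a generator and its divergence can be checked numerically. The following sketch assumes the standard closed forms of the generalized Kullback–Leibler divergence (generator: negative entropy) and the Itakura–Saito distance (generator: negative sum of logarithms), and compares each with the value computed from the general Bregman definition. All function names are illustrative.

```python
import math


def bregman(F, grad_F, p, q):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - sum(g * (a - b) for g, a, b in zip(grad_F(q), p, q))


# Generator / gradient pairs (standard forms, assumed for this sketch).
def neg_entropy(p):
    return sum(x * math.log(x) for x in p)


def neg_entropy_grad(p):
    return [math.log(x) + 1.0 for x in p]


def neg_log_sum(p):
    return -sum(math.log(x) for x in p)


def neg_log_sum_grad(p):
    return [-1.0 / x for x in p]


# Closed-form divergences.
def generalized_kl(p, q):
    return sum(a * math.log(a / b) - a + b for a, b in zip(p, q))


def itakura_saito(p, q):
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(p, q))


p, q = [0.2, 1.5, 3.0], [0.7, 1.0, 2.5]
kl_gap = abs(bregman(neg_entropy, neg_entropy_grad, p, q) - generalized_kl(p, q))
is_gap = abs(bregman(neg_log_sum, neg_log_sum_grad, p, q) - itakura_saito(p, q))

# Neither divergence is symmetric in general.
asymmetry = generalized_kl(p, q) - generalized_kl(q, p)
```

Both gaps should vanish up to floating-point rounding, while `asymmetry` is non-zero for these inputs, illustrating that a Bregman divergence need not be symmetric.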

Generalizing projective duality

A key tool in computational geometry is the idea of projective duality, which maps points to hyperplanes and vice versa, while preserving incidence and above-below relationships. There are numerous analytical forms of the projective dual: one common form maps the point $p = (p_1, \ldots, p_d)$ to the hyperplane $x_{d+1} = \sum_{i=1}^{d} 2 p_i x_i$. This mapping can be interpreted (identifying the hyperplane with its normal) as the convex conjugate mapping that takes the point $p$ to its dual point $p^* = \nabla F(p)$, where $F$ defines the $d$-dimensional paraboloid $x_{d+1} = \sum_i x_i^2$.
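As a worked check of this correspondence (a sketch, assuming the generator is exactly the paraboloid $F(x) = \sum_i x_i^2$), the gradient of $F$ gives the normal of the dual hyperplane, and the tangent hyperplane at $p$ is that same hyperplane shifted down by $F(p)$:

```latex
\begin{align*}
F(x) &= \sum_{i=1}^{d} x_i^2, \qquad \nabla F(p) = 2p = p^{*}, \\
\text{dual hyperplane of } p:\quad x_{d+1} &= \langle p^{*}, x \rangle = \sum_{i=1}^{d} 2 p_i x_i, \\
\text{tangent hyperplane at } p:\quad x_{d+1} &= F(p) + \langle \nabla F(p),\, x - p \rangle
  = \sum_{i=1}^{d} 2 p_i x_i - \sum_{i=1}^{d} p_i^2, \\
\text{vertical gap at } x:\quad F(x) - \Big( \sum_{i=1}^{d} 2 p_i x_i - \sum_{i=1}^{d} p_i^2 \Big)
  &= \|x - p\|^2 = D_F(x, p).
\end{align*}
```

The last line is the general definition specialized to the paraboloid: the Bregman divergence $D_F(x, p)$ is the vertical distance at $x$ between the graph of $F$ and its tangent hyperplane at $p$.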

If we now replace the paraboloid by an arbitrary convex function, we obtain a different dual mapping that retains the incidence and above-below properties of the standard projective dual. This implies that natural dual concepts in computational geometry, such as Voronoi diagrams and Delaunay triangulations, retain their meaning in distance spaces defined by an arbitrary Bregman divergence. Thus, algorithms from "normal" geometry extend directly to these spaces (Boissonnat, Nielsen and Nock, 2010).

Bregman divergence on other objects

Bregman divergences can also be defined between matrices, between functions, and between measures (distributions). Bregman divergences between matrices include Stein's loss and the von Neumann entropy. Bregman divergences between functions include total squared error, relative entropy, and squared bias; see Frigyik et al. (2008) for definitions and properties. Similarly, Bregman divergences have also been defined over sets, through a submodular set function, which is known as the discrete analog of a convex function. The submodular Bregman divergences subsume a number of discrete distance measures, such as the Hamming distance, precision and recall, mutual information, and some other set-based distance measures (see Iyer & Bilmes, 2012, for more details and properties of the submodular Bregman divergences).
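To make the matrix case concrete, the sketch below treats Stein's loss for 2×2 symmetric positive-definite matrices. It assumes the standard form $D(X, Y) = \operatorname{tr}(X Y^{-1}) - \log\det(X Y^{-1}) - n$ with generator $F(X) = -\log\det X$ and gradient $-X^{-1}$, and uses the Frobenius inner product in place of the vector inner product. The helper names are hypothetical, and the 2×2 restriction is only there to keep the example free of external libraries.

```python
import math


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def frobenius_inner(a, b):
    return sum(a[i][j] * b[i][j] for i in range(2) for j in range(2))


def stein_from_definition(x, y):
    """F(X) - F(Y) - <grad F(Y), X - Y> with F(X) = -log det X."""
    F = lambda m: -math.log(det2(m))
    grad_y = [[-v for v in row] for row in inv2(y)]  # grad F(Y) = -Y^{-1}
    diff = [[x[i][j] - y[i][j] for j in range(2)] for i in range(2)]
    return F(x) - F(y) - frobenius_inner(grad_y, diff)


def stein_closed_form(x, y):
    """tr(X Y^{-1}) - log det(X Y^{-1}) - n, with n = 2."""
    y_inv = inv2(y)
    trace = sum(x[i][j] * y_inv[j][i] for i in range(2) for j in range(2))
    return trace - math.log(det2(x) / det2(y)) - 2


# Two symmetric positive-definite matrices chosen for illustration.
X = [[2.0, 0.5], [0.5, 1.0]]
Y = [[1.0, 0.2], [0.2, 3.0]]
gap = abs(stein_from_definition(X, Y) - stein_closed_form(X, Y))
```

The two computations should agree up to rounding, which shows that the vector definition carries over once the inner product is replaced by $\langle A, B \rangle = \operatorname{tr}(A^T B)$.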

For a list of common matrix Bregman divergences, see Table 15.1 in.

References

Bregman divergence Wikipedia