For certain applications in linear algebra, it is useful to know properties of the probability distribution of the largest eigenvalue of a finite sum of random matrices. Suppose
Contents
- Self adjoint matrices case
- Rectangular case
- Matrix Chernoff inequalities
- Matrix Chernoff I
- Matrix Chernoff II
- Matrix Bennett and Bernstein inequalities
- Bounded case
- Subexponential case
- Matrix Azuma
- Matrix Hoeffding
- Matrix bounded difference McDiarmid
- Survey of related theorems
- Ahlswede and Winter
- Tropp
- Subadditivity of log mgf
- Master tail bound
- References
The following theorems answer this general question under various assumptions; these assumptions are named below by analogy to their classical, scalar counterparts. All of these theorems can be found in (Tropp 2010), as the specific application of a general result which is derived below. A summary of related works is given.
Self-adjoint matrices case
Consider a finite sequence
Then, for all
where
Rectangular case
Consider a finite sequence
Then, for all
Matrix Chernoff inequalities
The classical Chernoff bounds concerns the sum of independent, nonnegative, and uniformly bounded random variables. In the matrix setting, the analogous theorem concerns a sum of positive-semidefinite random matrices subjected to a uniform eigenvalue bound.
Matrix Chernoff I
Consider a finite sequence
almost surely.
Define
Then
Matrix Chernoff II
Consider a sequence
almost surely.
Compute the minimum and maximum eigenvalues of the average expectation,
Then
The binary information divergence is defined as
for
Matrix Bennett and Bernstein inequalities
In the scalar setting, Bennett and Bernstein inequalities describe the upper tail of a sum of independent, zero-mean random variables that are either bounded or subexponential. In the matrix case, the analogous results concern a sum of zero-mean random matrices.
Bounded case
Consider a finite sequence
almost surely.
Compute the norm of the total variance,
Then, the following chain of inequalities holds for all
The function
Subexponential case
Consider a finite sequence
for
Compute the variance parameter,
Then, the following chain of inequalities holds for all
Rectangular case
Consider a finite sequence
almost surely. Define the variance parameter
Then, for all
Matrix Azuma
The scalar version of Azuma's inequality states that a scalar martingale exhibits normal concentration about its mean value, and the scale for deviations is controlled by the total maximum squared range of the difference sequence. The following is the extension in matrix setting.
Consider a finite adapted sequence
almost surely.
Compute the variance parameter
Then, for all
The constant 1/8 can be improved to 1/2 when there is additional information available. One case occurs when each summand
Matrix Hoeffding
Placing addition assumption that the summands in Matrix Azuma are independent gives a matrix extension of Hoeffding's inequalities.
Consider a finite sequence
almost surely.
Then, for all
where
An improvement of this result was established in (Mackey et al. 2012): for all
where
Matrix bounded difference (McDiarmid)
In scalar setting, McDiarmid's inequality provides one common way of bounding the differences by applying Azuma's inequality to a Doob martingale. A version of the bounded differences inequality holds in the matrix setting.
Let
where
Then, for all
where
Survey of related theorems
The first bounds of this type were derived by (Ahlswede & Winter 2003). Recall the theorem above for self-adjoint matrix Gaussian and Rademacher bounds: For a finite sequence
where
Ahlswede and Winter would give the same result, except with
By comparison, the
The chief contribution of (Ahlswede & Winter 2003) was the extension of the Laplace-transform method used to prove the scalar Chernoff bound (see Chernoff bound#Theorem for additive form (absolute error)) to the case of self-adjoint matrices. The procedure given in the derivation below. All of the recent works on this topic follow this same procedure, and the chief differences follow from subsequent steps. Ahlswede & Winter use the Golden–Thompson inequality to proceed, whereas Tropp (Tropp 2010) uses Lieb's Theorem.
Suppose one wished to vary the length of the series (n) and the dimensions of the matrices (d) while keeping the right-hand side approximately constant. Then n must vary approximately as the log of d. Several papers have attempted to establish a bound without a dependence on dimensions. Rudelson and Vershynin (Rudelson & Vershynin 2007) give a result for matrices which are the outer product of two vectors. (Magen & Zouzias 2010) provide a result without the dimensional dependence for low rank matrices. The original result was derived independently from the Ahlswede–Winter approach, but (Oliveira 2010b) proves a similar result using the Ahlswede–Winter approach.
Finally, Oliveira (Oliviera 2010a) proves a result for matrix martingales independently from the Ahlswede–Winter framework. Tropp (Tropp 2011) slightly improves on the result using the Ahlswede–Winter framework. Neither result is presented in this article.
Ahlswede and Winter
The Laplace transform argument found in (Ahlswede & Winter 2003) is a significant result in its own right: Let
To prove this, fix
The second-to-last inequality is Markov's inequality. The last inequality holds since
Thus, our task is to understand
The Golden–Thompson inequality implies that
Suppose
Iterating this, we get
So far we have found a bound with an infimum over
Tropp
The major contribution of (Tropp 2010) is the application of Lieb's theorem where (Ahlswede & Winter 2003) had applied the Golden–Thompson inequality. Tropp's corollary is the following: If
Proof: Let
is concave. The final step is to use Jensen's inequality to move the expectation inside the function:
This gives us the major result of the paper: the subadditivity of the log of the matrix generating function.
Subadditivity of log mgf
Let
Proof: It is sufficient to let
To complete the proof, we use the law of total expectation. Let
Define
Finally, we have
where at every step m we use Tropp's corollary with
Master tail bound
The following is immediate from the previous result:
All of the theorems given above are derived from this bound; the theorems consist in various ways to bound the infimum. These steps are significantly simpler than the proofs given.