Measuring sample quality with Stein's method
Stein's method is a general method in probability theory to obtain bounds on the distance between two probability distributions with respect to a probability metric. It was introduced by Charles Stein, who first published it in 1972, to obtain a bound between the distribution of a sum of $m$-dependent random variables and a standard normal distribution in the Kolmogorov (uniform) metric, and thereby to prove not only a central limit theorem, but also bounds on the rates of convergence for the given metric.
Contents
- Measuring sample quality with Stein's method
- History
- Probability metrics
- The Stein operator
- The Stein equation
- Solving the Stein equation
- Properties of the solution to the Stein equation
- An abstract approximation theorem
- Application of the theorem
- Connections to other methods
- Literature
- References
History
At the end of the 1960s, unsatisfied with the by-then known proofs of a specific central limit theorem, Charles Stein developed a new way of proving the theorem for his statistics lecture. His seminal paper was presented in 1970 at the sixth Berkeley Symposium and published in the corresponding proceedings.
Later, his Ph.D. student Louis Chen Hsiao Yun modified the method so as to obtain approximation results for the Poisson distribution; therefore, Stein's method applied to the problem of Poisson approximation is often referred to as the Stein-Chen method.
Probably the most important contributions are the monograph by Stein (1986), where he presents his view of the method and the concept of auxiliary randomisation, in particular using exchangeable pairs, and the articles by Barbour (1988) and Götze (1991), who introduced the so-called generator interpretation, which made it possible to easily adapt the method to many other probability distributions. An important contribution was also an article by Bolthausen (1984) on the so-called combinatorial central limit theorem.
In the 1990s the method was adapted to a variety of distributions, such as Gaussian processes by Barbour (1990), the binomial distribution by Ehm (1991), Poisson processes by Barbour and Brown (1992), the Gamma distribution by Luk (1994), and many others.
Probability metrics
Stein's method is a way to bound the distance between two probability distributions using a specific probability metric.
Let the metric be given in the form

$$ d(P,Q) = \sup_{h \in \mathcal{H}} \left| \int h \, dP - \int h \, dQ \right| = \sup_{h \in \mathcal{H}} \left| E h(W) - E h(Y) \right| \qquad (1.1) $$

Here, $\mathcal{H}$ is a set of functions from the underlying measurable space $\mathcal{X}$ to the real numbers, and $W$ and $Y$ are random variables with distributions $P$ and $Q$, respectively.

Important examples are the total variation metric, where we let $\mathcal{H}$ consist of the indicator functions of measurable sets; the Kolmogorov (uniform) metric for probability measures on the real numbers, where we consider the indicator functions of half-lines $(-\infty, x]$; and the Lipschitz (first order Wasserstein; Kantorovich) metric, where $\mathcal{H}$ consists of the Lipschitz-continuous functions with Lipschitz constant 1.

In what follows, $d$ is a metric of the form (1.1).
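To make (1.1) concrete, the following sketch estimates the Kolmogorov metric between the distribution of a sample and the standard normal by taking the supremum over indicator functions of half-lines, i.e. by comparing the empirical CDF with the normal CDF. This is a minimal illustration assuming Python with NumPy and SciPy; the function name kolmogorov_to_normal is ours, not a library API.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def kolmogorov_to_normal(sample):
    """Estimate sup_x |P(W <= x) - Phi(x)| from a sample of W.

    This is (1.1) with H the indicator functions of half-lines (-inf, x],
    evaluated by comparing the empirical CDF with the normal CDF.
    """
    x = np.sort(np.asarray(sample))
    n = len(x)
    phi = norm.cdf(x)
    upper = np.abs(np.arange(1, n + 1) / n - phi)  # ECDF just right of each point
    lower = np.abs(np.arange(0, n) / n - phi)      # ECDF just left of each point
    return max(upper.max(), lower.max())

# A standardized sum of 20 uniforms is close to N(0,1); a single uniform is not.
u = rng.uniform(-1.0, 1.0, size=(100_000, 20))
w = u.sum(axis=1) / np.sqrt(20 / 3.0)              # Var of U(-1,1) is 1/3
print(kolmogorov_to_normal(w))                     # small (close to normal)
print(kolmogorov_to_normal(u[:, 0] * np.sqrt(3.0)))  # noticeably larger
```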
The Stein operator
We assume now that the distribution $Q$ is a fixed distribution; in what follows, we shall in particular consider the case where $Q$ is the standard normal distribution, which serves as a classical example.

First of all, we need an operator $\mathcal{A}$, which acts on functions $f$ from $\mathcal{X}$ to the real numbers and characterizes the distribution $Q$ in the sense that the following equivalence holds:

$$ E (\mathcal{A}f)(Y) = 0 \ \text{for all } f \quad \iff \quad Y \ \text{has distribution } Q. \qquad (2.1) $$

We call such an operator the Stein operator.

For the standard normal distribution, Stein's lemma yields such an operator:

$$ E\left( f'(Y) - Y f(Y) \right) = 0 \ \text{for all } f \in C_b^1 \quad \iff \quad Y \ \text{has standard normal distribution.} \qquad (2.2) $$

Thus, we can take

$$ (\mathcal{A}f)(x) = f'(x) - x f(x). \qquad (2.3) $$
There are in general infinitely many such operators, and it remains an open question which one to choose. However, it seems that for many distributions there is a particularly good one, like (2.3) for the normal distribution.
There are different ways to find Stein operators.
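The characterization (2.2) is easy to probe by simulation. The following is a minimal Monte Carlo sketch (assuming Python with NumPy; the helper name stein_op is ours): the empirical mean of $(\mathcal{A}f)(W)$ is close to zero when $W$ is standard normal, and visibly nonzero for a non-normal $W$ with the same mean and variance.

```python
import numpy as np

rng = np.random.default_rng(2)

def stein_op(f, fprime, x):
    # The Stein operator (2.3): (Af)(x) = f'(x) - x f(x)
    return fprime(x) - x * f(x)

# A smooth bounded test function f (any f in C_b^1 works in (2.2))
f, fprime = np.sin, np.cos

z = rng.standard_normal(1_000_000)           # Z ~ N(0, 1)
w = rng.exponential(size=1_000_000) - 1.0    # mean 0, variance 1, not normal

print(np.mean(stein_op(f, fprime, z)))  # approximately 0
print(np.mean(stein_op(f, fprime, w)))  # clearly nonzero (about 0.27)
```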
The Stein equation
It is usually possible to define a function $f = f_h$ solving the functional equation

$$ (\mathcal{A}f)(x) = h(x) - E h(Y). \qquad (3.1) $$

We call (3.1) the Stein equation. Replacing $x$ by $W$ and taking expectation with respect to $W$, we get

$$ E (\mathcal{A}f)(W) = E h(W) - E h(Y). \qquad (3.2) $$
Now all the effort is worthwhile only if the left-hand side of (3.2) is easier to bound than the right-hand side. This is, surprisingly, often the case.
If $Q$ is the standard normal distribution and we use (2.3), then the corresponding Stein equation is

$$ f'(x) - x f(x) = h(x) - E h(Y). \qquad (3.3) $$

If the probability distribution $Q$ has an absolutely continuous (with respect to the Lebesgue measure) density $q$, then a Stein operator is given by

$$ (\mathcal{A}f)(x) = f'(x) + f(x) \frac{q'(x)}{q(x)}. \qquad (3.4) $$
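As a quick symbolic check (a sketch assuming Python with SymPy), plugging the standard normal density into (3.4) recovers the operator (2.3), since $q'(x)/q(x) = -x$:

```python
import sympy as sp

x = sp.symbols('x')
q = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)  # standard normal density

# The score term in (3.4): q'(x)/q(x)
score = sp.simplify(sp.diff(q, x) / q)
print(score)  # prints -x, so (3.4) becomes (Af)(x) = f'(x) - x f(x), i.e. (2.3)
```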
Solving the Stein equation
Analytic methods. Equation (3.3) can be easily solved explicitly:

$$ f(x) = e^{x^2/2} \int_{-\infty}^x \left( h(s) - E h(Y) \right) e^{-s^2/2} \, ds. \qquad (4.1) $$
Generator method. If $\mathcal{A}$ is the generator of a Markov process $(Z_t)_{t \ge 0}$ with stationary distribution $Q$ (see Barbour (1988), Götze (1991)), then the solution to the Stein equation (3.1) is

$$ f(x) = -\int_0^\infty \left( E^x h(Z_t) - E h(Y) \right) dt, \qquad (4.2) $$

where $E^x$ denotes expectation with respect to the process $Z$ started in $x$. However, one still has to prove that the solution (4.2) exists for all desired functions $h \in \mathcal{H}$.
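The explicit solution (4.1) is straightforward to evaluate numerically and to check against the Stein equation (3.3). A minimal sketch, assuming Python with NumPy and SciPy and using the concrete test function $h(x) = |x|$ (our choice, with $E h(Y) = \sqrt{2/\pi}$):

```python
import numpy as np
from scipy.integrate import quad

h = np.abs                    # test function h(x) = |x| (our choice)
Eh = np.sqrt(2.0 / np.pi)     # E h(Y) = E|Y| for Y ~ N(0,1)

def f(x):
    # The explicit solution (4.1) of the Stein equation (3.3)
    val, _ = quad(lambda s: (h(s) - Eh) * np.exp(-s**2 / 2), -np.inf, x)
    return np.exp(x**2 / 2) * val

# Verify f'(x) - x f(x) = h(x) - E h(Y) via a central finite difference
for x in (-1.5, -0.3, 0.7, 2.0):
    eps = 1e-5
    fprime = (f(x + eps) - f(x - eps)) / (2 * eps)
    print(fprime - x * f(x), h(x) - Eh)   # the two numbers agree
```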
Properties of the solution to the Stein equation
Usually, one tries to give bounds on $f$ and its derivatives (or differences) in terms of $h$ and its derivatives (or differences), that is, inequalities of the form

$$ \| D^k f \| \le C_{k,l} \| D^l h \| \qquad (5.1) $$

for some specific $k, l = 0, 1, 2, \dots$ (typically $k \ge l$ or $k \ge l - 1$, depending on the form of the Stein operator), where often $\|\cdot\|$ is the supremum norm. Here, $D^k$ denotes the differential operator, although in discrete settings it usually refers to a difference operator. The constants $C_{k,l}$ may contain the parameters of the distribution $Q$; if there are any, they are often referred to as Stein factors.

In the case of (4.1) one can prove for the supremum norm that

$$ \|f\|_\infty \le \min\left( \sqrt{\pi/2}\, \|h\|_\infty,\; 2 \|h'\|_\infty \right), \quad \|f'\|_\infty \le \min\left( 2 \|h\|_\infty,\; 4 \|h'\|_\infty \right), \quad \|f''\|_\infty \le 2 \|h'\|_\infty, \qquad (5.2) $$

where the last bound is of course only applicable if $h$ is differentiable (or at least Lipschitz-continuous, which, for example, is not the case if we regard the total variation metric or the Kolmogorov metric). As the standard normal distribution has no extra parameters, in this specific case the constants are free of additional parameters.
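These bounds can be probed numerically for a concrete test function. The following rough sketch (assuming Python with NumPy/SciPy, again with our choice $h(x) = |x|$, which is 1-Lipschitz, so $\|h'\|_\infty = 1$) evaluates (4.1) on a grid with finite differences and checks that the observed sup norms stay within the bounds $2$, $4$ and $2$ from (5.2):

```python
import numpy as np
from scipy.integrate import quad

# Probe (5.2) for the 1-Lipschitz choice h(x) = |x|, i.e. ||h'|| = 1,
# so the bounds read ||f|| <= 2, ||f'|| <= 4 and ||f''|| <= 2.
Eh = np.sqrt(2.0 / np.pi)

def f(x):
    # The explicit solution (4.1)
    val, _ = quad(lambda s: (np.abs(s) - Eh) * np.exp(-s**2 / 2), -np.inf, x)
    return np.exp(x**2 / 2) * val

xs = np.linspace(-3.0, 3.0, 121)
eps = 1e-3   # finite-difference step (coarse, to keep quadrature noise small)
fv = np.array([f(t) for t in xs])
f1 = np.array([(f(t + eps) - f(t - eps)) / (2 * eps) for t in xs])
f2 = np.array([(f(t + eps) - 2 * f(t) + f(t - eps)) / eps**2 for t in xs])
print(np.abs(fv).max(), np.abs(f1).max(), np.abs(f2).max())  # within 2, 4, 2
```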
If we have bounds in the general form (5.1), we are usually able to treat many probability metrics together. One can often start with the next step below if bounds of the form (5.1) are already available (which is the case for many distributions).
An abstract approximation theorem
We are now in a position to bound the left-hand side of (3.2). As this step heavily depends on the form of the Stein operator, we directly regard the case of the standard normal distribution.
At this point we could directly plug in the random variable $W$ that we want to approximate and try to find upper bounds. However, it is often fruitful to formulate a more general theorem. Consider here the case of local dependence.
Assume that $W = \sum_{i=1}^n X_i$ is a sum of random variables such that $E W = 0$ and $\operatorname{Var} W = 1$. Assume that, for every $i = 1, \dots, n$, there is a set $A_i \subset \{1, \dots, n\}$ such that $X_i$ is independent of all the random variables $X_j$ with $j \notin A_i$. We call this set the 'neighborhood' of $X_i$. Likewise, let $B_i \subset \{1, \dots, n\}$ be a set such that all $X_j$ with $j \in A_i$ are independent of all $X_k$ with $k \notin B_i$; we can think of $B_i$ as a second-order neighborhood of $X_i$. For a set $A \subset \{1, \dots, n\}$, define the sum $X_A := \sum_{j \in A} X_j$.
Using Taylor expansion, it is possible to prove that

$$ \left| E\left( f'(W) - W f(W) \right) \right| \le \|f''\|_\infty \sum_{i=1}^n \left( \tfrac{1}{2} E\left| X_i X_{A_i}^2 \right| + E\left| X_i X_{A_i} X_{B_i \setminus A_i} \right| + E\left| X_i X_{A_i} \right| E\left| X_{B_i} \right| \right). \qquad (6.1) $$
Note that, if we follow this line of argument, we can bound (1.1) only for functions $h$ where $\|h'\|_\infty$ is bounded, because of the third inequality of (5.2) (and in fact, if $h$ has discontinuities, so will $f''$). To obtain a bound similar to (6.1) that contains only $\|f\|_\infty$ and $\|f'\|_\infty$, the argument is much more involved and the result is not as simple as (6.1); however, it can be done.
Theorem A. If $W$ is as described above, we have for the Lipschitz metric $d_W$ that

$$ d_W\left( \mathcal{L}(W), N(0,1) \right) \le 2 \sum_{i=1}^n \left( \tfrac{1}{2} E\left| X_i X_{A_i}^2 \right| + E\left| X_i X_{A_i} X_{B_i \setminus A_i} \right| + E\left| X_i X_{A_i} \right| E\left| X_{B_i} \right| \right). \qquad (6.2) $$

Proof. Recall that the Lipschitz metric is of the form (1.1), where the functions $h$ are Lipschitz-continuous with Lipschitz constant 1, thus $\|h'\|_\infty \le 1$. Combining this with (6.1) and the last bound in (5.2) proves the theorem.
Thus, roughly speaking, we have proved that, to calculate the Lipschitz distance between a random variable $W$ with local dependence structure and a standard normal distribution, we only need to know the third moments of the $X_i$ and the sizes of the neighborhoods $A_i$ and $B_i$.
Application of the theorem
We can treat the case of sums of independent and identically distributed random variables with Theorem A.
Assume that $E X_i = 0$, $\operatorname{Var} X_i = 1$, $E|X_i|^3 < \infty$, and $W = n^{-1/2} \sum_{i=1}^n X_i$. We can take $A_i = B_i = \{i\}$, since each summand $n^{-1/2} X_i$ is independent of all the others. From Theorem A, using $E X_i^2\, E|X_i| \le E|X_i|^3$, we obtain that

$$ d_W\left( \mathcal{L}(W), N(0,1) \right) \le \frac{3\, E|X_1|^3}{n^{1/2}}. \qquad (7.1) $$
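A quick simulation can be set against the rate in (7.1). The following sketch (assuming Python with NumPy and SciPy; scipy.stats.wasserstein_distance computes the first-order Wasserstein distance between two samples) compares an empirical estimate of $d_W$ with the bound $3 E|X_1|^3 / n^{1/2}$ for Rademacher summands, where $E|X_1|^3 = 1$:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)

n, m = 50, 200_000
x = rng.choice([-1.0, 1.0], size=(m, n))   # Rademacher: E X = 0, Var X = 1,
w = x.sum(axis=1) / np.sqrt(n)             # E|X|^3 = 1; W = n^{-1/2} sum X_i

d_emp = wasserstein_distance(w, rng.standard_normal(m))
print(d_emp, 3 * 1.0 / np.sqrt(n))  # empirical distance vs the bound (7.1)
```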
For sums of random variables, another approach related to Stein's method is known as the zero bias transform.
Connections to other methods
Literature
The following text is advanced, and gives a comprehensive overview of the normal case
Another advanced book, though one with some introductory character, is
A standard reference is the book by Stein,
which contains a lot of interesting material, but may be a little hard to understand at first reading.
Despite the method's age, there are few standard introductory books about Stein's method available. The following recent textbook has a chapter (Chapter 2) devoted to introducing Stein's method:
Although the book
is in large part about Poisson approximation, it nevertheless contains a lot of information about the generator approach, in particular in the context of Poisson process approximation.
The following textbook has a chapter (Chapter 10) devoted to introducing Stein's method of Poisson approximation: