Proofs of convergence of random variables - Alchetron, the free social encyclopedia

This article is supplemental for “Convergence of random variables” and provides proofs for selected results.

Convergence almost surely implies convergence in probability

$$X_n \xrightarrow{\text{a.s.}} X \;\Rightarrow\; X_n \xrightarrow{p} X$$

Proof: If {X_n} converges to X almost surely, it means that the set of points {ω: lim X_n(ω) ≠ X(ω)} has measure zero; denote this set O. Now fix ε > 0 and consider a sequence of sets

$$A_n = \bigcup_{m \ge n} \{ |X_m - X| > \varepsilon \}.$$

This sequence of sets is decreasing: A_n ⊇ A_{n+1} ⊇ …, and it decreases towards the set

$$A_\infty = \bigcap_{n \ge 1} A_n.$$

For this decreasing sequence of events, the probabilities also form a decreasing sequence, and by continuity of probability from above they decrease towards Pr(A_∞); we now show that this number is equal to zero. Any point ω in the complement of O is such that lim X_n(ω) = X(ω), which implies that |X_n(ω) − X(ω)| < ε for all n ≥ N, for some N depending on ω. Therefore, for all n ≥ N the point ω does not belong to the set A_n, and consequently it does not belong to A_∞. This means that A_∞ is disjoint from the complement of O, or equivalently, A_∞ is a subset of O, and therefore Pr(A_∞) ≤ Pr(O) = 0.

Finally, since the event {|X_n − X| > ε} is contained in A_n, we have

$$\Pr(|X_n - X| > \varepsilon) \le \Pr(A_n) \xrightarrow[n \to \infty]{} 0,$$

which by definition means that X_n converges in probability to X.
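The sets A_n can be made concrete with an example that is not in the text above and is assumed purely for illustration: Ω = [0, 1] with the uniform distribution, X_n(ω) = ω^n and X = 0. Then X_n(ω) → 0 for every ω < 1, so X_n → X almost surely, and because ω^m is non-increasing in m, A_n = {ω > ε^{1/n}} and Pr(A_n) = 1 − ε^{1/n}. The Python sketch below (function names are hypothetical) compares this closed form with a Monte Carlo estimate built directly from the union defining A_n, truncated to a finite window of indices.

```python
import random


def pr_A_exact(n: int, eps: float) -> float:
    """Closed form of Pr(A_n) for the assumed example X_n(w) = w**n, X = 0,
    with w uniform on [0, 1] and 0 < eps < 1.

    Because w**m is non-increasing in m, A_n = {w**n > eps} = {w > eps**(1/n)}.
    """
    return 1.0 - eps ** (1.0 / n)


def pr_A_monte_carlo(n: int, eps: float, samples: int = 10_000,
                     window: int = 20, seed: int = 0) -> float:
    """Estimate Pr(A_n) by drawing w and testing |X_m(w) - X(w)| > eps
    for n <= m < n + window (a finite stand-in for the union over m >= n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        w = rng.random()  # uniform on [0, 1)
        if any(w ** m > eps for m in range(n, n + window)):
            hits += 1
    return hits / samples


eps = 0.1
for n in (1, 10, 100, 1000):
    print(n, round(pr_A_exact(n, eps), 4), round(pr_A_monte_carlo(n, eps), 4))
```

Both columns shrink towards zero as n grows, and since Pr(|X_n − X| > ε) ≤ Pr(A_n), so does the probability in the definition of convergence in probability. The finite window is a simplification that is harmless only because of the monotonicity in this particular example.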

Convergence in probability does not imply almost sure convergence in the discrete case

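If X_n are independent random variables taking the value one with probability 1/n and zero otherwise, then X_n converges to zero in probability but not almost surely. Indeed, for every ε > 0 we have Pr(|X_n| > ε) ≤ 1/n → 0. On the other hand, the events {X_n = 1} are independent and Σ 1/n = ∞, so by the second Borel–Cantelli lemma X_n = 1 for infinitely many n with probability one; hence X_n(ω) fails to converge to zero for almost every ω.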
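Both halves of the argument can also be checked with exact arithmetic. The Python sketch below (helper names are hypothetical) uses the standard fractions module; rather than quoting the second Borel–Cantelli lemma, it uses independence directly: the probability that X_n = 0 for every n in (N, M] is a product of factors (n − 1)/n that telescopes to N/M, which tends to zero as M → ∞.

```python
from fractions import Fraction


def prob_one(n: int) -> Fraction:
    """Pr(X_n = 1) = 1/n; for 0 < eps < 1 this is also Pr(|X_n - 0| > eps)."""
    return Fraction(1, n)


def prob_all_zero(N: int, M: int) -> Fraction:
    """Pr(X_n = 0 for every n with N < n <= M), assuming N >= 1 and independence.

    The product of (1 - 1/n) = (n - 1)/n over N < n <= M telescopes to N/M.
    """
    p = Fraction(1)
    for n in range(N + 1, M + 1):
        p *= 1 - prob_one(n)
    return p


for n in (10, 100, 1000):
    print(f"Pr(X_{n} = 1) = {prob_one(n)}")
for M in (100, 1_000, 10_000):
    print(f"Pr(X_n = 0 for 10 < n <= {M}) = {prob_all_zero(10, M)}")
```

So Pr(X_n = 0 for all n > N) = 0 for each N ≥ 1, and the event that X_n(ω) → 0, which is the union of these events over N, has probability zero.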

Convergence in probability implies convergence in distribution

$$X_n \xrightarrow{p} X \;\Rightarrow\; X_n \xrightarrow{d} X.$$

Proof for the case of scalar random variables

Lemma. Let X, Y be random variables, let a be a real number and ε > 0. Then

$$\Pr(Y \le a) \le \Pr(X \le a + \varepsilon) + \Pr(|Y - X| > \varepsilon).$$

Proof of lemma:

$$\begin{aligned}
\Pr(Y \le a) &= \Pr(Y \le a,\ X \le a + \varepsilon) + \Pr(Y \le a,\ X > a + \varepsilon) \\
&\le \Pr(X \le a + \varepsilon) + \Pr(Y - X \le a - X,\ a - X < -\varepsilon) \\
&\le \Pr(X \le a + \varepsilon) + \Pr(Y - X < -\varepsilon) \\
&\le \Pr(X \le a + \varepsilon) + \Pr(Y - X < -\varepsilon) + \Pr(Y - X > \varepsilon) \\
&= \Pr(X \le a + \varepsilon) + \Pr(|Y - X| > \varepsilon)
\end{aligned}$$
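The lemma is really the event inclusion {Y ≤ a} ⊆ {X ≤ a + ε} ∪ {|Y − X| > ε} followed by the union bound, so it holds for every probability measure, including the empirical distribution of a sample. The Python sketch below checks both the inclusion and the resulting inequality on simulated pairs; the joint law (X standard normal, Y = X plus Gaussian noise) and the helper name lemma_holds are assumptions made for the illustration, and any other joint law would do.

```python
import random


def lemma_holds(pairs, a: float, eps: float) -> bool:
    """Check the lemma under the empirical distribution of (x, y) samples.

    Verifies the event inclusion {Y <= a} ⊆ {X <= a + eps} ∪ {|Y - X| > eps}
    sample by sample, and then the inequality between the empirical
    probabilities Pr(Y <= a) <= Pr(X <= a + eps) + Pr(|Y - X| > eps).
    """
    n = len(pairs)
    inclusion = all(x <= a + eps or abs(y - x) > eps
                    for x, y in pairs if y <= a)
    lhs = sum(y <= a for _, y in pairs)
    rhs = sum(x <= a + eps for x, _ in pairs) + sum(abs(y - x) > eps for x, y in pairs)
    return inclusion and lhs / n <= rhs / n


rng = random.Random(1)
pairs = []
for _ in range(5_000):
    x = rng.gauss(0.0, 1.0)
    pairs.append((x, x + rng.gauss(0.0, 0.3)))  # assumed joint law: Y = X + noise

print(all(lemma_holds(pairs, a, eps)
          for a in (-1.0, 0.0, 1.5) for eps in (0.05, 0.2, 1.0)))
```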

Proof of the theorem: Recall that to prove convergence in distribution, one must show that the sequence of cumulative distribution functions converges to F_X at every point where F_X is continuous. Let a be such a point. For every ε > 0, applying the preceding lemma once with Y = X_n, and once with the roles of X and X_n exchanged and a replaced by a − ε, we have:

$$\begin{aligned}
\Pr(X_n \le a) &\le \Pr(X \le a + \varepsilon) + \Pr(|X_n - X| > \varepsilon), \\
\Pr(X \le a - \varepsilon) &\le \Pr(X_n \le a) + \Pr(|X_n - X| > \varepsilon).
\end{aligned}$$

So, we have

$$\Pr(X \le a - \varepsilon) - \Pr(|X_n - X| > \varepsilon) \le \Pr(X_n \le a) \le \Pr(X \le a + \varepsilon) + \Pr(|X_n - X| > \varepsilon).$$

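Since Pr(|X_n − X| > ε) → 0 as n → ∞, taking the limit inferior and limit superior as n → ∞ gives

$$F_X(a - \varepsilon) \le \liminf_{n\to\infty} \Pr(X_n \le a) \le \limsup_{n\to\infty} \Pr(X_n \le a) \le F_X(a + \varepsilon),$$

where F_X(a) = Pr(X ≤ a) is the cumulative distribution function of X. This function is continuous at a by assumption, and therefore both F_X(a−ε) and F_X(a+ε) converge to F_X(a) as ε → 0⁺. Taking this limit squeezes the lim inf and lim sup together, so the limit exists and

$$\lim_{n\to\infty} \Pr(X_n \le a) = \Pr(X \le a),$$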

which means that {X_n} converges to X in distribution.
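The sandwich can be evaluated in closed form for an assumed example: X and Z independent standard normal and X_n = X + Z/n, so that X_n ~ N(0, 1 + 1/n²) and Pr(|X_n − X| > ε) = Pr(|Z| > nε) → 0. The Python sketch below (helper names are hypothetical; the normal CDF is written with math.erf) prints the lower bound, Pr(X_n ≤ a) and the upper bound for a = 0.5 and ε = 0.1.

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf_Xn(a: float, n: int) -> float:
    """Pr(X_n <= a) for X_n = X + Z/n with X, Z independent N(0, 1),
    so that X_n ~ N(0, 1 + 1/n**2)."""
    return phi(a / math.sqrt(1.0 + 1.0 / n**2))


def tail(eps: float, n: int) -> float:
    """Pr(|X_n - X| > eps) = Pr(|Z| > n * eps) = erfc(n * eps / sqrt(2))."""
    return math.erfc(n * eps / math.sqrt(2.0))


def sandwich(a: float, eps: float, n: int):
    """(lower bound, Pr(X_n <= a), upper bound) from the two uses of the lemma."""
    lower = phi(a - eps) - tail(eps, n)
    upper = phi(a + eps) + tail(eps, n)
    return lower, cdf_Xn(a, n), upper


for n in (1, 10, 100, 1000):
    lo, mid, hi = sandwich(a=0.5, eps=0.1, n=n)
    print(f"n={n:5d}  {lo:.4f} <= {mid:.4f} <= {hi:.4f}   F_X(a) = {phi(0.5):.4f}")
```

Once n is large, the outer terms settle at Φ(a − ε) and Φ(a + ε); making ε smaller then squeezes Pr(X_n ≤ a) onto F_X(a) = Φ(a), which is the role played by continuity of F_X at a.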

Proof for the generic case

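When X_n is a random vector, the implication follows from the property proved later on this page (convergence in probability to a sequence converging in distribution), applied with the sequence whose every term is X in the role of X_n, and with X_n in the role of Y_n.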

Convergence in distribution to a constant implies convergence in probability

$$X_n \xrightarrow{d} c \;\Rightarrow\; X_n \xrightarrow{p} c,$$
provided c is a constant.

Proof: Fix ε > 0. Let B_ε(c) be the open ball of radius ε around the point c, and B_ε^c(c) its complement. Then

$$\Pr(|X_n - c| \ge \varepsilon) = \Pr\big(X_n \in B_\varepsilon^c(c)\big).$$

By the portmanteau lemma (part C), since B_ε^c(c) is a closed set and X_n converges in distribution to c, the lim sup of the latter probability is at most Pr(c ∈ B_ε^c(c)), which equals zero because c does not lie in B_ε^c(c). Therefore,

$$0 \le \liminf_{n\to\infty} \Pr(|X_n - c| \ge \varepsilon) \le \limsup_{n\to\infty} \Pr(|X_n - c| \ge \varepsilon) = \limsup_{n\to\infty} \Pr\big(X_n \in B_\varepsilon^c(c)\big) \le \Pr\big(c \in B_\varepsilon^c(c)\big) = 0,$$

which by definition means that X_n converges to c in probability.
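For an assumed example X_n ~ N(c, 1/n), the limiting CDF is the step function 1{x ≥ c}, and every point other than c is a continuity point. The Python sketch below (names are hypothetical) shows that F_n(c − ε) → 0 and F_n(c + ε) → 1, which is what convergence in distribution to c delivers, and that Pr(|X_n − c| ≥ ε) = F_n(c − ε) + 1 − F_n(c + ε) is driven to zero by exactly these two values. This uses the CDF characterization in place of the closed-set form of the portmanteau lemma and relies on X_n having a continuous distribution.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of N(mean, sd**2), written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def tail_prob(n: int, c: float, eps: float) -> float:
    """Pr(|X_n - c| >= eps) for the assumed example X_n ~ N(c, 1/n).

    For a continuous law this is F_n(c - eps) + 1 - F_n(c + eps).
    """
    sd = 1.0 / math.sqrt(n)
    return normal_cdf(c - eps, c, sd) + (1.0 - normal_cdf(c + eps, c, sd))


C, EPS = 2.0, 0.5
for n in (1, 10, 100, 10_000):
    sd = 1.0 / math.sqrt(n)
    # F_n(c - eps) -> 0 and F_n(c + eps) -> 1: convergence in distribution to c
    print(n, normal_cdf(C - EPS, C, sd), normal_cdf(C + EPS, C, sd), tail_prob(n, C, EPS))
```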

Convergence in probability to a sequence converging in distribution implies convergence to the same distribution

$$|Y_n - X_n| \xrightarrow{p} 0,\quad X_n \xrightarrow{d} X \;\Rightarrow\; Y_n \xrightarrow{d} X$$

Proof: We will prove this theorem using the portmanteau lemma, part B. As required in that lemma, consider any bounded function f (i.e. |f(x)| ≤ M) which is also Lipschitz:

$$\exists K > 0,\ \forall x, y:\ |f(x) - f(y)| \le K|x - y|.$$

Take some ε > 0 and majorize the expression |E[f(Y_n)] − E[f(X_n)]| as

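$$\begin{aligned}
|\operatorname{E}[f(Y_n)] - \operatorname{E}[f(X_n)]| &\le \operatorname{E}\big[|f(Y_n) - f(X_n)|\big] \\
&= \operatorname{E}\big[|f(Y_n) - f(X_n)|\,\mathbf{1}_{\{|Y_n - X_n| < \varepsilon\}}\big] + \operatorname{E}\big[|f(Y_n) - f(X_n)|\,\mathbf{1}_{\{|Y_n - X_n| \ge \varepsilon\}}\big] \\
&\le \operatorname{E}\big[K|Y_n - X_n|\,\mathbf{1}_{\{|Y_n - X_n| < \varepsilon\}}\big] + \operatorname{E}\big[2M\,\mathbf{1}_{\{|Y_n - X_n| \ge \varepsilon\}}\big] \\
&\le K\varepsilon \Pr(|Y_n - X_n| < \varepsilon) + 2M \Pr(|Y_n - X_n| \ge \varepsilon) \\
&\le K\varepsilon + 2M \Pr(|Y_n - X_n| \ge \varepsilon)
\end{aligned}$$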

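(here 1_{...} denotes the indicator function; the expectation of the indicator function is equal to the probability of the corresponding event). Therefore, by the triangle inequality,

$$|\operatorname{E}[f(Y_n)] - \operatorname{E}[f(X)]| \le |\operatorname{E}[f(Y_n)] - \operatorname{E}[f(X_n)]| + |\operatorname{E}[f(X_n)] - \operatorname{E}[f(X)]| \le K\varepsilon + 2M \Pr(|Y_n - X_n| \ge \varepsilon) + |\operatorname{E}[f(X_n)] - \operatorname{E}[f(X)]|.$$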

If we take the lim sup in this expression as n → ∞, the second term goes to zero since {Y_n − X_n} converges to zero in probability, and the third term also converges to zero, by the portmanteau lemma and the fact that X_n converges to X in distribution. Thus

$$\limsup_{n\to\infty} |\operatorname{E}[f(Y_n)] - \operatorname{E}[f(X)]| \le K\varepsilon.$$

Since ε was arbitrary, this lim sup must in fact be equal to zero, and therefore E[f(Y_n)] → E[f(X)], which again by the portmanteau lemma implies that {Y_n} converges to X in distribution. QED.
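The chain of inequalities holds pointwise before expectations are taken, so it also holds for the empirical measure of a simulated sample. The Python sketch below assumes f = sin (bounded with M = 1, Lipschitz with K = 1), X_n = X standard normal for every n, and Y_n = X + Z/n with Z an independent standard normal, so that Y_n − X_n → 0 in probability; it reports |mean f(Y_n) − mean f(X_n)| next to the bound Kε + 2M·freq(|Y_n − X_n| ≥ ε). The helper name check_bound is hypothetical.

```python
import math
import random


def check_bound(xs, ys, eps: float, K: float = 1.0, M: float = 1.0, f=math.sin):
    """Return (|mean f(Y) - mean f(X)|, K*eps + 2*M*freq(|Y - X| >= eps)).

    f = sin satisfies |f| <= M = 1 and is Lipschitz with K = 1, so the
    empirical version of the proof's inequality chain applies.
    """
    n = len(xs)
    gap = abs(sum(f(y) for y in ys) / n - sum(f(x) for x in xs) / n)
    far = sum(abs(y - x) >= eps for x, y in zip(xs, ys)) / n
    return gap, K * eps + 2 * M * far


rng = random.Random(2)
xs = [rng.gauss(0.0, 1.0) for _ in range(5_000)]   # X_n = X for every n (assumed)
zs = [rng.gauss(0.0, 1.0) for _ in range(5_000)]
for n in (1, 10, 100):
    ys = [x + z / n for x, z in zip(xs, zs)]        # Y_n - X_n = Z / n -> 0 in probability
    gap, bound = check_bound(xs, ys, eps=0.1)
    print(f"n={n:3d}  gap={gap:.4f}  bound={bound:.4f}")
```

As n grows the frequency term vanishes and the bound settles at Kε, matching the step in the proof where ε is then sent to zero.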

Convergence of one sequence in distribution and another to a constant implies joint convergence in distribution

$$X_n \xrightarrow{d} X,\quad Y_n \xrightarrow{d} c \;\Rightarrow\; (X_n, Y_n) \xrightarrow{d} (X, c)$$
provided c is a constant.

Proof: We will prove this statement using the portmanteau lemma, part A.

First we want to show that (X_n, c) converges in distribution to (X, c). By the portmanteau lemma this will be true if we can show that E[f(X_n, c)] → E[f(X, c)] for any bounded continuous function f(x, y). So let f be an arbitrary bounded continuous function. Now consider the function of a single variable g(x) := f(x, c). It is also bounded and continuous, and therefore, by the portmanteau lemma applied to the sequence {X_n} converging in distribution to X, we have E[g(X_n)] → E[g(X)]. But this is exactly the statement E[f(X_n, c)] → E[f(X, c)], so (X_n, c) converges in distribution to (X, c).

Secondly, consider |(X_n, Y_n) − (X_n, c)| = |Y_n − c|. This expression converges in probability to zero: Y_n converges in distribution to the constant c, so by the property proved earlier it also converges in probability to c. Thus we have demonstrated two facts:

$$\begin{cases}
|(X_n, Y_n) - (X_n, c)| \xrightarrow{p} 0, \\
(X_n, c) \xrightarrow{d} (X, c).
\end{cases}$$

By the property proved earlier (convergence in probability to a sequence converging in distribution), these two facts imply that (X_n, Y_n) converges in distribution to (X, c).
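A simulation can illustrate the joint limit, under assumptions chosen only for this example: X_n = X standard normal for every n, Y_n = c + X_n/n (deliberately dependent on X_n, which the result allows), and the bounded continuous test function f(x, y) = cos(xy). Then E[f(X, c)] = E[cos(cX)] = exp(−c²/2), the characteristic function of the standard normal evaluated at c. The Python sketch below (hypothetical name mc_expectation) estimates E[f(X_n, Y_n)] by Monte Carlo.

```python
import math
import random


def mc_expectation(n: int, c: float, samples: int = 100_000, seed: int = 3) -> float:
    """Monte Carlo estimate of E[cos(X_n * Y_n)] with X_n ~ N(0, 1) and Y_n = c + X_n / n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = rng.gauss(0.0, 1.0)
        y = c + x / n          # Y_n -> c, while staying dependent on X_n
        total += math.cos(x * y)
    return total / samples


c = 1.5
target = math.exp(-c * c / 2)  # E[cos(c X)] for X ~ N(0, 1)
for n in (1, 10, 1000):
    print(n, round(mc_expectation(n, c), 4), round(target, 4))
```

The estimate approaches exp(−c²/2) as n grows, up to Monte Carlo error of order 1/√samples.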

Convergence of two sequences in probability implies joint convergence in probability

$$X_n \xrightarrow{p} X,\quad Y_n \xrightarrow{p} Y \;\Rightarrow\; (X_n, Y_n) \xrightarrow{p} (X, Y)$$

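Proof: The first inequality below follows from the triangle inequality, |(a, b)| ≤ |a| + |b|, and the second from the union bound, since |a| + |b| ≥ ε forces |a| ≥ ε/2 or |b| ≥ ε/2: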

$$\Pr(|(X_n, Y_n) - (X, Y)| \ge \varepsilon) \le \Pr(|X_n - X| + |Y_n - Y| \ge \varepsilon) \le \Pr(|X_n - X| \ge \varepsilon/2) + \Pr(|Y_n - Y| \ge \varepsilon/2)$$

Each of the probabilities on the right-hand side converges to zero as n → ∞ by the definition of convergence in probability of {X_n} to X and of {Y_n} to Y. Taking the limit, we conclude that the left-hand side also converges to zero, and therefore the sequence {(X_n, Y_n)} converges in probability to (X, Y).
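Both inequalities are pointwise statements about the differences X_n − X and Y_n − Y, so they can be checked sample by sample. The Python sketch below assumes X_n − X = U/n and Y_n − Y = V/n with U, V independent standard normals (an illustration only), uses the Euclidean norm for the pair, and counts samples in each event; the helper name event_counts is hypothetical.

```python
import math
import random


def event_counts(dx, dy, eps):
    """Count samples in each event of the displayed inequality, where dx holds
    draws of X_n - X and dy holds draws of Y_n - Y (Euclidean norm for the pair)."""
    joint = sum(math.hypot(a, b) >= eps for a, b in zip(dx, dy))
    x_far = sum(abs(a) >= eps / 2 for a in dx)
    y_far = sum(abs(b) >= eps / 2 for b in dy)
    return joint, x_far, y_far


rng = random.Random(4)
us = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
vs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
eps = 0.2
for n in (1, 10, 100):
    dx = [u / n for u in us]   # assumed: X_n - X = U / n
    dy = [v / n for v in vs]   # assumed: Y_n - Y = V / n
    joint, x_far, y_far = event_counts(dx, dy, eps)
    print(n, joint / len(dx), (x_far + y_far) / len(dx))
```

The counts are integers, so the inequality can be compared exactly before any division; all three frequencies fall towards zero as n grows.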

References

“Proofs of convergence of random variables”, Wikipedia. Text available under CC BY-SA.
