In probability theory, Hoeffding's lemma is an inequality that bounds the moment-generating function of any bounded random variable. It is named after the Finnish–American mathematical statistician Wassily Hoeffding.
The proof of Hoeffding's lemma uses Taylor's theorem and Jensen's inequality. Hoeffding's lemma is itself used in the proof of McDiarmid's inequality.
Let $X$ be any real-valued random variable with expected value $\mathbb{E}(X) = 0$ and such that $a \le X \le b$ almost surely. Then, for all $\lambda \in \mathbb{R}$,
$$\mathbb{E}\left[e^{\lambda X}\right] \le \exp\left(\frac{\lambda^2 (b-a)^2}{8}\right).$$
Note that, because the random variable $X$ is assumed to have zero expectation, the $a$ and $b$ in the lemma must satisfy $a \le 0 \le b$.
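The statement can be sanity-checked numerically. The sketch below is only an illustration, not part of the lemma: it assumes a two-point random variable on $\{a, b\}$ whose probabilities are chosen so that the mean is zero, and the helper names `two_point_mgf` and `hoeffding_bound` are hypothetical.

```python
import math

def two_point_mgf(lam, a, b):
    """E[exp(lam * X)] for the zero-mean X supported on {a, b}, with a < 0 < b."""
    p_b = -a / (b - a)  # P(X = b), chosen so that E[X] = 0
    p_a = b / (b - a)   # P(X = a)
    return p_a * math.exp(lam * a) + p_b * math.exp(lam * b)

def hoeffding_bound(lam, a, b):
    """Right-hand side of Hoeffding's lemma."""
    return math.exp(lam ** 2 * (b - a) ** 2 / 8)

# Illustrative endpoints and a grid of lambda values in [-3, 3].
a, b = -1.0, 3.0
lams = [k / 10 for k in range(-30, 31)]

# A small relative tolerance absorbs floating-point rounding at lam = 0.
all_ok = all(
    two_point_mgf(lam, a, b) <= hoeffding_bound(lam, a, b) * (1 + 1e-12)
    for lam in lams
)
print(all_ok)
```

A grid check of this kind does not prove anything; it only confirms that the inequality holds at the sampled values of $\lambda$ for this one distribution.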
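Since $e^{\lambda x}$ is a convex function of $x$, we have
$$e^{\lambda x} \le \frac{b-x}{b-a}\, e^{\lambda a} + \frac{x-a}{b-a}\, e^{\lambda b} \qquad \text{for all } a \le x \le b.$$
So,
$$\mathbb{E}\left[e^{\lambda X}\right] \le \frac{b-\mathbb{E}(X)}{b-a}\, e^{\lambda a} + \frac{\mathbb{E}(X)-a}{b-a}\, e^{\lambda b}.$$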
Let $h = \lambda(b-a)$, $p = \frac{-a}{b-a}$ and $L(h) = -hp + \ln\left(1 - p + p e^{h}\right)$.
Then, since $\mathbb{E}(X) = 0$,
$$\frac{b-\mathbb{E}(X)}{b-a}\, e^{\lambda a} + \frac{\mathbb{E}(X)-a}{b-a}\, e^{\lambda b} = e^{L(h)}.$$
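Differentiating $L(h)$ gives $L'(h) = -p + \frac{p e^{h}}{1 - p + p e^{h}}$ and $L''(h) = t(1-t)$ with $t = \frac{p e^{h}}{1 - p + p e^{h}}$, so
$$L(0) = L'(0) = 0 \quad \text{and} \quad L''(h) \le \frac{1}{4} \text{ for all } h.$$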
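By Taylor's theorem, $L(h) = L(0) + h L'(0) + \frac{1}{2} h^{2} L''(\xi)$ for some $\xi$ between $0$ and $h$, so
$$L(h) \le \frac{1}{8} h^{2} = \frac{1}{8} \lambda^{2} (b-a)^{2}.$$
Hence,
$$\mathbb{E}\left[e^{\lambda X}\right] \le e^{\frac{1}{8} \lambda^{2} (b-a)^{2}}.$$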
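The two analytic facts used in this short proof, $L''(h) \le \tfrac{1}{4}$ and $L(h) \le \tfrac{1}{8} h^{2}$, can be spot-checked on a grid. The following is a minimal sketch under stated assumptions: the grids for $h$ and $p$ are arbitrary illustrative choices, and the function names `L` and `L_second` are hypothetical.

```python
import math

def L(h, p):
    """L(h) = -h*p + ln(1 - p + p*e^h), as defined in the proof."""
    return -h * p + math.log(1 - p + p * math.exp(h))

def L_second(h, p):
    """Second derivative of L, written as t*(1 - t) with t = p*e^h / (1 - p + p*e^h)."""
    t = p * math.exp(h) / (1 - p + p * math.exp(h))
    return t * (1 - t)

grid_h = [k / 4 for k in range(-40, 41)]  # h in [-10, 10]
grid_p = [k / 10 for k in range(1, 10)]   # p in (0, 1)

# Largest observed second derivative; the proof says it never exceeds 1/4.
max_second = max(L_second(h, p) for h in grid_h for p in grid_p)

# Check the Taylor bound L(h) <= h^2 / 8 at every grid point.
taylor_ok = all(L(h, p) <= h * h / 8 + 1e-12 for h in grid_h for p in grid_p)

print(max_second, taylor_ok)
```

The value $\tfrac{1}{4}$ is attained at $t = \tfrac{1}{2}$, which on this grid happens at $p = 0.5$, $h = 0$.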
(The proof below is the same proof with more explanation.)
First note that if one of $a$ or $b$ is zero, then $P(X = 0) = 1$ and the inequality follows. If both are nonzero, then $a$ must be negative and $b$ must be positive.
Next, recall that $e^{sx}$ is a convex function of $x$ on the real line:
$$\forall x \in [a,b]: \quad e^{sx} \le \frac{b-x}{b-a}\, e^{sa} + \frac{x-a}{b-a}\, e^{sb}.$$
Applying $\mathbb{E}$ to both sides of the above inequality gives us:
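$$\begin{aligned}
\mathbb{E}\left[e^{sX}\right] &\le \frac{b-\mathbb{E}(X)}{b-a}\, e^{sa} + \frac{\mathbb{E}(X)-a}{b-a}\, e^{sb} \\
&= \frac{b}{b-a}\, e^{sa} + \frac{-a}{b-a}\, e^{sb} && \mathbb{E}(X) = 0 \\
&= \left(-\frac{a}{b-a}\right) e^{sa} \left(-\frac{b}{a} + e^{sb-sa}\right) \\
&= \left(-\frac{a}{b-a}\right) e^{sa} \left(-\frac{b-a+a}{a} + e^{s(b-a)}\right) \\
&= \left(-\frac{a}{b-a}\right) e^{sa} \left(-\frac{b-a}{a} - 1 + e^{s(b-a)}\right) \\
&= \left(1 - \theta + \theta e^{s(b-a)}\right) e^{-s\theta(b-a)} && \theta = -\frac{a}{b-a} > 0
\end{aligned}$$
Let $u = s(b-a)$ and define: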
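$$\begin{cases} \varphi : \mathbb{R} \to \mathbb{R} \\ \varphi(u) = -\theta u + \log\left(1 - \theta + \theta e^{u}\right) \end{cases}$$
$\varphi$ is well defined on $\mathbb{R}$; to see this we calculate:
$$\begin{aligned}
1 - \theta + \theta e^{u} &= \theta \left(\frac{1}{\theta} - 1 + e^{u}\right) \\
&= \theta \left(-\frac{b}{a} + e^{u}\right) \\
&> 0 && \theta > 0, \quad \frac{b}{a} < 0
\end{aligned}$$
The definition of $\varphi$ implies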
$$\mathbb{E}\left[e^{sX}\right] \le e^{\varphi(u)}.$$
By Taylor's theorem, for every real $u$ there exists a $v$ between $0$ and $u$ such that
$$\varphi(u) = \varphi(0) + u \varphi'(0) + \tfrac{1}{2} u^{2} \varphi''(v).$$
Note that:
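$$\begin{aligned}
\varphi(0) &= 0 \\
\varphi'(0) &= \left. \left(-\theta + \frac{\theta e^{u}}{1 - \theta + \theta e^{u}}\right) \right|_{u=0} = 0 \\
\varphi''(v) &= \frac{\theta e^{v} \left(1 - \theta + \theta e^{v}\right) - \theta^{2} e^{2v}}{\left(1 - \theta + \theta e^{v}\right)^{2}} \\
&= \frac{\theta e^{v}}{1 - \theta + \theta e^{v}} \left(1 - \frac{\theta e^{v}}{1 - \theta + \theta e^{v}}\right) \\
&= t(1-t) && t = \frac{\theta e^{v}}{1 - \theta + \theta e^{v}} \\
&\le \tfrac{1}{4} && t > 0
\end{aligned}$$
Therefore,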
$$\varphi(u) \le 0 + u \cdot 0 + \tfrac{1}{2} u^{2} \cdot \tfrac{1}{4} = \tfrac{1}{8} u^{2} = \tfrac{1}{8} s^{2} (b-a)^{2}.$$
This implies
$$\mathbb{E}\left[e^{sX}\right] \le \exp\left(\tfrac{1}{8} s^{2} (b-a)^{2}\right).$$
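As a concrete instance of this bound, consider a random sign. This example is an illustrative assumption, not part of the proof: $X$ takes the values $-1$ and $+1$ with probability $\tfrac{1}{2}$ each, so that $a = -1$, $b = 1$ and $\mathbb{E}(X) = 0$. In this case the inequality can be checked directly by comparing power series term by term, using $(2k)! \ge 2^{k} k!$:

```latex
\begin{aligned}
\mathbb{E}\left[e^{sX}\right]
  &= \tfrac{1}{2}\left(e^{s} + e^{-s}\right) = \cosh s
   = \sum_{k=0}^{\infty} \frac{s^{2k}}{(2k)!} \\
  &\le \sum_{k=0}^{\infty} \frac{s^{2k}}{2^{k}\, k!}
   = e^{s^{2}/2}
   = \exp\left(\tfrac{1}{8} s^{2} \bigl(1 - (-1)\bigr)^{2}\right).
\end{aligned}
```

The right-hand side is exactly the bound of the lemma with $b - a = 2$.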