Bernoulli distribution

Parameters
  0 < p < 1, \quad p \in \mathbb{R}

Support
  k \in \{0, 1\}

pmf
  \begin{cases} q = 1 - p & \text{for } k = 0 \\ p & \text{for } k = 1 \end{cases}

CDF
  \begin{cases} 0 & \text{for } k < 0 \\ 1 - p & \text{for } 0 \leq k < 1 \\ 1 & \text{for } k \geq 1 \end{cases}

Mean
  p

Median
  \begin{cases} 0 & \text{if } q > p \\ 0.5 & \text{if } q = p \\ 1 & \text{if } q < p \end{cases}

In probability theory and statistics, the Bernoulli distribution, named after Swiss scientist Jacob Bernoulli, is the probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 − p. It can be used to represent a coin toss where 1 and 0 would represent "heads" and "tails" (or vice versa), respectively. In particular, unfair coins would have p ≠ 0.5.
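
As a quick illustration, here is a minimal sketch of simulating such coin tosses, assuming numpy is available (the value p = 0.3 is an arbitrary illustrative choice):

    import numpy as np

    rng = np.random.default_rng(seed=0)
    p = 0.3                                   # probability of "heads" (outcome 1)
    tosses = rng.binomial(n=1, p=p, size=10)  # a Bernoulli draw is a binomial draw with n = 1
    print(tosses)                             # array of 0s and 1s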

The Bernoulli distribution is a special case of the binomial distribution where a single experiment/trial is conducted (n=1). It is also a special case of the two-point distribution, for which the two possible outcomes need not be 0 and 1.
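
The claimed equivalence with Binomial(1, p) is easy to check numerically; a small sketch, assuming scipy is installed:

    import math
    from scipy.stats import bernoulli, binom

    p = 0.3
    for k in (0, 1):
        # Bernoulli(p) and Binomial(n=1, p) assign the same probability to each outcome
        assert math.isclose(bernoulli.pmf(k, p), binom.pmf(k, 1, p))
    print("Bernoulli(p) matches Binomial(1, p) at k = 0 and k = 1")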

Properties of the Bernoulli Distribution

If X is a random variable with this distribution, we have:

\Pr(X = 0) = 1 - \Pr(X = 1) = 1 - p = q.

The probability mass function f of this distribution, over possible outcomes k, is

f(k; p) = \begin{cases} p & \text{if } k = 1, \\ 1 - p & \text{if } k = 0. \end{cases}

This can also be expressed as

f(k; p) = p^k (1 - p)^{1 - k} \quad \text{for } k \in \{0, 1\}.
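
A direct transcription of this closed form into code (the helper name bernoulli_pmf is ours, not a library routine):

    def bernoulli_pmf(k, p):
        # f(k; p) = p^k * (1 - p)^(1 - k), defined only for k in {0, 1}
        if k not in (0, 1):
            raise ValueError("k must be 0 or 1")
        return p ** k * (1 - p) ** (1 - k)

    print(bernoulli_pmf(1, 0.3))  # 0.3
    print(bernoulli_pmf(0, 0.3))  # 0.7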

The kurtosis goes to infinity for high and low values of p, but for p = 1/2 the two-point distributions, including the Bernoulli distribution, have a lower excess kurtosis than any other probability distribution, namely −2.
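
For reference, the standard closed form for the excess kurtosis (stated here as a worked check, not derived) confirms the value at p = q = 1/2:

\text{excess kurtosis} = \frac{1 - 6pq}{pq}, \qquad \frac{1 - 6 \cdot \tfrac{1}{4}}{\tfrac{1}{4}} = \frac{-\tfrac{1}{2}}{\tfrac{1}{4}} = -2 \quad \text{at } p = q = \tfrac{1}{2}.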

The Bernoulli distributions for 0 ≤ p ≤ 1 form an exponential family.
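
To make this concrete, for 0 < p < 1 the pmf can be rewritten in natural-parameter form (a standard identity, shown as a worked equation):

f(k; p) = \exp\left( k \ln\frac{p}{1 - p} + \ln(1 - p) \right), \qquad k \in \{0, 1\},

with natural parameter η = ln(p/(1 − p)) (the log-odds) and log-partition function A(η) = ln(1 + e^η); expanding the exponential recovers p^k (1 − p)^(1 − k).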

The maximum likelihood estimator of p based on a random sample is the sample mean.
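
A minimal sketch of this estimator, assuming numpy (the true p and the sample size are illustrative):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    true_p = 0.3
    sample = rng.binomial(1, true_p, size=10_000)  # i.i.d. Bernoulli(true_p) draws
    p_hat = sample.mean()                          # the MLE of p is the sample mean
    print(p_hat)                                   # close to 0.3 for a large sample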

Mean

The expected value of a Bernoulli random variable X is

\operatorname{E}(X) = p.

This follows because, for a Bernoulli distributed random variable X with Pr(X = 1) = p and Pr(X = 0) = q, we find

\operatorname{E}[X] = \Pr(X = 1) \cdot 1 + \Pr(X = 0) \cdot 0 = p \cdot 1 + q \cdot 0 = p.

Variance

The variance of a Bernoulli distributed X is

\operatorname{Var}[X] = pq = p(1 - p).

We first find

\operatorname{E}[X^2] = \Pr(X = 1) \cdot 1^2 + \Pr(X = 0) \cdot 0^2 = p \cdot 1^2 + q \cdot 0^2 = p.

From this follows

\operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2 = p - p^2 = p(1 - p) = pq.
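
A quick Monte Carlo sanity check of both moments, assuming numpy (p = 0.3 is illustrative):

    import numpy as np

    rng = np.random.default_rng(seed=2)
    p = 0.3
    x = rng.binomial(1, p, size=100_000)
    print(x.mean(), p)           # sample mean vs. p
    print(x.var(), p * (1 - p))  # sample variance vs. pq = 0.21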

Skewness

The skewness is (q − p)/√(pq) = (1 − 2p)/√(pq). When we take the standardized Bernoulli distributed random variable (X − E[X])/√(Var[X]), we find that this random variable attains q/√(pq) with probability p and attains −p/√(pq) with probability q. Thus we get

\gamma_1 = \operatorname{E}\left[ \left( \frac{X - \operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}} \right)^3 \right] = p \cdot \left( \frac{q}{\sqrt{pq}} \right)^3 + q \cdot \left( -\frac{p}{\sqrt{pq}} \right)^3 = \frac{1}{\sqrt{pq}^3} \left( pq^3 - qp^3 \right) = \frac{pq}{\sqrt{pq}^3} (q - p) = \frac{q - p}{\sqrt{pq}}.
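
This closed form can be checked against scipy's built-in skewness (assuming scipy is installed; p = 0.3 is illustrative):

    import math
    from scipy.stats import bernoulli

    p = 0.3
    q = 1 - p
    closed_form = (q - p) / math.sqrt(p * q)             # (q - p) / sqrt(pq)
    scipy_skew = float(bernoulli.stats(p, moments="s"))  # scipy's skewness
    print(closed_form, scipy_skew)                       # both ≈ 0.8729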

Related distributions

  • If X_1, …, X_n are independent, identically distributed (i.i.d.) random variables, all Bernoulli distributed with success probability p, then

    Y = \sum_{k=1}^{n} X_k \sim \operatorname{B}(n, p) \quad \text{(binomial distribution)},

    as sketched in the example after this list. The Bernoulli distribution is simply \operatorname{B}(1, p).

  • The categorical distribution is the generalization of the Bernoulli distribution for variables with any constant number of discrete values.
  • The Beta distribution is the conjugate prior of the Bernoulli distribution.
  • The geometric distribution models the number of independent and identical Bernoulli trials needed to get one success.
  • If Y ~ Bernoulli(0.5), then (2Y-1) has a Rademacher distribution.
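
A sketch of the first relation above: sums of n i.i.d. Bernoulli(p) draws follow Binomial(n, p). The comparison below assumes numpy and scipy; n, p, and the number of trials are illustrative:

    import numpy as np
    from scipy.stats import binom

    rng = np.random.default_rng(seed=3)
    n, p, trials = 5, 0.3, 200_000
    sums = rng.binomial(1, p, size=(trials, n)).sum(axis=1)  # n Bernoulli draws per row
    for k in range(n + 1):
        # empirical frequency of each possible sum vs. the binomial pmf
        print(k, np.mean(sums == k), binom.pmf(k, n, p))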