Kalpana Kalpana (Editor)

Yule–Simon distribution


Parameters
$\rho > 0$, shape (real)

Support
$k \in \{1, 2, \dotsc\}$

pmf
$\rho \operatorname{B}(k, \rho + 1)$

CDF
$1 - k \operatorname{B}(k, \rho + 1)$

Mean
$\frac{\rho}{\rho - 1}$ for $\rho > 1$

Mode
$1$

In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert A. Simon. Simon originally called it the Yule distribution.


The probability mass function (pmf) of the Yule–Simon (ρ) distribution is

$$f(k;\rho) = \rho \operatorname{B}(k, \rho + 1),$$

for integer k ≥ 1 and real ρ > 0, where B is the beta function. Equivalently, the pmf can be written in terms of the falling factorial as

$$f(k;\rho) = \frac{\rho \, \Gamma(\rho + 1)}{(k + \rho)^{\underline{\rho + 1}}},$$

where Γ is the gamma function. Thus, if ρ is an integer,

$$f(k;\rho) = \frac{\rho \, \rho! \, (k - 1)!}{(k + \rho)!}.$$
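The formulas above can be evaluated directly. The following is a minimal sketch, not a reference implementation: the function names are illustrative, and the beta function is computed through log-gamma values to avoid overflow for large k.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """f(k; rho) = rho * B(k, rho + 1), with B evaluated in log space."""
    if k < 1 or rho <= 0:
        raise ValueError("requires integer k >= 1 and real rho > 0")
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def yule_simon_pmf_integer_rho(k: int, rho: int) -> float:
    """Factorial form rho * rho! * (k - 1)! / (k + rho)!, valid for integer rho."""
    return (rho * math.factorial(rho) * math.factorial(k - 1)
            / math.factorial(k + rho))


if __name__ == "__main__":
    # Both forms should agree whenever rho is an integer.
    for k in (1, 2, 5, 10):
        print(k, yule_simon_pmf(k, 3), yule_simon_pmf_integer_rho(k, 3))
```

For integer ρ the two functions should agree up to floating-point rounding, which gives a quick sanity check on either form.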

The parameter ρ can be estimated using a fixed point algorithm.
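As an illustration only, one natural fixed-point scheme follows from setting the derivative of the log-likelihood to zero. For observations k_1, …, k_N this gives N/ρ = Σ_i Σ_{j=1}^{k_i} 1/(ρ + j), which can be iterated as ρ ← N / Σ_i Σ_j 1/(ρ + j). This is a sketch derived from the pmf above and is not necessarily the specific algorithm the text refers to; the function name, starting value, and tolerances are assumptions.

```python
def fit_rho_fixed_point(data, rho0=1.0, tol=1e-12, max_iter=10_000):
    """Estimate rho by iterating rho <- N / sum_i sum_{j=1..k_i} 1 / (rho + j).

    Assumption: this update is derived here from the likelihood equation of
    the pmf rho * B(k, rho + 1); it is an illustrative scheme only.
    """
    n = len(data)
    if n == 0 or any(k < 1 for k in data):
        raise ValueError("data must be a non-empty sequence of integers >= 1")
    if all(k == 1 for k in data):
        # The likelihood has no finite maximizer when every observation is 1.
        raise ValueError("no finite estimate: all observations equal 1")
    rho = rho0
    for _ in range(max_iter):
        s = sum(1.0 / (rho + j) for k in data for j in range(1, k + 1))
        new_rho = n / s
        if abs(new_rho - rho) <= tol * max(1.0, new_rho):
            return new_rho
        rho = new_rho
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    sample = [1, 1, 1, 2, 1, 3, 1, 5, 2, 1]  # made-up data for illustration
    print(fit_rho_fixed_point(sample))
```

The returned value satisfies the likelihood equation up to the chosen tolerance; the cost per iteration is proportional to the sum of the observations, so very large counts would call for a digamma-based variant.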

The probability mass function f has the property that for sufficiently large k we have

$$f(k;\rho) \approx \frac{\rho \, \Gamma(\rho + 1)}{k^{\rho + 1}} \propto \frac{1}{k^{\rho + 1}}.$$
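The quality of this approximation can be checked numerically by looking at the ratio of the exact pmf to the power-law expression, which should approach 1 as k grows. The sketch below uses illustrative function names and an arbitrary choice of ρ.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """Exact pmf rho * B(k, rho + 1), evaluated through log-gamma."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def tail_approximation(k: int, rho: float) -> float:
    """Large-k approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * math.gamma(rho + 1) / k ** (rho + 1)


def tail_ratio(k: int, rho: float) -> float:
    """Exact pmf divided by its power-law approximation."""
    return yule_simon_pmf(k, rho) / tail_approximation(k, rho)


if __name__ == "__main__":
    for k in (10, 100, 1_000, 1_000_000):
        print(k, tail_ratio(k, 1.5))
```

The ratio stays below 1 and climbs toward it, so the power law slightly overestimates the pmf at moderate k.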

This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: f(k; ρ) can be used to model, for example, the relative frequency of the kth most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of k.

Occurrence

The Yule–Simon distribution arose originally as the limiting distribution of a particular stochastic process studied by Yule as a model for the distribution of biological taxa and subtaxa. Simon dubbed this process the "Yule process" but it is more commonly known today as a preferential attachment process. The preferential attachment process is an urn process in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number the urn already contains.
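The urn process described above can be simulated in a few lines. The sketch below makes an assumption that is not in the text: at each step a new urn is opened with a fixed probability `p_new` (a hypothetical parameter), and otherwise the ball goes to an existing urn with probability proportional to its current size. Under this particular rule, the fraction of urns holding exactly one ball is expected to settle near 1/(2 − p_new), which matches f(1; ρ) = ρ/(ρ + 1) with ρ = 1/(1 − p_new).

```python
import random
from collections import Counter


def simulate_urns(n_balls: int, p_new: float, seed: int = 0) -> list:
    """Return the list of urn sizes after placing n_balls balls.

    Assumption: a new urn is opened with probability p_new; otherwise an
    existing urn is chosen with probability proportional to its size.
    """
    rng = random.Random(seed)
    ball_to_urn = [0]  # urn index of every ball; start with one urn, one ball
    urn_sizes = [1]
    for _ in range(n_balls - 1):
        if rng.random() < p_new:
            urn = len(urn_sizes)
            urn_sizes.append(1)
        else:
            # Picking a uniformly random ball selects its urn with
            # probability proportional to the urn's size.
            urn = ball_to_urn[rng.randrange(len(ball_to_urn))]
            urn_sizes[urn] += 1
        ball_to_urn.append(urn)
    return urn_sizes


def size_frequencies(urn_sizes: list) -> dict:
    """Relative frequency of each urn size."""
    counts = Counter(urn_sizes)
    total = len(urn_sizes)
    return {size: count / total for size, count in sorted(counts.items())}


if __name__ == "__main__":
    freqs = size_frequencies(simulate_urns(20_000, 0.5, seed=1))
    print({size: round(f, 4) for size, f in list(freqs.items())[:5]})
```

Storing one entry per ball keeps the size-proportional draw constant time, at the cost of memory linear in the number of balls.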

The distribution also arises as a compound distribution, in which the parameter of a geometric distribution is treated as a function of a random variable having an exponential distribution. Specifically, assume that W follows an exponential distribution with scale 1/ρ or rate ρ:

$$W \sim \operatorname{Exponential}(\rho),$$

with density

$$h(w;\rho) = \rho \exp(-\rho w).$$

Then a Yule–Simon distributed variable K has the following geometric distribution conditional on W:

$$K \sim \operatorname{Geometric}(\exp(-W)).$$

The pmf of a geometric distribution is

$$g(k;p) = p (1 - p)^{k - 1}$$

for k ∈ {1, 2, …}. The Yule–Simon pmf is then the following exponential-geometric compound distribution:

$$f(k;\rho) = \int_0^\infty g(k; \exp(-w)) \, h(w;\rho) \, dw.$$
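The compound construction doubles as a sampling recipe: draw W from the exponential distribution, then draw K from a geometric distribution with success probability exp(−W). The sketch below is one way to do this with the standard library; the function name is illustrative, and the geometric draw uses inversion of its survival function, which is an implementation choice rather than something prescribed by the text.

```python
import math
import random


def sample_yule_simon(rho: float, rng: random.Random) -> int:
    """Draw one Yule-Simon(rho) variate via the exponential-geometric mixture."""
    w = rng.expovariate(rho)          # W ~ Exponential(rate = rho)
    p = math.exp(-w)                  # success probability of the geometric
    if p >= 1.0:
        return 1                      # success is certain on the first trial
    if p == 0.0:
        raise OverflowError("exp(-W) underflowed; rho is too small to sample")
    u = 1.0 - rng.random()            # uniform on (0, 1]
    # Inversion: P(K > k) = (1 - p)**k, so K = floor(ln u / ln(1 - p)) + 1.
    return int(math.floor(math.log(u) / math.log1p(-p))) + 1


if __name__ == "__main__":
    rng = random.Random(12345)
    draws = [sample_yule_simon(2.0, rng) for _ in range(20_000)]
    # The share of ones should be close to f(1; 2) = 2 / 3.
    print(sum(1 for k in draws if k == 1) / len(draws))
```

Comparing empirical frequencies against ρ B(k, ρ + 1) for small k is a simple way to confirm that the mixture reproduces the pmf.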

The following recurrence relation holds, where P(k) denotes f(k; ρ) and α denotes the shape parameter ρ:

$$k \, P(k) = (\alpha + k + 1) \, P(k + 1), \qquad P(1) = \alpha \operatorname{B}(\alpha + 1, 1).$$
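The recurrence gives a cheap way to tabulate the pmf without evaluating any gamma functions. In the sketch below, which uses illustrative names, the recurrence parameter is written as `rho` and the starting value uses B(ρ + 1, 1) = 1/(ρ + 1).

```python
import math


def yule_simon_pmf_table(rho: float, k_max: int) -> list:
    """Return [P(1), ..., P(k_max)] using k P(k) = (rho + k + 1) P(k + 1)."""
    if rho <= 0 or k_max < 1:
        raise ValueError("requires rho > 0 and k_max >= 1")
    probs = [rho / (rho + 1.0)]  # P(1) = rho * B(rho + 1, 1) = rho / (rho + 1)
    for k in range(1, k_max):
        probs.append(probs[-1] * k / (rho + k + 1.0))
    return probs


def yule_simon_pmf(k: int, rho: float) -> float:
    """Direct evaluation rho * B(k, rho + 1), used here only as a cross-check."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


if __name__ == "__main__":
    table = yule_simon_pmf_table(1.5, 5)
    for k, value in enumerate(table, start=1):
        print(k, value, yule_simon_pmf(k, 1.5))
```

Each step multiplies by a factor below 1, so the table is strictly decreasing, consistent with the mode at k = 1.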

Generalizations

The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon(ρ, α) distribution is defined as

$$f(k;\rho,\alpha) = \frac{\rho}{1 - \alpha^{\rho}} \operatorname{B}_{1-\alpha}(k, \rho + 1),$$

with 0 ≤ α < 1. For α = 0 the ordinary Yule–Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
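To make the generalized pmf concrete, the sketch below evaluates it with a plain numerical integral for the incomplete beta function, B_x(a, b) = ∫_0^x t^(a−1) (1 − t)^(b−1) dt. The composite Simpson rule and the number of intervals are assumptions chosen to keep the example dependency-free; a special-function library would normally be preferable.

```python
def incomplete_beta(x: float, a: float, b: float, n_intervals: int = 2000) -> float:
    """B_x(a, b) by composite Simpson integration (assumes a >= 1, b >= 1)."""
    if x == 0.0:
        return 0.0
    h = x / n_intervals  # n_intervals must be even for Simpson's rule

    def integrand(t: float) -> float:
        return t ** (a - 1) * (1.0 - t) ** (b - 1)

    total = integrand(0.0) + integrand(x)
    for i in range(1, n_intervals):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def generalized_yule_simon_pmf(k: int, rho: float, alpha: float) -> float:
    """f(k; rho, alpha) = rho / (1 - alpha**rho) * B_{1 - alpha}(k, rho + 1)."""
    if k < 1 or rho <= 0 or not (0.0 <= alpha < 1.0):
        raise ValueError("requires k >= 1, rho > 0 and 0 <= alpha < 1")
    return rho / (1.0 - alpha ** rho) * incomplete_beta(1.0 - alpha, k, rho + 1)


if __name__ == "__main__":
    for k in (1, 5, 20, 50):
        print(k,
              generalized_yule_simon_pmf(k, 1.5, 0.0),   # ordinary case
              generalized_yule_simon_pmf(k, 1.5, 0.3))   # with cutoff
```

With α = 0 the values reduce to the ordinary pmf, while for α > 0 the probabilities at large k drop off much faster, which is the exponential cutoff mentioned above.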

References

Yule–Simon distribution Wikipedia