Yule–Simon distribution

From Infogalactic: the planetary knowledge core

In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert A. Simon. Simon originally called it the Yule distribution.[1]

The probability mass function (pmf) of the Yule–Simon (ρ) distribution is

f(k;\rho) = \rho\operatorname{B}(k, \rho+1),

for integer k \geq 1 and real \rho > 0, where \operatorname{B} is the beta function. Equivalently, the pmf can be written in terms of the falling factorial as

f(k;\rho) = \frac{\rho\Gamma(\rho+1)}{(k+\rho)^{\underline{\rho+1}}},

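where \Gamma is the gamma function and the falling factorial is defined through the gamma function, (k+\rho)^{\underline{\rho+1}} = \Gamma(k+\rho+1)/\Gamma(k), so that it makes sense for non-integer \rho. Thus, if \rho is an integer,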


f(k;\rho) = \frac{\rho\,\rho!\,(k-1)!}{(k+\rho)!}.
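The beta-function and factorial forms above can be checked against each other numerically. The following is a minimal Python sketch; the function names yule_simon_pmf and yule_simon_pmf_int are hypothetical helpers, not part of any library. The beta function is evaluated through math.lgamma so that large k does not overflow.

```python
import math

def yule_simon_pmf(k, rho):
    """f(k; rho) = rho * B(k, rho + 1), with the beta function computed in log space."""
    if k < 1 or rho <= 0:
        raise ValueError("need integer k >= 1 and rho > 0")
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)

def yule_simon_pmf_int(k, rho):
    """Factorial form, valid only for integer rho: rho * rho! * (k-1)! / (k+rho)!."""
    return rho * math.factorial(rho) * math.factorial(k - 1) / math.factorial(k + rho)

# The two forms agree for integer rho, e.g. rho = 3.
for k in range(1, 6):
    assert math.isclose(yule_simon_pmf(k, 3), yule_simon_pmf_int(k, 3))

# The pmf sums to 1; for rho = 2 the partial sum over k <= 100000 is already very close.
print(sum(yule_simon_pmf(k, 2.0) for k in range(1, 100_001)))
```

The log-space evaluation is a design choice: the factorial form overflows floating point quickly, whereas the lgamma version works for any real \rho > 0.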

The parameter \rho can be estimated using a fixed point algorithm.[2]
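To illustrate how a fixed-point scheme can arise, the maximum-likelihood equation can itself be rearranged into one. For i.i.d. observations k_1, \dots, k_n the log-likelihood is n\log\rho + \sum_i [\log\Gamma(k_i) + \log\Gamma(\rho+1) - \log\Gamma(k_i+\rho+1)]. Since \Gamma(k+\rho+1)/\Gamma(\rho+1) = (\rho+1)(\rho+2)\cdots(\rho+k), setting the derivative to zero gives \rho = n / \sum_i \sum_{j=1}^{k_i} 1/(\rho+j). The Python sketch below simply iterates this map. It is an assumption that this resembles the algorithm of the cited reference, and the name fit_rho_fixed_point is hypothetical. A finite maximizer exists only when at least one observation exceeds 1.

```python
from collections import Counter

def fit_rho_fixed_point(data, rho0=1.0, tol=1e-12, max_iter=10_000):
    """Maximum-likelihood estimate of rho by iterating rho <- n / S(rho),
    where S(rho) = sum_i sum_{j=1}^{k_i} 1 / (rho + j).

    Hypothetical helper; a finite MLE exists only if some observation exceeds 1.
    """
    n = len(data)
    if n == 0 or any(k < 1 for k in data):
        raise ValueError("data must be a non-empty list of integers >= 1")
    if max(data) == 1:
        raise ValueError("all observations equal 1: the likelihood increases without bound")
    counts = Counter(data)  # group equal observations so each inner sum is computed once
    rho = rho0
    for _ in range(max_iter):
        s = sum(c * sum(1.0 / (rho + j) for j in range(1, k + 1)) for k, c in counts.items())
        new_rho = n / s
        if abs(new_rho - rho) <= tol * (1.0 + rho):
            return new_rho
        rho = new_rho
    raise RuntimeError("fixed-point iteration did not converge")

sample = [1, 1, 1, 2, 2, 3, 5, 1, 1, 8]
print(fit_rho_fixed_point(sample))
```

The right-hand side n / S(\rho) is increasing in \rho, so the iterates move monotonically toward a fixed point instead of oscillating.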

The probability mass function f has the property that for sufficiently large k we have


f(k;\rho) \approx \frac{\rho\Gamma(\rho+1)}{k^{\rho+1}} \propto \frac{1}{k^{\rho+1}}.
Figure: Plot of the Yule–Simon(1) distribution (red) and its asymptotic Zipf's law (blue).

This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: f(k;\rho) can be used to model, for example, the relative frequency of the kth most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of k.
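The asymptotic statement can be checked numerically: the ratio of f(k;\rho) to its power-law approximation \rho\Gamma(\rho+1)/k^{\rho+1} should approach 1 as k grows. A minimal Python sketch, with tail_ratio as a hypothetical helper name and the beta function again computed via math.lgamma:

```python
import math

def tail_ratio(k, rho):
    """f(k; rho) divided by its power-law approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    log_pmf = math.log(rho) + math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    log_approx = math.log(rho) + math.lgamma(rho + 1) - (rho + 1) * math.log(k)
    return math.exp(log_pmf - log_approx)

for k in (1, 10, 100, 1000, 10_000):
    print(k, tail_ratio(k, 1.0))
```

For \rho = 1, the case shown in the plot, f(k;1) = 1/(k(k+1)) and the approximation is 1/k^2, so the ratio is exactly k/(k+1).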

Occurrence

The Yule–Simon distribution arose originally as the limiting distribution of a particular stochastic process studied by Yule as a model for the distribution of biological taxa and subtaxa.[3] Simon dubbed this process the "Yule process", but it is more commonly known today as a preferential attachment process.[citation needed] The preferential attachment process is an urn process in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number of balls the urn already contains.
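The urn process can be simulated directly. The sketch below assumes Simon's parametrization, which the text above does not spell out: with a fixed probability p_new (a hypothetical parameter name) each new ball opens a new urn, and otherwise it joins an existing urn chosen with probability proportional to that urn's current size. Under this assumption the limiting urn-size distribution is Yule–Simon with \rho = 1/(1 - p_{new}), so for p_new = 0.5 about two thirds of the urns should hold a single ball (f(1;2) = 2/3) and about one sixth should hold two (f(2;2) = 1/6).

```python
import random
from collections import Counter

def simulate_urn_sizes(p_new, steps, seed=0):
    """Simon-style preferential attachment (hypothetical helper).

    Each step adds one ball: with probability p_new it opens a new urn, otherwise it
    joins an existing urn chosen with probability proportional to that urn's size.
    """
    rng = random.Random(seed)
    sizes = [1]    # start with a single urn holding one ball
    owners = [0]   # owners[i] = index of the urn holding ball i
    for _ in range(steps):
        if rng.random() < p_new:
            sizes.append(1)
            owners.append(len(sizes) - 1)
        else:
            # A uniformly chosen ball lies in urn u with probability sizes[u] / (number of balls).
            urn = rng.choice(owners)
            sizes[urn] += 1
            owners.append(urn)
    return sizes

sizes = simulate_urn_sizes(p_new=0.5, steps=200_000)
freq = Counter(sizes)
for k in range(1, 5):
    print(k, freq[k] / len(sizes))
```

Keeping one list entry per ball makes the size-proportional choice a single uniform draw, so each step costs constant time.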

The distribution also arises as a compound distribution, in which the parameter of a geometric distribution is treated as a function of a random variable having an exponential distribution.[citation needed] Specifically, assume that W follows an exponential distribution with scale 1/\rho, or equivalently rate \rho:

W \sim \operatorname{Exponential}(\rho),

with density

h(w;\rho) = \rho \exp(-\rho w).

Then a Yule–Simon distributed variable K has the following geometric distribution conditional on W:

K \sim \operatorname{Geometric}(\exp(-W))\, .
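This two-stage description translates directly into a sampler: draw W from an exponential distribution with rate \rho, then draw K from a geometric distribution on \{1,2,\dotsc\} with success probability \exp(-W). The following is a minimal standard-library Python sketch intended for moderate \rho, and the helper name sample_yule_simon is hypothetical. The geometric draw uses the inverse CDF, since P(K > k) = (1-p)^k.

```python
import math
import random

def sample_yule_simon(rho, rng):
    """One draw of K via the exponential-geometric mixture (hypothetical helper)."""
    w = rng.expovariate(rho)      # W ~ Exponential(rate = rho)
    p = math.exp(-w)              # success probability of the geometric stage
    if p >= 1.0:                  # W == 0 gives p == 1, i.e. K = 1 with certainty
        return 1
    if p == 0.0:
        raise OverflowError("exp(-W) underflowed to 0; K is too large to sample this way")
    u = 1.0 - rng.random()        # uniform on (0, 1]
    # Inverse CDF of Geometric(p) on {1, 2, ...}: P(K > k) = (1 - p)**k.
    return max(1, math.ceil(math.log(u) / math.log1p(-p)))

rng = random.Random(12345)
draws = [sample_yule_simon(2.0, rng) for _ in range(100_000)]
print(sum(d == 1 for d in draws) / len(draws))  # compare with f(1; 2) = 2/3
```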

The pmf of a geometric distribution is

g(k; p) = p (1-p)^{k-1}

for k\in\{1,2,\dotsc\}. The Yule–Simon pmf is then the following exponential-geometric compound distribution:

f(k;\rho) = \int_0^{\infty} g(k;\exp(-w))\, h(w;\rho)\,dw.
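The integral can be evaluated in closed form with the substitution u = e^{-w}, so that du = -e^{-w}\,dw and e^{-\rho w} = u^{\rho}. This recovers the beta-function form of the pmf:

```latex
\begin{aligned}
f(k;\rho) &= \int_0^{\infty} e^{-w}\,\bigl(1-e^{-w}\bigr)^{k-1}\,\rho e^{-\rho w}\,dw \\
          &= \rho \int_0^{1} u^{\rho}\,(1-u)^{k-1}\,du \qquad (u = e^{-w}) \\
          &= \rho\operatorname{B}(\rho+1, k) = \rho\operatorname{B}(k, \rho+1).
\end{aligned}
```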

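The pmf satisfies the following recurrence relation, which follows from \operatorname{B}(k+1,\rho+1) = \frac{k}{k+\rho+1}\operatorname{B}(k,\rho+1):

k\,f(k;\rho) = (k+\rho+1)\,f(k+1;\rho), \qquad f(1;\rho) = \rho\operatorname{B}(\rho+1,1) = \frac{\rho}{\rho+1}.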
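Because consecutive probabilities satisfy f(k+1;\rho)/f(k;\rho) = k/(k+\rho+1), the whole pmf can be tabulated without evaluating any gamma functions. The following minimal Python sketch starts from f(1;\rho) = \rho/(\rho+1), applies the recurrence, and checks the result against the beta-function form. The helper names pmf_table and pmf_beta are hypothetical.

```python
import math

def pmf_table(rho, k_max):
    """[f(1; rho), ..., f(k_max; rho)] built with k f(k) = (k + rho + 1) f(k + 1)."""
    probs = [rho / (rho + 1.0)]               # f(1; rho) = rho * B(rho + 1, 1)
    for k in range(1, k_max):
        probs.append(probs[-1] * k / (k + rho + 1.0))
    return probs

def pmf_beta(k, rho):
    """Reference value rho * B(k, rho + 1) via log-gamma."""
    return rho * math.exp(math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1))

table = pmf_table(1.7, 50)
assert all(math.isclose(p, pmf_beta(k, 1.7)) for k, p in enumerate(table, start=1))
```

Each step is a single multiplication, so rounding errors accumulate only linearly in k.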

Generalizations

The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon(ρ, α) distribution is defined as


f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}}\,\mathrm{B}_{1-\alpha}(k, \rho+1),

with 0 \leq \alpha < 1, where \mathrm{B}_x(a,b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt is the incomplete beta function. For \alpha = 0 the ordinary Yule–Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
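The generalized pmf can be evaluated with a simple numerical quadrature of the incomplete beta function. The following Python sketch uses composite Simpson's rule as a stand-in for a library routine; SciPy users could instead multiply scipy.special.betainc(a, b, x), the regularized version, by scipy.special.beta(a, b). The helper names and the number of subintervals are arbitrary choices. The sketch checks two properties stated above: the probabilities sum to 1, and \alpha = 0 recovers \rho\operatorname{B}(k,\rho+1).

```python
import math

def incomplete_beta(x, a, b, n=2000):
    """B_x(a, b) = integral from 0 to x of t**(a - 1) * (1 - t)**(b - 1) dt.

    Composite Simpson's rule with n (even) subintervals; assumes a >= 1 and b >= 1,
    so the integrand is finite on the whole interval [0, x].
    """
    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / n
    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3

def gen_yule_simon_pmf(k, rho, alpha):
    """f(k; rho, alpha) = rho / (1 - alpha**rho) * B_{1-alpha}(k, rho + 1), 0 <= alpha < 1."""
    return rho / (1 - alpha ** rho) * incomplete_beta(1 - alpha, k, rho + 1)

# With alpha = 0.3 the cutoff leaves a negligible remainder beyond k = 300.
print(sum(gen_yule_simon_pmf(k, 2.0, 0.3) for k in range(1, 301)))
```

The normalizing constant follows from \sum_{k\geq 1} \mathrm{B}_x(k,\rho+1) = \int_0^x (1-t)^{\rho-1}\,dt = (1-(1-x)^{\rho})/\rho with x = 1-\alpha.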

Bibliography

  • Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)

References

  1. …
  2. …
  3. …