Zipf's Law
Applies to the frequency table of words in a corpus of a language:
$$\text{word frequency} \propto \frac{1}{\text{word rank}}$$
Empirically:
- the most common word occurs approximately twice as often as the second most common one, three times as often as the third most common, and so on.
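The empirical pattern above can be checked with a rank-frequency table. The following is a minimal sketch, not a definitive implementation: the function name `rank_frequency` and the toy text are hypothetical, chosen only so that the counts follow the ideal 1/rank pattern exactly. Real corpora follow it only approximately.

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = text.lower().split()
    counts = Counter(words)
    # most_common() sorts by descending count, so enumerate gives the rank.
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

# Hypothetical toy text built so the counts are 6, 3 and 2.
toy = "the " * 6 + "of " * 3 + "and " * 2
table = rank_frequency(toy)

for rank, word, count in table:
    # Under an ideal Zipf law, rank * count stays constant (here: 6).
    print(rank, word, count, rank * count)
```

On a real corpus the product rank × count would only be roughly constant, and simple whitespace splitting would need to be replaced by proper tokenisation.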
A generalisation is the Zipf-Mandelbrot law:
$$\text{frequency} \propto \frac{1}{(\text{rank} + b)^{a}}$$
where $a$ and $b$ are fitted parameters, with $a \approx 1$ and $b \approx 2.7$.
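To see how the parameters a and b shape the curve, the Zipf-Mandelbrot weights can be computed directly. This is a sketch under stated assumptions: the helper names `zipf_mandelbrot_weights` and `normalise` are hypothetical, and the defaults a = 1.0, b = 2.7 are simply the approximate fitted values quoted above.

```python
def zipf_mandelbrot_weights(n, a=1.0, b=2.7):
    """Unnormalised weights 1 / (rank + b)**a for ranks 1..n."""
    return [1.0 / (rank + b) ** a for rank in range(1, n + 1)]

def normalise(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

# Relative frequencies for the first five ranks with the default parameters.
probs = normalise(zipf_mandelbrot_weights(5))

# Setting a = 1 and b = 0 recovers the plain 1/rank form of Zipf's law.
plain = zipf_mandelbrot_weights(3, a=1.0, b=0.0)
```

A positive b flattens the head of the distribution: with b = 2.7 the top-ranked word is less than twice as frequent as the second one.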
Definition
The Zipf distribution on N elements assigns to the element of rank k (counting from 1) the probability:
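$$f(k; N) = \begin{cases} \dfrac{1}{H_N}\,\dfrac{1}{k}, & \text{if } 1 \le k \le N, \\ 0, & \text{if } k < 1 \text{ or } N < k, \end{cases}$$
where $H_N \equiv \sum_{k=1}^{N} \frac{1}{k}$ is the normalisation constant (the N-th harmonic number).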
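The definition can be evaluated exactly for small N. The sketch below uses Python's `fractions` module so the normalisation can be verified without rounding error. The function names `harmonic` and `zipf_pmf` are hypothetical.

```python
from fractions import Fraction

def harmonic(n):
    """N-th harmonic number H_N = sum_{k=1}^{N} 1/k, as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def zipf_pmf(k, n):
    """Probability of rank k under the Zipf distribution on n elements."""
    if k < 1 or k > n:
        return Fraction(0)  # outside the support 1..n
    return Fraction(1, k) / harmonic(n)

# For N = 3: H_3 = 1 + 1/2 + 1/3 = 11/6
pmf = [zipf_pmf(k, 3) for k in range(1, 4)]
print(pmf)       # [Fraction(6, 11), Fraction(3, 11), Fraction(2, 11)]
print(sum(pmf))  # 1
```

For N = 3 the probabilities are 6/11, 3/11 and 2/11, so the first rank is exactly twice as likely as the second and three times as likely as the third.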