---
date: '2024-12-01'
description: empirical power law where word frequency inversely relates to rank, most common word appearing twice as often as second.
id: Zipf-Law
modified: 2026-06-05 15:08:26 GMT-04:00
tags:
  - seed
title: Zipf's Law
created: '2024-12-01'
published: '2024-12-01'
pageLayout: default
slug: thoughts/Zipf-Law
permalink: https://aarnphm.xyz/thoughts/Zipf-Law.md
generator:
  quartz: v4.6.0
  hostedProvider: Cloudflare
  baseUrl: aarnphm.xyz
full: https://aarnphm.xyz/llms-full.txt
---
Applies to the frequency table of words in a corpus of [[thoughts/Language|language]]:

$$
\text{word frequency} \propto \frac{1}{\text{word rank}}
$$

Empirically:

- the most common word occurs approximately twice as often as the second most common one, three times as often as the third most common, and so on.
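A minimal sketch of how this pattern could be checked on a tokenised corpus, assuming the tokens are already split into words. The helper name `rank_frequency` and the toy token list are hypothetical; the counts are chosen by hand to be exactly 6, 3, 2, so the ratios come out clean, which real text will not do.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count, top_count / count) rows, most frequent first.

    Assumes `tokens` is a non-empty list of words.
    """
    counts = Counter(tokens).most_common()  # sorted by descending count
    top_count = counts[0][1]
    return [
        (rank, word, count, top_count / count)
        for rank, (word, count) in enumerate(counts, start=1)
    ]


# Hypothetical toy corpus, built so that the counts are exactly 6, 3, 2.
tokens = ["the"] * 6 + ["of"] * 3 + ["and"] * 2

for rank, word, count, ratio in rank_frequency(tokens):
    # Under an ideal Zipf fit, `ratio` stays close to `rank`.
    print(rank, word, count, ratio)
```

On a real corpus the last column would only roughly track the rank, with the largest deviations usually at the very top and in the long tail.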

generalised by the _Zipf-Mandelbrot_ law:

$$
\begin{aligned}
\text{frequency} &\propto \frac{1}{(\text{rank} + b)^a} \\[8pt]
&\because a,b: \text{fitted parameters with } a \approx 1 \text{ and } b \approx 2.7
\end{aligned}
$$
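A small sketch of the shifted power law, assuming a finite vocabulary of `n` ranks so the weights can be normalised to sum to 1 (the formula itself only states a proportionality). The function name `zipf_mandelbrot` is hypothetical, and the defaults are the fitted values quoted in the formula.

```python
def zipf_mandelbrot(n, a=1.0, b=2.7):
    """Normalised frequencies proportional to 1 / (rank + b) ** a for ranks 1..n."""
    weights = [1.0 / (rank + b) ** a for rank in range(1, n + 1)]
    total = sum(weights)  # normalising over a finite n is an assumption of this sketch
    return [w / total for w in weights]


freqs = zipf_mandelbrot(1000)

# With b > 0 the head flattens: rank 1 is less than twice as frequent as rank 2.
print(freqs[0] / freqs[1])

# Setting b = 0 and a = 1 recovers the plain 1 / rank relation.
plain = zipf_mandelbrot(1000, a=1.0, b=0.0)
print(plain[0] / plain[1])
```

The ratio between the first two ranks is $(2 + b)/(1 + b)$ when $a = 1$, so the shift $b$ mainly changes the most frequent words and matters less and less as the rank grows.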

## definition

> \[!definition\] Definition 1. Zipf distribution
>
> the Zipf distribution on $N$ elements assigns to the element of rank $k$ (counting from 1) the probability:
>
> $$
> \begin{aligned}
> f(k;N) &= \begin{cases}
> \frac{1}{H_N} \frac{1}{k}, & \text{if } 1 \leq k \leq N, \\
> 0, & \text{if } k < 1 \text{ or } N < k.
> \end{cases} \\[12pt]
> &\because H_N \equiv \sum_{k=1}^{N} \frac{1}{k} \quad (\text{normalisation constant, the } N\text{-th harmonic number})
> \end{aligned}
> $$
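The definition can be sketched directly in code. The names `harmonic` and `zipf_pmf` are hypothetical, and exact fractions are used here only so that the normalisation is visible without rounding.

```python
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def zipf_pmf(k, n):
    """f(k; N): probability of the element of rank k among n elements."""
    if k < 1 or k > n:
        return Fraction(0)
    return Fraction(1, k) / harmonic(n)


# Worked case N = 3: H_3 = 11/6, so the probabilities are 6/11, 3/11, 2/11.
probs = [zipf_pmf(k, 3) for k in range(1, 4)]
print(probs, sum(probs))
```

For $N = 3$ the rank-1 element is exactly twice as likely as rank 2 and three times as likely as rank 3, and the three probabilities sum to 1.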

