profile pic
⌘ '
raccourcis clavier

one of the earliest data streaming algorithm.

problem.

Given the bag bb of nn elements and an integer k2k \geq 2. Find the values that occur more than n/kn/k times in bb

idea: two passes over the values in bb, while storing at most kk values from bb and their number of occurrences.

Assume the bag is available in array b[0:n1]b[0:n-1] of nn elements, then a heavy-hitter of bag bb is a value that occurs more than n/kn/k times in bb for some integer k2k \geq 2

pseudocode.

"\\begin{algorithm}\n\\caption{Misra--Gries}\n\\begin{algorithmic}\n\\State $t \\gets \\{\\}$\n\\State $d \\gets 0$\n\\For{$i \\gets 0$ to $n-1$}\n \\If{$b[i] \\notin t$}\n \\State $t \\gets t \\cup \\{b[i]\\}$\n \\State $d \\gets d + 1$\n \\Else\n \\State $t \\gets t \\cup \\{b[i]\\}$\n \\EndIf\n \\If{$d = k$}\n \\State Delete $k$ distinct values from $t$\n \\State Update $d$\n \\EndIf\n\\EndFor\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 3 Misra--Gries

1:t{}t \gets \{\}

2:d0d \gets 0

3:for i0i \gets 0 to n1n-1 do

4:if b[i]tb[i] \notin t then

5:tt{b[i]}t \gets t \cup \{b[i]\}

6:dd+1d \gets d + 1

7:else

8:tt{b[i]}t \gets t \cup \{b[i]\}

9:end if

10:if d=kd = k then

11:Delete kk distinct values from tt

12:Update dd

13:end if

14:end for