# Entropy/connections between different meanings of entropy/entropy example 4

To understand the formula

- <math approx>

R(B) \approx \frac{\frac 1nH_{\mu_B}(\Lambda^n)}{\log_2(\#\Lambda)}, </math>

consider a lossless compression algorithm applied to a message \(B\) of a finite length \(N\). To save on the output length, the decoding instructions must be relatively short compared to \(N\). This is easily achieved in codes which refer to relatively short components of the messages only. For example, the decoding instructions may consist of a complete list of the subwords \(A\) of some carefully chosen length \(n\) appearing in \(B\) and their images \(\Phi(A)\). The images may have different lengths (as short as possible). The image of \(B\) is obtained by cutting \(B\) into subwords \(B=A_1A_2\dots A_k\) of length \(n\) and concatenating the images of these subwords\[\Phi(B)=\Phi(A_1)\Phi(A_2)\cdots\Phi(A_k)\]. (There are additional issues here: in order for such a code to be invertible, the images \(\Phi(A)\) must form a prefix-free family, so that there is always a unique way of partitioning \(\Phi(B)\) back into the images \(\Phi(A_i)\). But this does not essentially affect the computations.) For best compression results, it is reasonable to assign the shortest images to the subwords appearing in \(B\) with highest frequencies. It can be proved, with the help of the Shannon-McMillan-Breiman theorem, that such a code realizes the compression rate as in (<ref>approx</ref>) for nearly all sufficiently long messages \(B\).

For example, consider the binary message of length 30:

\[ B \ = \ 011001111001001111010001111110 \ = \ 011,001,111,001,001,111,010,001,111,110. \]

On the right, \(B\) is shown divided into subwords of length 3. The frequencies of the subwords in this division are:

\[ \begin{matrix} 000 - 0 & 001 - 0.4 & 010 - 0.1 & 011 - 0.1 \\ 100 - 0 & 101 - 0\ \ \ & 110 - 0.1 & 111 - 0.3 \end{matrix} \]

The theoretical value (obtained using the formula (<ref>approx</ref>) for \(n=3\)) of the best compression ratio is

\[ -0.4\log_2(0,4)-0.3\log_2(0,3)-3\cdot 0.1\log_2(0.1)\approx 68,2\%. \]

A binary prefix-free code giving the shortest images to the most frequent subwords is

\[ \begin{matrix} 001 &\mapsto& 0, \\ 111 &\mapsto& 10, \\ 010 &\mapsto& 110, \\ 011 &\mapsto& 1110, \\ 110 &\mapsto& 1111. \end{matrix} \]

The image of \(B\) by this code is:

\[ \Phi(B)= 1110,0,10,0,0,10,110,0,10,1111 = 111001000101100101111 \]

The compression rate achieved on \(B\) using this code equals \(21/30 = 70\%\), which means that the code is quite efficient on \(B\).

*Remark*: The image given here does not include the decoding instructions, which comprise simply a record of the above prefix-free code. One has to imagine that \(B\) is much longer and then the decoding instructions (of fixed length) do not affect the compression ratio.