# Entropy/connections between different meanings of entropy/entropy example 4

To understand the formula

$R(B) \approx \frac{\frac 1n H_{\mu_B}(\Lambda^n)}{\log_2(\#\Lambda)}, \tag{approx}$

consider a lossless compression algorithm applied to a message $$B$$ of a finite length $$N$$. To save on the output length, the decoding instructions must be relatively short compared to $$N$$. This is easily achieved in codes which refer only to relatively short components of the message. For example, the decoding instructions may consist of a complete list of the subwords $$A$$ of some carefully chosen length $$n$$ appearing in $$B$$, together with their images $$\Phi(A)$$. The images may have different lengths (as short as possible). The image of $$B$$ is obtained by cutting $$B$$ into subwords $$B=A_1A_2\dots A_k$$ of length $$n$$ and concatenating the images of these subwords: $\Phi(B)=\Phi(A_1)\Phi(A_2)\cdots\Phi(A_k)$. (There is an additional issue here: in order for such a code to be invertible, the images $$\Phi(A)$$ must form a prefix-free family, so that there is always a unique way of partitioning $$\Phi(B)$$ back into the images $$\Phi(A_i)$$. But this does not essentially affect the computations.) For the best compression results, it is reasonable to assign the shortest images to the subwords appearing in $$B$$ with the highest frequencies. It can be proved, with the help of the Shannon–McMillan–Breiman theorem, that such a code realizes the compression rate as in (approx) for nearly all sufficiently long messages $$B$$.
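
A minimal Python sketch of the whole scheme, using Huffman's construction as one possible way of assigning shorter images to more frequent subwords (the argument above does not depend on this particular choice; the decoding instructions are ignored here), might look as follows.

```python
import heapq
from collections import Counter
from itertools import count

def block_compress(B, n):
    """Cut B into subwords of length n and concatenate their code images."""
    # Assumes len(B) is a multiple of n.
    blocks = [B[i:i + n] for i in range(0, len(B), n)]
    freq = Counter(blocks)
    # Huffman construction: repeatedly merge the two least frequent groups,
    # prepending '0' to the code words on one side and '1' on the other.
    ticket = count()  # tie-breaker, keeps the heap entries comparable
    heap = [(f, next(ticket), {w: ""}) for w, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {w: "0" + s for w, s in c1.items()}
        merged.update({w: "1" + s for w, s in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(ticket), merged))
    code = heap[0][2]  # prefix-free: the code words sit at the leaves of one tree
    return code, "".join(code[b] for b in blocks)

code, image = block_compress("011001111001001111010001111110", 3)
print(len(image) / 30)  # 0.7 for the message of the example below
```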

For example, consider the binary message of length 30:

$B \ = \ 011001111001001111010001111110 \ = \ 011,001,111,001,001,111,010,001,111,110.$

On the right, $$B$$ is shown divided into subwords of length 3. The frequencies of the subwords in this division are:

$\begin{matrix} 000 - 0 & 001 - 0.4 & 010 - 0.1 & 011 - 0.1 \\ 100 - 0 & 101 - 0\ \ \ & 110 - 0.1 & 111 - 0.3 \end{matrix}$
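
These frequencies are easily verified mechanically, for instance in Python:

```python
from collections import Counter

B = "011001111001001111010001111110"
blocks = [B[i:i + 3] for i in range(0, len(B), 3)]
freq = {w: c / len(blocks) for w, c in Counter(blocks).items()}
print(freq)  # 001: 0.4, 111: 0.3, 010, 011 and 110: 0.1 each, the rest absent
```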

The theoretical value of the best compression ratio, obtained from formula (approx) with $$n=3$$ (for the binary alphabet the denominator $$\log_2(\#\Lambda)$$ equals $$1$$), is

$\tfrac13\left(-0.4\log_2(0.4)-0.3\log_2(0.3)-3\cdot 0.1\log_2(0.1)\right)\approx 68.2\%.$
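
This value is easy to reproduce numerically, for instance:

```python
from math import log2

freq = [0.4, 0.3, 0.1, 0.1, 0.1]     # frequencies of the length-3 subwords of B
H = -sum(p * log2(p) for p in freq)  # entropy of the subword distribution
print(H / 3)                         # divide by n = 3 (log2(#Lambda) = 1); about 0.682
```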

A binary prefix-free code giving the shortest images to the most frequent subwords is

$\begin{matrix} 001 &\mapsto& 0, \\ 111 &\mapsto& 10, \\ 010 &\mapsto& 110, \\ 011 &\mapsto& 1110, \\ 110 &\mapsto& 1111. \end{matrix}$

The image of $$B$$ by this code is:

$\Phi(B)= 1110,0,10,0,0,10,110,0,10,1111 = 111001000101100101111$

The compression rate achieved on $$B$$ using this code equals $$21/30 = 70\%$$, which is close to the theoretical value of about $$68.2\%$$; the code is thus quite efficient on $$B$$.
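
This can be verified directly: the sketch below encodes $$B$$ block by block with the above code and then decodes the image by reading it from left to right, cutting off a code word as soon as one is recognized (the prefix-free property guarantees that this is unambiguous).

```python
B = "011001111001001111010001111110"
code = {"001": "0", "111": "10", "010": "110", "011": "1110", "110": "1111"}

blocks = [B[i:i + 3] for i in range(0, len(B), 3)]
image = "".join(code[b] for b in blocks)
print(image, len(image) / len(B))  # 111001000101100101111 0.7

# Decode: since no code word is a prefix of another, greedy reading is unambiguous.
inverse = {v: k for k, v in code.items()}
decoded, buf = [], ""
for bit in image:
    buf += bit
    if buf in inverse:
        decoded.append(inverse[buf])
        buf = ""
print("".join(decoded) == B)  # True
```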

Remark: The image given here does not include the decoding instructions, which consist simply of a record of the above prefix-free code. One should imagine that $$B$$ is much longer, so that the decoding instructions (of fixed length) do not affect the compression ratio.