Entropy/entropy example 3
Consider the unit square representing the space \(\Omega\), where the probability is the Lebesgue measure (i.e., the surface area), and the partition \(\mathcal A\) into four sets \(A_i\) of probabilities \(\frac18, \frac14, \frac18, \frac12\), respectively, as shown in Figure <ref>F4</ref>.
The information function equals \(-\log_2\left(\frac 18\right) = 3\) on \(A_1\) and \(A_3\), \(-\log_2\left(\frac 14\right) = 2\) on \(A_2\) and \(-\log_2\left(\frac 12\right) = 1\) on \(A_4\). The entropy of \(\mathcal A\) equals
\[ H(\mathcal A) = \frac18\cdot3 + \frac14\cdot2 + \frac18\cdot3 + \frac12\cdot1 = \frac74. \]
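The arithmetic above can be checked numerically. The following is a minimal sketch, assuming only the four probabilities stated in the example; the variable names `probs`, `info` and `entropy` are illustrative and not part of the original text.

```python
from fractions import Fraction
from math import log2

# Probabilities of A_1, A_2, A_3, A_4 as given in the example.
probs = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 8), Fraction(1, 2)]

# Information function: the value -log2(p) taken on each set of the partition.
info = [-log2(p) for p in probs]

# Entropy: the expected value of the information function.
entropy = sum(float(p) * i for p, i in zip(probs, info))

print(info)     # values of the information function on A_1..A_4
print(entropy)  # entropy of the partition
```

Because every probability here is a power of \(\frac12\), the logarithms are whole numbers and the floating-point sum is exact.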
The arrangement of questions that optimizes the expected value of the number of questions asked is the following (see Figure <ref>F5</ref>):
- Question 1. Are you in the left half?
The no answer locates \(\omega\) in \(A_4\) using one bit. Otherwise the next question is:
- Question 2. Are you in the central square of the left half?
The yes answer locates \(\omega\) in \(A_2\) using two bits. If not, the last question is:
- Question 3. Are you in the top half of the whole square?
Now yes and no locate \(\omega\) in \(A_1\) or \(A_3\), respectively. This takes three bits.
In this example the number of questions asked equals exactly the value of the information function at every point, and the expected number of questions equals the entropy \(\frac 74\). There does not exist a better arrangement of questions. Of course, such accuracy is possible only when the probabilities of the sets \(A_i\) are integer powers of \(\frac12\).
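The question scheme can also be tallied directly. The sketch below assumes the question counts read off from the three questions above (one for \(A_4\), two for \(A_2\), three for \(A_1\) and \(A_3\)); the dictionary names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Probabilities of the four sets, as given in the example.
probs = {
    "A1": Fraction(1, 8),
    "A2": Fraction(1, 4),
    "A3": Fraction(1, 8),
    "A4": Fraction(1, 2),
}

# Number of yes/no questions asked before omega is located in each set.
questions = {"A4": 1, "A2": 2, "A1": 3, "A3": 3}

# Expected number of questions: sum of p(A_i) times the question count on A_i.
expected_questions = sum(probs[a] * questions[a] for a in probs)

# The question count equals the information function -log2(p) exactly
# when p == 2 ** (-count) on every set.
counts_match_information = all(
    probs[a] == Fraction(1, 2 ** questions[a]) for a in probs
)

print(expected_questions)
print(counts_match_information)
```

Exact rational arithmetic is used so that the comparison with \(\frac74\) does not depend on floating-point rounding.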