# Topological entropy

Post-publication activity

Curator: Tomasz Downarowicz Figure 1: Topological entropy generated in a so-called horseshoe: the rectangle is stretched, bent upward and placed over itself. Suppose that only points that are in opposite halves of the rectangle can be distinguished. Initially there are at most two distinguishable points, for example, the blue points. But there exist already four different points whose orbits can be distinguished in two steps: the red points are mapped onto the blue and violet points and any two of them are distinguished either immediately or after applying the transformation once. Similarly, there are eight points (the black points), whose orbits are similarly distinguished in three steps (after one iterate the black points become the red and yellow points, after another iterate they become the blue, violet and green points). The number of orbits distinguishable in $$n$$ steps grows as $$2^n\ ,$$ generating the topological entropy $$(1/n)\log_2(2^n) = 1\ .$$

Let $$(X,T)$$ be a topological dynamical system, i.e., let $$X$$ be a nonempty compact Hausdorff space and $$T:X\to X$$ a continuous map. Topological entropy is a nonnegative number which measures the complexity of the system. Roughly, it measures the exponential growth rate of the number of distinguishable orbits as time advances.

In what follows $$\log$$ denotes $$\log_2$$ (although this choice is arbitrary).

## History

The original definition was introduced by Adler, Konheim and McAndrew in 1965 [AKM]. Their idea to assign a number to an open cover to measure its size was inspired by Kolmogorov and Tihomirov [KT] (1961). Then to define topological entropy for continuous maps they strictly imitated the definition of Kolmogorov-Sinai entropy of a measure preserving transformation in ergodic theory. In metric spaces a different definition was introduced by Bowen in 1971 [B1] and independently Dinaburg in 1970 [D1]. It uses the notion of $$\varepsilon$$-separated points. Equivalence between the above two notions was proved by Bowen [B2] in 1971. The most important characterization of topological entropy in terms of Kolmogorov-Sinai entropy, the so-called variational principle was proved around 1970 by Dinaburg [D1, D2], Goodman [Gm] and Goodwyn [Gw].

## Definitions

### By Adler, Konheim and McAndrew

For an open cover $$\mathcal{U}$$ of $$X$$ (i.e., a family of open sets whose union is $$X$$), let $$N(\mathcal{U})$$ denote the smallest cardinality of a subcover of $$\mathcal{U}$$ (i.e., a subfamily of $$\mathcal{U}$$ whose union still equals $$X$$). By compactness, $$N(\mathcal{U})$$ is always finite. If $$\mathcal{U}$$ and $$\mathcal{V}$$ are open covers of $$X$$ then $\mathcal{U}\vee\mathcal{V} = \{U\cap V:U\in\mathcal{U}, V\in\mathcal{V}\}$ is called their common refinement. Let $$\mathcal{U}^n=\mathcal{U}\vee T^{-1}\mathcal{U}\vee \dots\vee T^{-n+1}\mathcal{U},$$ where $$T^{-k}\mathcal{U}=\{T^{-k}U:U\in\mathcal{U}\}\ .$$ Using a subadditivity argument, one shows that the limit $h(\mathcal{U},T)=\lim_{n\to\infty}\frac{\log N(\mathcal{U}^n)}n$ exists for any open cover $$\mathcal{U}$$ (and equals $$\inf_{n\in\mathbb{N}}\frac{\log N(\mathcal{U}^n)}n$$). The topological entropy of $$(X,T)$$ is defined as the supremum $h_A(T) = \sup h(\mathcal{U},T),$ where the supremum ranges over all open covers $$\mathcal{U}$$ of $$X\ .$$

### By Bowen and Dinaburg

Assume for simplicity that $$X$$ is metric (otherwise a uniform structure can be used). A set $$E \subset X$$ is said to be $$(n,\varepsilon)$$-separated, if for every $$x, y\in E$$ with $$x\neq y$$ there is $$i\in\{0,1,\dots,n-1\}$$ such that $$d(T^ix,T^iy)\ge\varepsilon\ .$$ Let $$s(n,\varepsilon)$$ be the maximal cardinality of an $$(n,\varepsilon)$$-separated set in $$X\ .$$ Again, by compactness, this number is always finite. One defines $\overline h(\varepsilon,T)=\limsup_{n\to\infty}\frac{\log s(n,\varepsilon)}n.$ The topological entropy is obtained as $h_B(T)= \sup_{\varepsilon>0} \overline h(\varepsilon,T) = \lim_{\varepsilon\to 0} \overline h(\varepsilon,T).$

It should be noted that $$s(n,\varepsilon)$$ can be substituted in Bowen's definition with a possibly smaller number $$r(n,\varepsilon)\ ,$$ the minimal cardinality of an $$(n,\varepsilon)$$-spanning set. A set $$E \subset X$$ is $$(n,\varepsilon)$$-spanning if for every $$x\in X$$ there is $$y\in E$$ such that $$d(T^ix,T^iy)<\varepsilon$$ for all $$i\in\{0,1,\dots,n-1\}\ .$$ With such substitution one obtains the same value of $$h_B(T)\ .$$

### Equality between the two notions

It is not hard to see that if $$\mathcal{U}$$ is an open cover with all elements of diameter at most $$\varepsilon$$ and Lebesgue number $$2\delta$$ then $s(n,\varepsilon) \le N(\mathcal{U}^n) \le s(n,\delta),$ which not only implies that $$h_B(T) = h_A(T)$$ but also that the same number $$h_B(T)$$ is obtained if $$\overline h$$ is replaced by $$\underline h$$ defined using $$\liminf$$ in place of $$\limsup\ .$$ From now on we will use $$h(T)$$ to denote either $$h_A(T)$$ or $$h_B(T)\ .$$

## Interpretation

The interpretation of the number $$s(n,\varepsilon)$$ is the following: suppose one observes the system with a device of resolution $$\varepsilon\ ,$$ i.e., two points are distinguished only if the distance between them is at least $$\varepsilon\ .$$ Then, after $$n$$ steps of the evolution of the system, the observer will be able to distinguish at most $$s(n,\varepsilon)$$ different orbits. Thus, the value $$\overline h(\varepsilon,T)$$ is the exponential growth rate of the number of $$\varepsilon$$-distinguishable $$n$$-orbits achieved as $$n$$ grows to infinity. The value $$h(T)$$ maximizes the above over all $$\varepsilon>0\ .$$ This is the precise meaning of saying that topological entropy measures the exponential complexity of the system.

## Basic properties of topological entropy

• $$h(T)\ge h(S)$$ if $$(Y,S)$$ is a topological factor of $$(X,T)\ ,$$ i.e., $$\phi\circ T=S\circ \phi,$$ where $$\phi:X\to Y$$ is a continuous surjection;
• $$h(T)=h(S)$$ if $$(X,T)$$ and $$(Y,S)$$ are topologically conjugate, i.e., $$T=\phi S \phi^{-1}\ ,$$ where $$\phi:X\to Y$$ is a homeomorphism;
• $$h(T)\ge h(T|_Y)$$ if $$Y$$ is a nonempty closed invariant subset of $$X\ ;$$
• $$h(T^n)=nh(T)$$ for $$n\ge 0\ ;$$
• $$h(T^{-1})=h(T)$$ if $$T$$ is a homeomorphism;
• $$h(T\times S)=h(T)+h(S)\ .$$
• $$h(T\circ S)=h(S\circ T)\ .$$

In a language different from that of topological dynamics - namely, that of communication theory - Shannon already obtained these results in 1949 for a concept he called the discrete noiseless channel, see [SW].

## Relation with Kolmogorov-Sinai entropy

The relation between topological entropy and measure theoretic entropy is established by the Variational Principle, which asserts that $h(T)=\sup\{h_\mu(T):\mu\in \mathcal{P}_T(X)\},$ i.e., topological entropy equals the supremum of the Kolmogorov-Sinai entropies $$h_\mu(T)\ ,$$ where $$\mu$$ ranges over all $$T$$-invariant Borel probability measures on $$X\ .$$ In many cases (for instance in asymptotically h-expansive systems), a measure with maximal entropy exists (sometimes unique) and is of a considerable interest. However, in general a measure of maximal entropy need not exist (Gurevich [Gu]). In 1964 Parry [P] gave the formula for the unique Markovian measure of maximal measure entropy for irreducible subshifts of finite type. This formula previously appeared in 1949 in a work by Shannon [SW].

Perhaps the most remarkable comparison of topological entropy to measure theoretic entropy concerns the completeness of the invariants for a standard class of examples. In ergodic theory the standard models are Bernoulli shifts, and for this class Ornstein proved that measure theoretic entropy is a complete conjugacy invariant, the conjugacy being given by a measure preserving transformation (see Katok [Ka] for relevant references). In topological dynamics the standard models are aperiodic shifts of finite type; but, although topological entropy is not a complete invariant under topological conjugacy, in 1979 Adler and Marcus [AM] proved it is under a slightly weaker notion of equivalence called almost topological conjugacy. Their definition of almost topological conjugacy employed measure theoretic concepts to define an exceptional null set where the conjugating map failed to be injective. However a purely topological definition can be found in Adler, Coppersmith, and Hassner [ACH].

For textbooks which treat conjugacy questions in symbolic dynamics related to topological entropy and an extensive source of references see Lind and Marcus [LM] and Kitchens [Ki].

## Topological entropy in some special cases

In some special cases of topological dynamical systems, topological entropy can be computed in a more direct way.

• If $$(X,T)$$ is a subshift (also called a symbolic system), i.e., $$T$$ is the shift map on a closed shift invariant collection $$X$$ of (one-sided or two-sided) sequences of symbols belonging to a finite alphabet, then $$h(T) = \lim_{n\to\infty}\frac{\log \#\mathcal{B}_n}n\ ,$$ where $$\mathcal{B}_n$$ denotes the collection of all words of length $$n$$ appearing anywhere in the sequences $$x\in X\ .$$
• For subshifts of finite type topological entropy coincides with the logarithm of the spectral radius of the transition matrix.
• For piecewise monotone interval maps, topological entropy is equal to the limit $$h(T)=\lim_{n\to\infty}\frac{\log c_n}n\ ,$$ where $$c_n$$ denotes the number of pieces of monotonicity of $$T^n\ .$$
• In some "regular" cases, topological entropy can be computed by counting the number of distinct periodic orbits of period $$n$$ (and taking the upper limit after applying logarithm and dividing by $$n$$). See Bowen [B2].

### More interpretation

The first case discussed above (of a symbolic system) admits an interesting interpretation in terms of information content. One can prove the following theorem:

• Let $$(X,T)$$ be a subshift on finitely many symbols, whose entropy is $$h(T)\ .$$ Let $$N$$ be the smallest integer strictly larger than $$2^{h(T)}\ .$$ Then $$(X,T)$$ is conjugate (via a sliding block code) to a subshift $$(Y,S)$$ on $$N$$ symbols, and $$N$$ is the smallest such number (with one exception - when $$(X,T)$$ is conjugate to the full shift on $$N$$ symbols, in which case $$N = 2^{h(T)}$$). See Adler, Coppersmith and Hassner [ACH].

In other words, $$h(T)$$ informs about the minimal number of symbols sufficient to encode the system "in real time" (i.e., without rescaling the time). If the original subshift uses some $$M>N$$ symbols, it can be "compressed" (by reducing the alphabet to $$N$$ symbols) without any loss of information.

Another compression involves uniform time rescaling:

• Let $$(X,T)$$ be a subshift on $$N$$ symbols, whose entropy is $$h(T)\ .$$ Let $$\frac nm$$ be a rational number strictly larger than $$\frac{2^{h(T)}}N\ .$$ Then $$(X,T^m)$$ is conjugate to $$(Y,S^n)\ ,$$ where $$(Y,S)$$ is a subshift over $$N$$ symbols.

Here the compression is obtained by "squeezing" the time axis uniformly at the rate $$\frac nm\ ,$$ while the alphabet remains unchanged. Note that, unless $$(X,T)$$ is the full shift, we have $$h(T)<\log N\ ,$$ so the fraction $$\frac nm$$ may be chosen proper, and the "data transmission" is indeed sped up.

## Related notions

### Topological tail entropy and symbolic extension entropy

• In 1976, Misiurewicz [M] introduced an entropy-related parameter $$h^*(T)\ ,$$ which he called the topological conditional entropy. It equals the infimum over all finite covers $$\mathcal U$$ of the conditional entropy given $$\mathcal U$$ (the definition of this conditional entropy is similar to Bowen's definition of topological entropy, only one counts $$(n,\epsilon)$$-separated points inside an element of the cover $$U^n$$). Because the infimum does not depend on any conditioning parameter (cover) any more, $$h^*(T)$$ is recently called the topological tail entropy. Roughly speaking it measures the amount of entropy that always escapes the measurement using any finite resolution. We have $$h^*(T)\le h(T)\ .$$

The systems for which $$h^*(T)=0$$ are called asymptotically h-expansive and they enjoy special properties. The Kolmogorov-Sinai entropy is an upper-semicontinuous function of the invariant measure, hence the measure of maximal entropy in such systems always exists.

• The problem of lossless encoding of a general topological dynamical system by embedding it (as a factor) in a subshift with a finite number of symbols is very complicated. The parameter which informs about the minimal entropy of the subshift (and hence about the number of symbols needed) is called symbolic extension entropy $$h_{sex}(T)$$ introduced by Boyle and Downarowicz [BD] and it is usually larger than $$h(T)\ ,$$ in some cases it equals $$h(T)+h^*(T)$$ but can be larger, sometimes infinite even when $$h(T)<\infty\ .$$ An infinite value of $$h_{sex}(T)$$ means that the system cannot be embedded in a subshift.

The equality $$h_{sex}(T)= h(T)$$ holds only in special cases, in particular when $$(X,T)$$ is asymptotically h-expansive.

### Mean topological dimension

Lindenstraus and Weiss [L], [LW] introduced a new invariant, proposed by Gromov, called mean topological dimension mdim$$(X,T)\ ,$$ which solved a long-standing imbedding problem of topological dynamics. For a number of dynamical systems this invariant can be calculated. If the topological dimension of $$X$$ is finite, or if $$(X,T)$$ has finite topological entropy then mdim$$(X,T)= 0\ ,$$ so the new invariant is good for distinguishing between systems where the topological dimension of $$X$$is infinite or $$(X,T)$$ has infinite topological entropy.

### Topological entropy and chaos

Topological dynamical systems of positive entropy are often considered topologically chaotic. Positive entropy always implies Li-Yorke chaos defined as the existence of an uncountable scrambled set (see Blanchard et al. [BGKM]). For interval maps the condition $$h(T)>0$$ is equivalent to distributional chaos as defined by Schweizer and Smital in 1994 [SS]. Nonetheless, some systems of zero topological entropy reveal very complicated behavior, in particular, they may be Li-Yorke or distributionally chaotic.

## Generalizations of topological entropy

### Complexity

The parameter which controls the subexponential growth of the number of distinguishable orbits in the zero entropy case is topological complexity (see Blanchard et al. [BHM]).

### Pressure

A notion more general than topological entropy, depending also on a function $$f:X\to\Bbb R\ ,$$ is pressure. Topological entropy is the pressure for $$f\equiv 0\ .$$

### Topological entropy for flows

For a flow $$\varphi:\Bbb R\times X\to X$$ the topological entropy is defined as the entropy of the time-one map$h(\varphi)=h(\varphi^1)\ .$ Since for flows a notion that is used more often than conjugacy is equivalence, which admits rescaling of time, topological entropy basically distinguishes only between flows with zero, positive finite, and infinite entropy.

### Other actions

Topological entropy can be also defined for actions of more general groups, like $$\Bbb Z^d\ ,$$ $$\Bbb R^d\ ;$$ most generally, amenable groups. Variational principle holds also in those cases.

There are several generalizations of topological entropy to the case when the space $$X$$ is not compact (see Bowen, [B3]).

### Topological entropy for nonautonomous dynamical systems

Topological entropy is also defined for nonautonoumous dynamical systems given by a sequence of continuous maps (see Kolyada and Snoha, [KS]).