Visual illusions: An Empirical Explanation
Dale Purves et al. (2008), Scholarpedia, 3(6):3706.
Definition of terms
The first term that needs to be defined in discussing visual illusions is 'percept.' The simplest definition of visual percepts is "visual experience," which works well in ordinary discourse, vision science, and thinking about visual illusions. Of course many visual stimuli result in appropriate behavioral responses even when we are not aware of having seen them; indeed the majority of visual processing and visually guided behavior falls in this category (think of all the visual information processed and responded to unconsciously to keep your car on the road when driving to work while preoccupied with other thoughts). Perception is thus used here in the conventional sense of our visual experience or phenomenology.
The second term that must be defined is 'illusion.' The usual concept of an illusion is a percept that fails to agree with the real world measurements made with devices such as photometers, spectrophotometers, rulers, protractors, and so on. Although this definition is a good start, the evidence discussed here leads to the counterintuitive conclusion that all visual percepts are illusory in this sense, and that the textbook illusions that people have made so much of are only the more obvious examples of the normal discrepancy between stimuli and percepts (the more flagrant discrepancies arising for the reasons explained below).
Approaches to understanding visual illusions
Most modern investigators have sought to explain vision, and by the same token visual illusions, in terms of the response properties of neurons in the primary and higher order visual cortices. The general idea is that the responses of neurons encode the biologically useful features of light stimuli that fall on the retina, and ultimately generate percepts that correspond to the physical properties of objects in the real world. In this conception, illusions arise because neurobiological constraints do not always allow this goal to be met.
A second approach to understanding visual percepts is predicated on the need to respond successfully to the statistical characteristics of natural scenes. Many aspects of the neuronal responses to visual stimuli are attuned to aspects of natural images that occur with high regularity, and are thus most likely to be useful guides to behavior (e.g., higher spatial frequencies [edges], orientations in the cardinal axes, middle wavelengths, slower speeds, and so on) (Atick and Redlich, 1992; Field, 1994). Consistent with this idea, some neuronal cell receptive fields in the primary visual cortex (V1) look very much like the filters used to produce the relevant basis functions of images, and the organization of these fields can be predicted from images of natural scenes (Atick and Redlich, 1992; Field, 1994; Dong and Atick, 1995; Bell and Sejnowski, 1997; Simoncelli and Olshausen, 1997). Again, illusions would arise because of imperfect neuronal operation in this filtering framework.
A third approach has focused on solving the problems presented for vision in terms of the limitations inherent in the processing of complex information. Thus, noise in visual processing arises from both the complexity of natural images and the neuronal mechanisms by which image information is processed. This approach often uses Bayesian decision theory as both a tool and a conceptual framework to address visual experience and illusion (Knill and Richards, 1996; Rao and Olshausen et al., 2002; Weiss, Simoncelli and Adelson, 2002; Stocker and Simoncelli, 2006; Doya, Pouget and Rao, 2007; Chater, Tenenbaum and Yuille, 2007).
Yet another approach is predicated squarely on the idea that the inverse optics problem is the central challenge in vision, seeing this obstacle as the major force that has determined the nature of visual percepts (and thus textbook illusions) over the course of evolution (Purves and Lotto, 2003; Howe and Purves, 2005a; Howe, Lotto, and Purves, 2006). In this conception, percepts are determined empirically according to the image-source relationships that humans have been exposed to over accumulated experience. Precedents for this latter approach are evident in Helmholtz’s concept of unconscious inference (Helmholtz, 1924), the "organizational principles" of Gestalt psychologists, and in the empirical explanation of some illusions proposed by modern psychologists such as Richard Gregory and others who have interpreted illusions in terms of what abstract visual stimuli represent in natural scenes (Gregory, 1966/1967).
Explaining illusions in empirical terms
Although each of these four approaches has sometimes been used to rationalize particular visual illusions, only the empirical approach to visual perception has sought to explain the full spectrum of these phenomena in a single theoretical and experimental framework. The hypothesis is that all visual percepts are generated empirically to facilitate successful behavior, and were never intended to correspond to the physical properties of the world or our measurements of these properties. From this perspective, illusions do not reflect any inadequacy or imperfection of visual function, but are rather signatures of its core strategy. The experimental approach to validating this hypothesis is to use natural image databases as proxies for accumulated human experience with some aspect of the visual world. If this theory is correct, then the visual percepts elicited by any given stimulus should be predictable on the basis of such data, classical illusions being predicted as well as perceptions that have not been so categorized because they are less markedly discrepant with some physical measurement.
The following example of the perception of a line indicates how this wholly empirical approach has been applied, the predictions it makes, and the challenging perceptual observations it can make sense of. As described in introductory psychology texts, the way we see the length of lines (or spatial intervals generally) is peculiar in that lines of the same physical length appear to differ in length depending on how they are presented (Fig. 2).

Consider, for instance, a projected line 7 pixels in length oriented at 20° (this length corresponds to ~1° of visual angle, a length often used in psychophysical studies). The cumulative distribution of the sources of lines oriented at 20° gives a cumulative probability value of 0.1494 for a line of this length. Thus 14.94% of the physical sources of lines oriented at 20° generated projections equal to or less than 7 pixels in length, and 85.06% generated longer lines. Accordingly, the empirical rank of a line of this projected length oriented at 20° is 14.94%. The empirical ranks of lines 7 pixels in length at different orientations ranging from 0-180° can be similarly determined from the relevant cumulative probability distribution.
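The rank calculation described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function name `empirical_rank` and the sample lengths are hypothetical, and the ten toy values stand in for the much larger set of projected lengths that would be drawn from a natural image database for one orientation.

```python
from bisect import bisect_right


def empirical_rank(sampled_lengths, projected_length):
    """Return the fraction of sampled projections whose length is
    less than or equal to projected_length (a cumulative probability)."""
    if not sampled_lengths:
        raise ValueError("need at least one sampled length")
    ordered = sorted(sampled_lengths)
    # bisect_right counts the entries that are <= projected_length
    return bisect_right(ordered, projected_length) / len(ordered)


# Hypothetical projected lengths (in pixels) for a single orientation.
# These are invented for illustration and are NOT data from the article.
toy_lengths = [3, 5, 7, 7, 9, 12, 15, 20, 26, 40]

rank = empirical_rank(toy_lengths, 7)
print(f"empirical rank of a 7-pixel line: {rank:.2%}")  # prints 40.00%

# The complementary share reported in the text follows from the rank:
# a cumulative probability of 0.1494 leaves 1 - 0.1494 = 0.8506.
longer_share = 1 - 0.1494
```

With the article's value, a cumulative probability of 0.1494 means that 14.94% of the sources produced projections of 7 pixels or less and the remaining 85.06% produced longer ones.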
In Figure 6 these data are compiled to show how these rankings vary as a function of line orientation for a projected line that subtends a particular distance on the retina. This systematic variation predicts how the perceived length of a line is expected to change as a function of orientation. Comparison of the function in Figure 6 with the psychophysical function in Figure 3B shows how well these predictions can explain this otherwise puzzling aspect of what we see.
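As a rough sketch of how such a function of orientation could be assembled, the snippet below computes the rank of one fixed projected length separately for each orientation. The name `rank_by_orientation` and the toy samples are assumptions made for illustration; the numbers carry no empirical meaning and do not reproduce the curve in Figure 6.

```python
def rank_by_orientation(samples, projected_length):
    """Map each orientation (degrees) to the fraction of sampled
    projections at that orientation with length <= projected_length."""
    ranks = {}
    for orientation, lengths in sorted(samples.items()):
        at_or_below = sum(1 for length in lengths if length <= projected_length)
        ranks[orientation] = at_or_below / len(lengths)
    return ranks


# Hypothetical samples of projected lengths (pixels), keyed by orientation.
# Invented values only; a real analysis would draw these from an image database.
toy_samples = {
    0: [4, 6, 7, 10, 14, 18, 25, 30],
    20: [5, 7, 9, 12, 16, 22, 28, 35],
    90: [6, 8, 11, 15, 19, 24, 31, 40],
}

for orientation, rank in rank_by_orientation(toy_samples, 7).items():
    print(f"{orientation:>3} deg: rank {rank:.3f}")
```

Plotting these ranks against orientation would give the kind of curve that is compared with the psychophysical function, although with real data the sampling would cover the full 0-180° range.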
The other line length effects illustrated in Figure 2 can also be explained in this way (Howe and Purves, 2005a). By the same token, equally odd and subtle perceptual phenomena in brightness (Yang and Purves, 2004), color (Long, Yang and Purves, 2006) and motion (Wojtach et al., 2008; Sung, Wojtach, and Purves, 2009) are predicted by similar analyses of image databases that, to a first approximation, represent the sum of human experience with luminance, spectral distribution, or image sequences, respectively. For example, in accord with the static geometric stimuli discussed above, well-known motion “illusions” such as the flash-lag effect (Hazelhoff & Wiersma, 1924; MacKay, 1958; Nijhawan, 1994; Whitney & Murakami, 1998; Berry et al., 1999; Eagleman & Sejnowski, 2000; Kerzel & Gegenfurtner, 2003; Jancke et al., 2004a; Jancke et al., 2004b) and the aperture problem (Wallach, 1935; Hildreth, 1984; Nakayama & Silverman, 1988; Anderson & Sinha, 1997), as well as the perception of speed and direction more generally, can be explained in terms of the accumulated influence of images and their relationships to moving objects in the environment (Wojtach et al., 2008; Sung, Wojtach, and Purves, 2009; Wojtach, Sung, and Purves, 2009). Since the registration of movement is essential to successful behavior, the ability to provide an empirical basis for these motion perceptions lends further support to this approach to vision.
This empirical way of rationalizing visual illusions has some general implications that are worth noting. The first of these is a new way of thinking about the way vision and visual systems operate. In this framework, patterns of light falling on the retina generate percepts by activating visual circuitry that has, over the course of evolution, come to be associated with successful behavior in response to that pattern in the past. This link between images, percepts and physical reality, however, cannot have been created by associating features in the retinal image with features of the world: the inverse optics problem precludes any scheme of visual processing that could arguably relate the characteristics of a retinal image directly to its generative sources. Perceived qualities such as form, brightness, color, and motion thus have no logical meaning in the physical world, although the physical correlates of these qualities can of course be measured with instruments and studied in physical terms.
Nevertheless, by evolving circuitry that links retinal stimuli to behavior according to operational success, the challenge of the inverse problem can be met, with the degree of success depending on the amount of accumulated species (and individual) experience. In this conception, the percepts of an observer are not and cannot be representations of the scene at hand. They are instead reflexively elicited constructs, determined by the species' accumulated experience over the eons with successful responses to the stimulus in question, and are not really representations at all, at least in the usual sense of the word.
Initially at least, the concept of visual percepts as reflex responses based on circuitry that encodes and represents empirical success in response to an image rather than the present relationship between an image and its likely source seems highly counterintuitive. For one thing, this idea defies an overwhelming sense that we see the world as it really is and that we respond accordingly; in contrast, the evidence suggests that the perception of any scene is an operational construct based on millions of years of species experience that (because of the inverse problem) bears no direct relation to real-world objects. In an important sense, then, our experience of the world is always located behind a veil of appearances; visual percepts correlate with reality only because of a sufficient accumulation of empirical information. Secondly, this way of conceiving visual physiology turns the conventional wisdom about the nature of visual neuronal properties on its head. Rather than extracting features from images and passing them on to "higher-order" visual areas where they are combined again as percepts, this alternative framework implies that neurons in the primary and extrastriate visual pathways operate by having incorporated the accumulated empirical information about image-source relationships.
The key to discovering the neural correlates of visual illusions would be to consider how large populations of neurons could process stimuli empirically. Thus, in the case of the apparent length of a line offered above, it would be incorrect to approach the problem in terms of how features of projected line length are encoded and passed on to higher levels in the system. Rather, cortical activity would need to be understood as encoding the conjoint probability distribution of all the possible sources of the retinal stimulus. The fact that the perceptual variation of line length as a function of orientation (as well as other so-called illusions of brightness, color, geometric form, and motion) is so well predicted by the statistical link between images and their generative sources offers strong support for this kind of approach.
A few neurobiological clues to date suggest that biological visual circuitry is indeed subserving empirical demands. For example, enhanced responses to contrast boundaries (Hubel and Wiesel, 1962) as well as color-opponency responses (Hubel and Wiesel, 1968) are correlated with the basis functions of efficient statistical representations of natural images (Olshausen and Field, 1996; Lee, Wachtler, et al., 2002; Wachtler, Lee and Sejnowski, 2001; Caywood, Willmore et al., 2004). Moreover, some anatomical characteristics of the primary visual cortex, e.g., preferential horizontal connections between neurons tuned to similar orientations (Bosking, Zhang, et al., 1997), are also consistent with the incorporation in visual processing of natural image-source statistics (Geisler, Perry, et al., 2001; Goldberg, 1989). Such observations indicate that some physiological links to understanding perception in empirical terms are already known. Presumably the reliance on visual illusions will be as great a boon for unraveling the complex functional architecture of the visual system as it has been for understanding the basis of perception generally.
In summary, if the visual brain has evolved by associating images with behavioral responses appropriate to the nature of the environment, then understanding visual circuits in terms of image properties alone will be insufficient to explain perception. The underlying neural networks will not be describable without reference to the possible generative sources of the projected images experienced by an agent—animal or artificial—as it navigates the environment, and how these images are related to successful behavior.
The challenge of the inverse problem implies that biological visual systems must take advantage of the empirical links between the inherently ambiguous images and their possible generative sources in the real world. A rapidly growing body of evidence suggests that this information gradually accumulates in the structure and function of visual system circuitry as a result of the benefits of successful visually guided behavior. In this conception of vision, percepts (including all the classical visual illusions) are reflexive responses: patterns of light on the retina activate circuitry that, over the course of evolution, has come to represent biologically useful constructs. Experimental support for this way of understanding vision comes from its ability to predict anomalous percepts of brightness, color, form, and motion that have been difficult to explain in any other way. If this concept of vision is correct, then the detailed structure and function of visual system circuitry gleaned over the last half century will need to be rationalized in terms of this empirical framework. A further corollary of this way of conceiving vision, as indicated, is that all visual percepts are equally illusory; what are commonly called illusions are simply instances in which the differences between what one sees and measured reality are especially obvious.
