An old joke (told by vision researchers) goes: Graphics is easy because it is like multiplication and Vision is hard because it is like factoring. Unlike Graphics, Vision is an inverse problem. Given two numbers, it is trivial to multiply them, but their product can be factored in an infinite number of ways. A unique factorization is possible only given an additional constraint, e.g., the factors should be as similar as possible in value, in which case the solution is the square root. This is termed a regularizing assumption. Collectively, these regularizing assumptions represent the visual system's a priori knowledge of the world we live in. Vision is impossible without them, yet many common optical illusions are based on a clever artist's malicious violation of the visual system's assumptions.
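The factoring analogy can be made concrete with a small sketch. The function name `regularized_factor` is hypothetical, and the restriction to positive real factors is an assumption made purely for illustration.

```python
import math

def regularized_factor(product):
    """Return the pair (a, b) with a * b == product and |a - b| minimal.

    Assumes positive real factors. Under the constraint "as similar as
    possible", both factors equal the square root of the product.
    """
    if product <= 0:
        raise ValueError("this sketch assumes a positive product")
    root = math.sqrt(product)
    return root, root

# Without the constraint the problem is ill-posed: every a > 0 yields
# a valid factorization (a, product / a).
unconstrained = [(a, 12.0 / a) for a in (1.0, 2.0, 3.0, 4.0)]

# With the constraint there is exactly one answer.
a, b = regularized_factor(16.0)
```

The list `unconstrained` shows four of the infinitely many answers to the unregularized problem; the constraint is what picks out a single one.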
The pattern of radiance which falls on the retina is not constant in time. It varies due to: 1) motion of objects in the environment; 2) motion of the eye; 3) motion of the body; 4) changing irradiance; and 5) changing reflectance. One of the ill-posed problems solved by the visual system is inventing an explanation for the time variation of image radiance. Let's ignore causes 4 and 5, which might correspond to the sun moving behind a cloud or a chameleon changing color, respectively. The combined effect of causes 1, 2, and 3 is termed optical flow. It can be represented through an optical flow field. In an optical flow field, a velocity vector is associated with every point in the visual field which indicates the speed and direction in which that point of the visual field is moving. For the topic at hand, we are principally interested in how the visual system distinguishes the component of the optical flow due to 1 (environmental motion) from the components due to 2 and 3 (ego motion).
Our eyes are continually in motion. These movements are both voluntary and involuntary. One kind of voluntary movement is known as a saccade. This is a rapid movement involving acceleration and deceleration of the eye. It represents a change in the point to which we are attending. Thankfully, during a saccade, vision is suppressed. It is for this reason that we don't actually perceive the world violently rotating around us every time we re-direct our attention.
Another kind of voluntary eye movement is termed smooth pursuit, e.g., the movement of the eye which occurs when tracking a moving target. Smooth pursuit movements are essential in behaviors like catching a baseball or (in earlier times) a prey animal.
In addition to voluntary movements, the eye also engages in various involuntary movements. The best known of these are vergence movements, which change the distance at which the optical axes of the two eyes converge. Vergence movements play an essential role in the solution of the correspondence problem in binocular stereopsis.
However, most significantly (with respect to the topic at hand), rapid involuntary eye movements are also employed as part of the active control process necessary to keep the gaze fixed on a single point, i.e., between saccades. Without such active control, the eyes would drift.
The visual system usually does a good job in estimating the component of the optical flow due to these involuntary eye movements. Because of this, when we look at a still photograph, we do not perceive it to be seething with motion, even though these involuntary eye movements cause the image falling on the retina to be far from static. It is useful to draw an analogy between this ability of the visual system and the mechanical image stabilization systems which are employed in the gun sights of tanks and (more recently) the electronic image stabilization systems contained in some home video cameras.
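The stabilization analogy can be sketched as a simple subtraction: whatever part of the retinal flow is not accounted for by the estimated eye movement is attributed to the environment. This is an illustrative assumption, not a claim about how the visual system actually computes it. The name `perceived_motion` and the numeric values are hypothetical.

```python
def perceived_motion(retinal_flow, estimated_eye_flow):
    """Residual flow attributed to the environment at each sampled point.

    Both arguments are lists of (vx, vy) velocity vectors, one per point
    of the visual field (a coarse stand-in for an optical flow field).
    """
    return [
        (rx - ex, ry - ey)
        for (rx, ry), (ex, ey) in zip(retinal_flow, estimated_eye_flow)
    ]

# Static photograph: all retinal flow is caused by the eye movement itself.
eye_flow = [(0.3, -0.1)] * 4      # hypothetical drift, arbitrary units
retinal_flow = list(eye_flow)

# A correct estimate cancels the flow, so the photograph looks still.
stable = perceived_motion(retinal_flow, eye_flow)

# An estimate that misses the vertical component leaves a residual,
# which would be read as motion in the scene.
biased_estimate = [(0.3, 0.0)] * 4
anomalous = perceived_motion(retinal_flow, biased_estimate)
```

In this toy model the residual in `anomalous` is purely vertical, because that is the only component the biased estimate got wrong.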
Vision researchers (and artists such as Bridget Riley) have understood for some time that the visual system occasionally makes errors when estimating the direction and magnitude of involuntary eye movements and that these errors can be interpreted as anomalous environmental motion. The cause of these errors is a fundamental ambiguity in estimating the tangential component of the motion of a straight edge. This ambiguity is well known to vision researchers, who refer to it as the aperture problem. Basically, using spatio-temporal filters with compact spatial support, such as the receptive fields of neurons in visual cortical area MT, only the component of the optical flow in the direction perpendicular to a straight edge can be reliably estimated. Solving for the tangential components of the optical flow given the normal components is yet another ill-posed problem. It is theorized that the visual system computes the smoothest two-dimensional optical flow field consistent with the given normal velocity components by a spatial integration process. Smoothness is the "regularizing" assumption.
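The aperture problem can be illustrated with a minimal sketch: project a velocity onto the unit normal of a straight edge and discard the rest. The function name `normal_flow` and the convention that the edge tangent is (cos θ, sin θ) are assumptions introduced here for illustration.

```python
import math

def normal_flow(velocity, edge_angle):
    """Component of `velocity` perpendicular to a straight edge.

    `edge_angle` is the orientation of the edge in radians; its tangent
    is (cos(angle), sin(angle)) and its unit normal is (-sin, cos).
    """
    vx, vy = velocity
    nx, ny = -math.sin(edge_angle), math.cos(edge_angle)
    speed = vx * nx + vy * ny          # signed speed along the normal
    return (speed * nx, speed * ny)

# A horizontal edge (angle 0) seen through a small aperture. These two
# motions differ only in their tangential (horizontal) component.
seen_a = normal_flow((1.0, 2.0), 0.0)
seen_b = normal_flow((-5.0, 2.0), 0.0)
```

Both calls return the same purely vertical vector, so a local measurement cannot tell the two motions apart. Recovering the tangential component requires an additional assumption, such as the smoothness constraint described above.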
When mistakes are made in estimating the tangential component of the optical flow, one possible result is that radiance changes due to involuntary eye movements may be misinterpreted as environmental motion. Until this morning, the most dramatic example of this phenomenon was an optical illusion due to the Japanese artist, Hajime Ouchi.
In the Ouchi illusion, when one fixates on the circle in the foreground, any error in the estimate of the vertical component of the optical flow due to involuntary eye movements (or head motion) will be interpreted as anomalous vertical motion of the background. Conversely, when fixating on the background, any error in the estimate of the horizontal component of the optical flow due to involuntary eye movements will be interpreted as anomalous horizontal motion of the foreground.
After a little web search, I discovered that the illusion which everyone is so excited about is by Akiyoshi Kitaoka, of the Dept. of Psychology at Ritsumeikan University in Kyoto, Japan. I think that Kitaoka's illusion is a significant improvement on the Ouchi illusion in several ways. To begin with, the self-similarity of the rings in the figure (under rotation) creates a plethora of correspondence ambiguities. Stated differently, the self-similarity causes aliasing in the response of spatio-temporal filters tuned to velocities tangent to the circles at positions tangent to the circles. The result is that the visual system cannot properly deduce the effect of its own involuntary eye movements in these tangent directions. As in the Ouchi illusion, these errors are interpreted as anomalous tangential motions of the adjacent rings. Since all of these local anomalous motions are tangential, and all are consistent with a small set of uniform smooth motions, namely rotations of the rings in clockwise or counter-clockwise directions, that is what we see.