A brief look at some basics of visual perception
Importantly, the visual system is complex and processes light information in such a way that it provides a rich reconstruction of the world.
Human vision has key features such as heterogeneous sensing: retinal photoreceptor coverage is unequal across the visual field it receives light from (Curcio and Allen, 1990). It also has multiple pathways relaying the sensory information to the occipital cortex. Visual information is projected in a retinotopic fashion onto the cortex, and the neural signals encoding it are processed through layers and fed forward to other areas of the brain (Wandell, 1995; S. Palmer, 1999). For more details, read Appendix A.2.
When we become visually aware of a perceptual object, its information flows through multiple networks ranging from the frontal to the parietal cortex. This process is not instantaneous: though its duration may vary with the complexity of the stimulus, observer habituation and other factors, becoming quasi-fully conscious1 of a perceived object may take at least 200 milliseconds (ms) (Kornmeier and Bach, 2012). Visual illusions, phenomena in which the visual system’s expectations are tricked into building an incoherent representation of the stimulus, have been used in vision science to characterise perceptual processes as inferences, and more specifically as predictive coding (Friston and Kiebel, 2009). Moreover, illusions such as multi-stable perception are being used to uncover the neural correlates of consciousness and perception (Frässle et al., 2014).
The visual system is typically considered as a hierarchical structure of networks processing information flowing in one global direction, upwards—namely from the retina to higher cognitive functions in the brain. Vision scientists consider the biological system’s inputs, the eyes, as the lower level of this model, and the consciously experienced qualia—a proposed term for granular units of conscious subjective experience, analogous to quanta in physics (Dennett, 1993; Chalmers, 2007)—as the higher level of the system. The eyes and their movements are key properties of visual perception. Indeed, though our eyes could be considered poor sensors from an engineering perspective, we move them to capture information from different areas of the visual field and to process it efficiently. Eye movements are typically broken down into three categories when studied in the context of cognitive tasks: saccades, pursuits and fixations (Liversedge et al., 2011). However, the eyes never remain still: even during fixations, micro-saccades, drifts and tremors can be observed (Martinez-Conde, Macknik, and D. Hubel, 2004).
These eye movements constantly shift the retinal projection, changing the visual input reaching the photoreceptor cells of the retina. But the visual system likely evolved with this constraint, and is thought to exploit these noisy features to increase its capabilities (Hicheur et al., 2013; Rucci and Victor, 2015).
A growing view in the cognitive sciences requires that the body and action be considered part of cognition—embodied or enactive cognition (Varela, 1996a). The link between oculomotor action and visual perception is becoming increasingly evident, as eye movements show potential to be physiological markers of internal cognitive states (Kagan and Hafed, 2013; Spering and Carrasco, 2015; Shaikh and Zee, 2018)—whether attention (Kuhn et al., 2009; Orquin and Loose, 2013; Denison et al., 2019), perception (Gold and Shadlen, 2003; Hafed and Krauzlis, 2006; Schütz, D. I. Braun, and Gegenfurtner, 2011; Boi et al., 2017; Kagan and Burr, 2017), learning and development (Eckstein et al., 2017), language processing (Engelmann et al., 2013), or reading (Kliegl et al., 2004). Though the lower parts of the visual system are starting to be well understood, the higher one goes along the visual hierarchy, the less clear network architectures and causal relationships become.
In fact, the visual system is full of asynchronous feedback mechanisms that make deciphering its workings a very complex task. For instance, parts of the visual signals are fed, as they go through the lateral geniculate nucleus (LGN), to the superior colliculus (SC), which has been implicated in oculomotor programming together with other cortical areas such as the frontal eye field (FEF) and the lateral intra-parietal (LIP) cortex (Krauzlis, 2004; Hafed, Goffart, et al., 2009; Taouali et al., 2015; Peel et al., 2016; Krauzlis et al., 2017). But the two latter areas are also tightly correlated with attention, a higher cognitive process than oculomotor programming (Astrand et al., 2015). In fact, these two functions may share efference copies, copies of the motor programming information that enable the system to have different levels of engagement with its action2 (Jeannerod and Arbib, 2003). Note that other theories, such as referent control of perception, may also explain the link between motor action and perception (Feldman, 2016).
These points raise a series of questions. How are action and perception related? What tools can be used to simplify and understand such complex and intertwined interactions between the motor and perceptual systems? How do these systems relate to conscious experience of the world?
These questions linking the body’s actions to its internal cognitive states may give insights and leads for some of the problems introduced in Appendix A.1 on the origins of consciousness, perceptual experience and its evolution. Our attempt to contribute relies on a trans-disciplinary approach, combining methods from empirical sciences—namely psychophysics and neurosciences—and theoretical research—namely signal processing and computational modelling. In this work, we focus on one visual phenomenon in particular, multi-stable perception, as it allows the study of changes in internal perceptual states while the stimulation remains stationary from a physical perspective, yet is changed by the eyes’ constant movements. Moreover, multi-stability occurs in different modalities (Schwartz et al., 2012) and relates to the coordination of sub-systems in complex systems dynamics (Kelso, 2012). Therefore, theoretical approaches to bi-stability might give key insights into the questions asked here (Moreno-Bote, Knill, et al., 2011).
The eyes move; in other words, they are dynamic and active. Their motion has one major consequence: it shifts the visual content of the retinal projection, and therefore the visual input flow changes. The eyes sit in spherical sockets that allow them to rotate in place, and their movements are controlled by six strong and precise extraocular muscles. The oculomotor system that drives these muscles exhibits a wide range of behaviours, from stationary fixations to ballistic and highly dynamic saccades.
The oculomotor system’s main functions allow the visual system either to fixate a point in space in order to accumulate information, or to track a target by keeping it at the foveal location on the retina as it moves across the visual field.
Oculomotor dynamics can vary extensively depending on the tasks and actions of the observer, as shown in Fig. 1.1 (Yarbus, 1967). Eye movements are captured in a bi-variate signal called gaze, which situates the foveal position on the visual field over time. For static stimulation—i.e., a still image—it is characterised by fixations, during which the gaze is stable and the retinal image motion is small, punctuated by saccades, a class of rapid and ballistic movements. When stimulation is dynamic—i.e., a video display—the oculomotor system produces fixations and saccades, but in some cases also smooth pursuit eye movements, which are used to track a target object moving across the visual field. Eye movements have been studied in close relation to visual perception and provide key information on retinal image variations as well as insight on visual attention (Liversedge et al., 2011; Kowler, 2011). They can also be physiological markers of internal cognitive states, and more precisely of motion perception (Just and Carpenter, 1976; Spering, Pomplun, et al., 2011; Shaikh and Zee, 2018; Boccignone, 2019).
Eye & head movements.
Though eye movements are often coupled with head motion in ecological conditions—i.e., experimental conditions that are closer to everyday life, with less control and fewer restrictions—in this work we do not consider head motion and its interaction with gaze, because it adds another set of degrees of freedom and thus increases the complexity of scientific investigation. We focus our review on eye movements considered independently from head movements, and on experimental setups where head movements are restrained.
Figure 1.1. Eye movements & instructions. Figure from Yarbus (1967) showing the variation of the spatio-temporal dynamics of gaze for one stimulus. Different tasks were given and are reported below the 2D gaze traces.
Saccades: the rapid and ballistic movements
Saccades are fast, ballistic eye movements that rapidly shift the locus of the fovea on the visual field—a diagram of a saccade is shown in Fig. 1.2. They allow humans to explore, scan and search their environment by displacing the fovea, where the high-resolution cone photoreceptor cells are located, to the area of interest. Saccades are also an energy-efficient method to explore a scene (Liversedge et al., 2011) and are often preferred by humans over other actions, such as head or body movements. They allow the gaze to move from one spatial position to another in a scene. The oculomotor event lasts between 150 ms and 200 ms for planning and execution.
Saccades are characterised in terms of duration, amplitude and velocity by the main sequence3, a relationship that links the amplitude of a saccade to its duration and peak velocity (Bahill et al., 1975; Harris and Wolpert, 2006). The velocity of the gaze is very high relative to all other eye movements: within 30 ms, the eyes can reach speeds of up to 900 visual degrees per second (deg.s-1) (Goldberg et al., 1991). Saccades are typically defined by displacement, velocity and acceleration thresholds—above 0.15 deg, above 30 deg.s-1 and above 9500 deg.s-2, respectively. However, other detection algorithms exist, based on adaptive methods and glissade4 detection (Nyström and Holmqvist, 2010; Behrens et al., 2010) or on Bayesian classification, in which an algorithm learns and adapts probability functions related to the motion properties of the gaze (Tafaj et al., 2013; Mihali et al., 2017). As you are reading this text, you are in fact performing a series of saccades, moving across words and sentences.
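As a rough illustration of the fixed-threshold approach described above, the following sketch flags saccadic events in a gaze trace; the function name, the grouping of contiguous samples into events and the sampling-rate handling are illustrative assumptions, not an implementation taken from the cited literature.

```python
import numpy as np

def detect_saccades(x, y, fs, vel_thresh=30.0, acc_thresh=9500.0, min_amp=0.15):
    """Flag saccades using the fixed thresholds quoted in the text:
    velocity > 30 deg/s, acceleration > 9500 deg/s^2, displacement > 0.15 deg.
    `x`, `y` are gaze positions in visual degrees sampled at `fs` Hz.
    Returns a list of (start_index, end_index) events. Illustrative sketch."""
    vx, vy = np.gradient(x) * fs, np.gradient(y) * fs
    speed = np.hypot(vx, vy)                 # deg/s
    accel = np.abs(np.gradient(speed)) * fs  # deg/s^2
    candidate = (speed > vel_thresh) | (accel > acc_thresh)
    # Group contiguous candidate samples into events and keep those
    # whose overall displacement exceeds the amplitude threshold.
    events = []
    idx = np.flatnonzero(candidate)
    if idx.size:
        for seg in np.split(idx, np.flatnonzero(np.diff(idx) > 1) + 1):
            amp = np.hypot(x[seg[-1]] - x[seg[0]], y[seg[-1]] - y[seg[0]])
            if amp > min_amp:
                events.append((int(seg[0]), int(seg[-1])))
    return events
```

In practice, the thresholds interact with recording noise and sampling rate, which is precisely why the adaptive and Bayesian alternatives cited above were developed.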
Although we know we can move the eyes, interestingly, our conscious visual flow seems unaffected by the movements. By moving the eye, and thus the retinal image, saccades should generate blurry moments in the visual experience. However, this is not the case, as the brain applies mechanisms that guarantee visual constancy (S. Palmer, 1999). This is referred to as saccadic suppression, and though it is highly effective in ecological conditions, some experiments have shown that transsaccadic perception can occur (Burr and Ross, 1982; Castet and G. S. Masson, 2000). Though saccades are generally direct movements, some experimental paradigms and associated phenomena5 show that this is not always so.
Smooth eye pursuits: the target tracking movements
Smooth pursuits are slower eye movements that have been studied in the context of visual tracking of an object. The function of pursuit is to maintain the tracked target on the fovea by matching the eyes’ movement to the spatio-temporal properties of the target’s displacement. Operationally, pursuits are defined as an oculomotor phenomenon with two phases: (1) a catch-up saccade followed by (2) a target pursuit or maintenance phase (Lisberger et al., 1987). Unlike saccades, pursuits are considered smooth, as they do not show high accelerations or jerky movements in the maintenance phase. If the tracked target moves erratically, the oculomotor system cannot track it smoothly: the tracking becomes unpredictable and saccadic. Pursuits are therefore comparatively slow oculomotor behaviours, with velocities restrained to a range of 20 to 90 deg.s-1 (Komogortsev and Karpov, 2013; Krauzlis, 2004) and latencies dependent on the catch-up saccade properties.
Moreover, smooth pursuit movements depend on visual stimulation, as they attempt to stabilise a moving target on the fovea by moving the eyes (Rashbass, 1961; D. A. Robinson, 1965; Liversedge et al., 2011). Thus, they also require constant visual feedback so that the gaze can be adjusted and its position or velocity updated. Though pursuits are mostly studied with a clear and explicit target, research has shown that the phenomenon extends to other stimuli: random-dot kinematograms (RDK)6 (Heinen and Watamaniuk, 1998), illusory perceptual motion (Madelain and Krauzlis, 2003) or even motion after-effect (MAE) motion (D. Braun et al., 2006). Since smooth pursuits have mostly been studied in explicit dot-tracking experiments, this has constrained the development of measurement and detection methods for this oculomotor event.
The functional role of the pursuit as an oculomotor process is to maintain a target of interest on the high acuity foveal region of the retina (Spering and Montagnini, 2011).
Interestingly, its properties are tied to its two defining phases, initiation and maintenance. For instance, the detection of pursuits is based on measuring particular properties of the initiation phase (the catch-up saccade): latencies between 80 and 120 ms (Krauzlis, 2004; Carl and Gellman, 1987) and retinal positioning at the centre of the fovea. This phase therefore carries a temporal constraint that depends on saccade properties—a ballistic motion of gaze with high velocity, linked to amplitude by the main sequence relationship (Bahill et al., 1975)—and on the change of retinal location of the stimulus’ region of interest.
For the maintenance phase, measures of gaze and retinal errors and retinal slip7 are used to verify that the position and velocity of the gaze match those of the target (more details on pursuit measurement in the box below). Human observers typically track targets up to a speed of 100 deg.s-1 (Spering and Montagnini, 2011), though pursuits are mostly considered smooth and precise at speeds below 30 deg.s-1. It is noteworthy that, at this upper range, pursuit epochs become corrupted with catch-up saccades when the target velocity is high (De Brouwer et al., 2002). The maintenance phase, in which the retinal image is stabilised, is interpreted as relying on a feedback loop in which the oculomotor system must estimate and correct a velocity-matching error between gaze and target.
The quality of tracking has been measured by computing a gain, obtained by modelling the smooth pursuit system as a closed-loop system (St-Cyr and Fender, 1969). This measure is effective in experimental protocols in which a target appears on screen and participants are tasked to follow its motion. Pursuit is mostly studied for the tracking of a single point on a uniform background; however, other stimuli in motion can lead to pursuit movements (Heywood and Churcher, 1971; Heywood and Churcher, 1972). These other stimuli—i.e., RDK (Heinen and Watamaniuk, 1998), line figures (G. Masson and Leland Stone, 2002), illusory perceptual motion (Madelain and Krauzlis, 2003) or MAE (D. Braun et al., 2006)—can elicit pursuit in conditions that are less coherent with the two-phase structure described in the previous paragraphs, making such pursuits harder to detect with these measures. The gain measure and its associated models have been questioned for tasks where a percept, rather than a dot, is pursued (Leland S. Stone et al., 2000).
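In its simplest form, the gain mentioned above is the ratio of eye velocity to target velocity over a maintenance epoch: a gain of 1 means perfect velocity matching, while values below 1 indicate the eye lags the target. A minimal sketch follows; the function and variable names are illustrative, and a real analysis would first excise catch-up saccades from the epoch.

```python
import numpy as np

def pursuit_gain(eye_pos, target_pos, fs):
    """Classical pursuit gain: ratio of mean eye velocity to mean target
    velocity over a maintenance epoch, treating the pursuit system as a
    closed loop whose output should match its input. Positions are in
    visual degrees, sampled at `fs` Hz. Illustrative sketch."""
    eye_vel = np.gradient(eye_pos) * fs        # deg/s
    target_vel = np.gradient(target_pos) * fs  # deg/s
    return np.mean(eye_vel) / np.mean(target_vel)
```

This mean-velocity formulation is the crudest version of the measure; it is exactly this kind of simplification that has been questioned when a percept, rather than an explicit dot, is pursued.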
Fixations: the stationary state of visual accumulation
When the eyes are not moving—in between saccades and pursuits—they are stabilised in fixations. A period, or epoch, of the gaze signal is classified as a fixation if it cannot be classified as another type of movement and the amplitude of displacement is smaller than 1 deg (Martinez-Conde, Macknik, and D. Hubel, 2004). However, the eyes never stay still. The study of fixations and fixational eye movements (FEM) has grown in recent decades, facilitated by increasingly affordable measurement equipment (Rolfs, 2009). During a fixation, the visual system accumulates information, as the region under observation is processed by the highly sensitive and precise foveal region of the retina. Hence, during scene exploration or a search task, human observers tend to scan the visual field with saccade-fixation combinations known as scanpaths (Noton and Stark, 1971a; Noton and Stark, 1971b). They are visible in Fig. 1.1 and characterise the spatio-temporal properties of an observer’s oculomotor behaviour when facing a given task. In most of these tasks, fixations tend to last on average 300 ms, though they may be much longer in other tasks.
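The amplitude criterion above can be turned into a simple dispersion-style detector: a window of gaze samples counts as a fixation if its spatial spread stays under the 1 deg bound for long enough. The sketch below follows the spirit of dispersion-threshold (I-DT) algorithms; the 100 ms minimum duration and all names are illustrative assumptions.

```python
import numpy as np

def detect_fixations(x, y, fs, max_disp=1.0, min_dur=0.100):
    """Grow a window while the gaze dispersion stays under `max_disp`
    degrees; accept it as a fixation if it lasts at least `min_dur`
    seconds. `x`, `y` in visual degrees, sampled at `fs` Hz.
    Returns (start_index, end_index) pairs. Illustrative sketch."""
    min_len = int(min_dur * fs)
    fixations, start, n = [], 0, len(x)
    while start < n:
        end = start
        while end + 1 < n:
            seg_x, seg_y = x[start:end + 2], y[start:end + 2]
            # Dispersion as the sum of horizontal and vertical ranges.
            disp = (seg_x.max() - seg_x.min()) + (seg_y.max() - seg_y.min())
            if disp > max_disp:
                break
            end += 1
        if end - start + 1 >= min_len:
            fixations.append((start, end))
        start = end + 1
    return fixations
```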
FEM have different dynamics and are classified as micro-saccades, drifts or tremors (Martinez-Conde, Macknik, and D. Hubel, 2004). The dichotomy separating FEM from larger, macro, eye movements can possibly be explained by methodological constraints related to task choice, measurement equipment (Appendix A.3), analyses and classification. A possible explanation is that the reported small-amplitude eye movements are miniature versions of the more-studied smooth pursuits and saccades, and thus may play the same functional role for cognition and vision. In fact, theories linking FEM to active vision have been developed, in which the visual system uses the noisy properties of FEM to enhance its capabilities and enable the detection of subtle orientation changes (Hicheur et al., 2013) or reach hyper-acuity (Poletti, Listorti, et al., 2013; Rucci, Iovin, et al., 2007)—i.e., human vision shows capacities to detect changes at smaller resolutions than the cone mosaic should allow (Appendix A.2) if no signal processing were carried out by higher parts of the visual system. However, the identification and classification of FEM, and of eye movements more broadly, are still debated and unsettled (Rolfs, 2009; Hessels et al., 2018).
Micro-saccades & small amplitude saccades
Micro-saccades have varying definitions, and the algorithms used to detect them have changed over the years. Given that the majority of algorithms are based on thresholds, on either speed or acceleration, only threshold-based algorithms are discussed here—one can refer to Hoppe and Bulling (2016) for alternative approaches. The thresholds used in these algorithms are not absolute (as they are for saccade detection and definition): velocity thresholds are defined with respect to the median velocity for every trial (Poletti and Rucci, 2016; Krauzlis et al., 2017), or by the absolute deviation of the velocity distribution within the fixation combined with a binocularity criterion (Engbert and Kliegl, 2003), or even by a Bayesian classifier with priors on velocity and magnitude (Mihali et al., 2017).
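The median-based adaptive threshold mentioned above can be sketched as follows, in the spirit of the Engbert and Kliegl (2003) procedure: each velocity component gets a threshold proportional to a median-based estimate of its standard deviation. The multiplier value and all names are illustrative assumptions, and the binocularity criterion is omitted.

```python
import numpy as np

def microsaccade_candidates(vx, vy, lam=6.0):
    """Adaptive velocity thresholds: for each velocity component (deg/s),
    estimate a median-based standard deviation and scale it by `lam`
    (values around 5-6 are common). Samples whose elliptic criterion
    (vx/tx)^2 + (vy/ty)^2 exceeds 1 are micro-saccade candidates.
    Illustrative sketch of the Engbert-Kliegl-style approach."""
    sigma_x = np.sqrt(np.median(vx**2) - np.median(vx)**2)
    sigma_y = np.sqrt(np.median(vy**2) - np.median(vy)**2)
    tx, ty = lam * sigma_x, lam * sigma_y
    candidates = (vx / tx)**2 + (vy / ty)**2 > 1.0
    return candidates, (tx, ty)
```

Because the threshold scales with the velocity noise of each trial, the same multiplier adapts across observers and recording conditions, which is the point of the relative (rather than absolute) definition.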
Once micro-saccades are detected and classified as events, it is possible to extract oculomotor drift as the complementary epochs in the signal. It is also worth noting that, though the function of micro-saccades has in the past been the subject of controversy, the literature now agrees that they are a small-amplitude eye movement strategy used for visual exploration and acuity (Martinez-Conde, Macknik, and D. Hubel, 2004; Engbert and Kliegl, 2003; Rolfs, 2009; Kowler, 2011; Hicheur et al., 2013; Poletti, Listorti, et al., 2013). They can therefore be considered to depend on visual signals. And if they can serve as an exploration strategy, some level of volition might be involved in their production. However, volition is an unreliable criterion for defining and classifying oculomotor events, as dissociating voluntary oculomotor control from awareness is not straightforward (Poletti and Rucci, 2016). For instance, saccades are often produced without explicit awareness even though they serve a voluntary task of finding visual information. Recent research results and reviews tend to play down the debated multiple roles of micro-saccades and interpret them as small-amplitude saccades (Poletti and Rucci, 2016; Sinn and Engbert, 2016). In this view, micro-saccades help readjust the preferred foveal area against the stimulus’ area of interest, and hence have a similar functional role to saccades.
Table of contents:
1 Ambiguity for the human visual system
1.1 Visual perception
1.2 Vision & ambiguity: how does the brain handle it?
1.3 Why do we study multi-stable perception?
1.4 State of the art synthesis
2 Micro-pursuits: a class of fixational eye movements
2.3 Main Experiment: Necker cube
2.4 Replication Experiments: Square & Cross
2.5 Comparing Necker, Cross and Square experiments—Corrected in Appendix C
2.6 Discussion—Corrected in Appendix C
3 Modelling eye movements & multi-stable perception
3.1 Gravitational fixational eye movements
3.2 Multi-stable perception
4 Multi-stability: manipulating perceptual ambiguity
4.2 Percepts experiment: identifying the motion percepts
4.3 Ambiguity experiment: percept probabilities w.r.t. transparency
5 Multi-stability as a probe of synergy between action and perception?
5.1 Synthesis of contributions
5.2 Influencing gaze control with random dot kinematograms
5.3 Eye movements as objective markers in ambiguous perception
5.4 What does stability mean for perception?
A Complementary information on the literature review
A.1 Theoretical context
A.2 From the eyes to the brain
A.3 Tracking the eyes
A.4 Multi-stable perception detailed description
A.5 Multi-stability & neurosciences
A.6 Eye movements & the plaid
A.7 Can we remove subjective reports on the moving plaid?
A.8 Gaze-EEG experimental design
B Experimental metrics, modules and designs
B.1 Maximally Projected Correlation
B.2 Eye Movements experiment
B.3 Noisy Motor Events experiment
C Journal of Vision article