First step of visual processing
Once light reaches the eye and passes through the pupil, the cornea and lens focus it so that an image is formed on the retina, a light-sensitive tissue at the back of the eye [Tessier-Lavigne, 2000]. The neural processing of a visual scene starts here, in a three-layered neural network (Figure 2.1). The conversion of light into an electrical signal is performed by photoreceptors, specifically rods and cones, which respond to light via a graded change in their membrane potential. The visual information is then transmitted through a layer of interneurons, where the graded potentials from photoreceptors are fed to bipolar cells, while being modulated by horizontal cells, which connect laterally to rods and cones. The bipolar cells’ outputs are affected by another class of interneurons, amacrine cells, which perform lateral inhibition. Finally, retinal ganglion cells (RGCs) receive the inputs from bipolar cells and communicate the visual information to the rest of the central nervous system through the optic nerve, which is composed of RGC axons. Unlike retinal interneurons, which communicate over shorter distances, the RGCs generate action potentials, i.e. they spike, in order to transmit the signal over long distances.
Apart from rods and cones, there is a third type of retinal cell that responds directly to light, namely the intrinsically photosensitive retinal ganglion cells (ipRGCs) [Morgan and Kamp, 1980, Foster et al., 1991, Hattar et al., 2002]. We will focus on the ‘classical’ retinal ganglion cells for the rest of the section, since in this thesis we record and model their activity. Recently, a completely new class of retinal cells, called the Campana cells, was hypothesized to exist, but little is still known about them [Young et al., 2021].
One of the first recorded retinal responses was the response to luminosity changes, which led to the first functional distinction between different types of ganglion cells [Hartline, 1938]. Using an oscillograph to record from a single intraocular fiber of the bullfrog retina, Hartline was able to classify retinal ganglion cells according to their responses to light pulses. Some cells regularly responded to an increase in the light level (termed ON cells), while others did so for a decrease (OFF cells); a third class, called ON-OFF cells, was activated by both increases and decreases of light intensity.
Moreover, Hartline found that a response can be elicited in a given fiber only if a restricted area of the retina is stimulated by spots of light, effectively discovering the receptive field (RF) [Hartline, 1938, Hartline, 1940]. The receptive field usually consists of two roughly concentric regions, center and surround, which might be of the same or opposite polarity (ON/OFF) [Hartline, 1940, Kuffler, 1953]. The structure of center and surround is fluid, and cannot always be considered to be a regular shape [Levick, 1967, Liu et al., 2009]. A cell whose RF has a center and surround of opposite polarity exhibits center-surround antagonism. For example, if an RF with ON center and OFF surround is flashed with a bright spot that covers both regions, the excitatory center and inhibitory surround will cancel out. This can be understood intuitively as a way of preventing energy expenditure on parts of visual scenes with uniform luminosity, such as a cloudless blue sky or a white wall. In contrast, if the center of the RF is stimulated with a bright spot while the surround is presented with a dark band, the responses from the center and surround will combine into a stronger one. The element of a visual scene that corresponds to this situation is a high-contrast feature, such as an edge.
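The center-surround antagonism described above can be illustrated with a difference-of-Gaussians receptive field, a common textbook idealisation rather than the model used in this thesis. The sketch below is one-dimensional and linear; the widths `sigma_center` and `sigma_surround`, the grid spacing `dx`, and the three stimuli are all hypothetical choices made for illustration. Stimulus values are contrasts relative to the mean luminance (+1 bright, -1 dark, 0 mean gray).

```python
import math

def gaussian(x, sigma):
    """1D Gaussian profile with unit area."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def dog_weights(positions, sigma_center=1.0, sigma_surround=3.0):
    """ON-center / OFF-surround RF: narrow excitatory Gaussian minus broad inhibitory one.
    Both widths are illustrative assumptions, not fitted values."""
    return [gaussian(x, sigma_center) - gaussian(x, sigma_surround) for x in positions]

def response(weights, stimulus, dx):
    """Linear RF response: discretised integral of weight times stimulus contrast."""
    return sum(w * s for w, s in zip(weights, stimulus)) * dx

dx = 0.1
positions = [i * dx for i in range(-200, 201)]  # -20 .. 20, arbitrary units
weights = dog_weights(positions)

# Bright stimulus covering center and surround alike (uniform luminance).
uniform = [1.0 for _ in positions]
# Bright spot restricted to the center, mean gray elsewhere.
center_spot = [1.0 if abs(x) <= 1.0 else 0.0 for x in positions]
# Bright spot on the center with a dark surround (high local contrast).
spot_dark_surround = [1.0 if abs(x) <= 1.0 else -1.0 for x in positions]

r_uniform = response(weights, uniform, dx)
r_spot = response(weights, center_spot, dx)
r_contrast = response(weights, spot_dark_surround, dx)

print(f"uniform: {r_uniform:.4f}, center spot: {r_spot:.4f}, "
      f"spot + dark surround: {r_contrast:.4f}")
```

Because the two Gaussians have equal area, the uniform stimulus drives center and surround equally and the net response is close to zero, whereas the bright-center, dark-surround stimulus yields the largest response of the three.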
While retinal cells can be divided into the five broad classes mentioned above, each class contains a large number of anatomically and morphologically distinct cell types: for example, in the mammalian retina the lower estimate is around 60 cell types [Masland, 2012]. There is an abundance of cell types even if we focus solely on the retina’s output cells, the ganglion cells. For instance, by stimulating with white noise and chirp stimuli, it was revealed that the mouse retina has more than 30 functionally distinct types of ganglion cells, which respond in different ways to the same stimulation pattern [Baden et al., 2016]. Around 17 types have been found in the primate retina so far [Grünert and Martin, 2020], although 5 types – ON and OFF midget, ON and OFF parasol, and small bistratified cells – together make up 75% of all ganglion cells [Dacey, 2004b]. The conclusions of studies of the axolotl retina, which is the model animal we primarily discuss here, have been somewhat ambiguous regarding the number of RGC types, but the latest studies estimate the presence of 5-7 types [Segev et al., 2006, Marre et al., 2012, Rozenblit and Gollisch, 2020].
A distinct RGC type is frequently found to cover the visual field uniformly in a regular lattice structure, displaying mosaic organisation (again, with a certain degree of uncertainty in the case of the salamander, since only some of the cell types were found to tile the space without overlap) [Segev et al., 2006, Marre et al., 2012, Kastner and Baccus, 2011, Kühn and Gollisch, 2016]. This enables the retina to sample the visual space uniformly, creating a ‘sensory map’, where each RGC type then extracts a certain low-level feature of the visual scene [da Silveira and Roska, 2011].
Computations in the retina
Until quite recently it was thought that the retina’s role is mainly that of a ‘camera sensor’, adapting to the light intensity and performing spatio-temporal filtering using center-surround antagonism [Meister and Berry, 1999]. This view assumes that the visual scene is transmitted to the downstream areas as a matrix of pixels that are sharpened in both space and time. However, such a pixel-by-pixel representation seems unlikely given two facts: (i) the number of photoreceptors is 2 orders of magnitude higher than the number of ganglion cells (in both mouse and human, for example), and (ii) the diversity of retinal cell types. These imply that the information has to be re-packaged to pass this bottleneck in a meaningful way. Additionally, as Gollisch and Meister point out, there is a paradox in assuming that simple operations such as adapting to changing light levels and image sharpening would require such a complex network composed of such a variety of neuron types [Gollisch and Meister, 2010]. One of the possible explanations for the diversity of cell types is that different retinal computations require parallel pathways to transmit different features of the visual scene [Wässle, 2004, Dacey, 2004a]. In other words, a diversity of cell types is needed to fulfill the various functions the retina performs.
A good example of feature extraction happening as early as the retina is provided by the direction-selective retinal ganglion cells (DS RGCs). When a moving stimulus, such as a grating or bar, passes across their receptive field, these cells fire spikes with a clear preference for one direction [Barlow et al., 1964, Demb, 2007]. This illustrates how the information from the visual scene can be compressed already at the first stage of neural processing, with a subset of cells – DS RGCs – conveying nothing but the signal about object direction. It also allows downstream areas to read out said direction directly by integrating the activity of several direction-selective RGCs. This kind of computation is an illustration of explicit coding of a certain property of the visual environment. Furthermore, it is not the only such example: previous studies were able to decode various features of the external stimuli from the activity of ganglion cells, such as contrast [Shapley and Victor, 1978, Smirnakis et al., 1997, Goldin et al., 2021], local and global motion [Oyster, 1968, Ölveczky et al., 2003, Kühn and Gollisch, 2016], texture motion [Enroth-Cugell and Robson, 1966, Kaplan and Shapley, 1986, Petrusca et al., 2007], and approach sensitivity [Münch et al., 2009]. In fact, it is not unusual for multiple features to be encoded by the same cells, such as object motion and direction [Kühn and Gollisch, 2016], or object position and speed [Deny et al., 2017] (for review, see [Gollisch and Meister, 2010]).
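The idea that downstream areas can read out motion direction by integrating the activity of several direction-selective cells can be sketched with a population-vector decoder. This is a generic illustration, not the read-out used in the cited studies: the half-wave rectified cosine tuning in `ds_rate`, the peak rate of 50 spikes/s, and the four cardinal preferred directions are all assumptions made for the example.

```python
import math

def ds_rate(stimulus_deg, preferred_deg, max_rate=50.0):
    """Hypothetical firing rate (spikes/s) of a direction-selective cell:
    half-wave rectified cosine of the angle between stimulus and preferred direction."""
    delta = math.radians(stimulus_deg - preferred_deg)
    return max_rate * max(0.0, math.cos(delta))

def decode_direction(rates, preferred_degs):
    """Population-vector read-out: sum unit vectors along each preferred direction,
    weighted by the cell's rate, and return the angle of the resultant in degrees."""
    x = sum(r * math.cos(math.radians(p)) for r, p in zip(rates, preferred_degs))
    y = sum(r * math.sin(math.radians(p)) for r, p in zip(rates, preferred_degs))
    return math.degrees(math.atan2(y, x)) % 360.0

# Assumed population: four cells preferring the four cardinal directions.
preferred = [0.0, 90.0, 180.0, 270.0]

true_direction = 30.0
rates = [ds_rate(true_direction, p) for p in preferred]
estimate = decode_direction(rates, preferred)

print(f"true direction: {true_direction:.1f} deg, decoded: {estimate:.1f} deg")
```

No single cell identifies the direction on its own in this sketch; the estimate comes from combining the graded responses of cells with different preferences.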
The signal transmission through the photoreceptor cascade introduces delays of around 30-100 ms, which might be critical for the fight-or-flight response [Gollisch and Meister, 2010]. A subset of retinal computations is related to the retina’s apparent ability to counter these intrinsic delays by anticipating future stimulus states. If an object is moving over the retina, we could expect the prediction of the object position to lag behind the object’s actual position. However, in an experiment with a smoothly moving bar, it was revealed that the peak of the RGC population activity in fact corresponds to the current position of the bar, or even its position slightly in the future [Berry et al., 1999] (Figure 2.2). The retina compensates for the processing delays by extrapolating the upcoming bar position given the regularity of its movement. This finding was surprising given that, at the time, motion anticipation was proposed to be generated by some higher-level brain area [Nijhawan, 1994, De Valois and De Valois, 1991].
An analogous effect, found in psychophysics, is the flash-lag effect: participants were shown a bar moving at a fixed speed and another bar flashed in continuation of the moving one. Despite the two bars being aligned, participants would report the moving bar as being ahead, suggesting another example of motion extrapolation [Nijhawan, 2002]. The results of Berry et al. suggest that spatial anticipation is not unique to the visual cortex, but can also be computed at the first stage of visual processing. Furthermore, the computation of the flash-lag effect was more recently associated with known feed-forward retinal mechanisms [Subramaniyan et al., 2018, Nijhawan, 2002, Rust and Palmer, 2021].
Continuing the work on anticipation of bar motion, Schwartz et al. asked the following: if the retina is extrapolating the motion of the bar, what would be the response in the case where the movement is interrupted? In an experiment where a moving bar makes a sudden turn and changes direction, the RGC population will at first continue to signal the position as if the bar did not turn, but it will quickly, after several tens of milliseconds, update to the new bar direction and continue with correct predictions of its position [Schwartz et al., 2007b]. In the case of a 180-degree reversal, there is a brief synchronized burst of spikes, possibly signalling an error in anticipated motion to downstream areas. Similarly, a sudden onset of movement elicits a stronger response than smooth motion, possibly because it contradicts the expectation of a stationary object [Chen et al., 2013].
Omitted stimulus response
We have seen how the retina responds to stimuli with a predictable spatial component. To test whether similar findings hold for temporal patterns, Schwartz and colleagues stimulated the retina with a sequence of periodic flashes [Schwartz et al., 2007a]. They found that once the sequence is abruptly stopped, the RGCs respond strongly (Figure 2.3, e.g. third row). This is yet another nonlinear phenomenon: the omitted stimulus response (OSR), also known as the omitted stimulus potential (OSP) [Bullock et al., 1990a]. Similar to the motion reversal effect, here the temporal regularity – the periodic nature of the flashes – is violated, causing the RGCs to seemingly signal the deviation from the prediction. Moreover, the timing of the OSR is not constant but instead carries information: it depends on the inter-flash period for flash frequencies in the range between 6 and 20 Hz [Schwartz et al., 2007a]. The retina appears to ‘learn’ the exact interval between two flashes, and the latency of the OSR shifts consistently with it (Figure 2.4). The robustness of the OSR was probed by jittering the periods between flashes, changing the flashes to different shapes, etc.; the response persisted despite the noise [Schwartz and Berry 2nd, 2008]. Possibly the most surprising discovery is the variety of behaviour in the recorded responses, as can be seen from the 10 different combinations of responses to the beginning and ending of the flash sequence (Figure 2.3).
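The dependence of OSR timing on the inter-flash period can be made concrete with a toy timing predictor. This is only a sketch of the idea that a violated periodic expectation is tied to the learned interval; it is not a model of the retinal circuitry, and it ignores any fixed processing delay. The helpers `flash_train` and `predict_next_flash` are hypothetical names introduced for this example.

```python
def flash_train(frequency_hz, n_flashes):
    """Times (in seconds) of a strictly periodic sequence of flashes starting at t = 0."""
    period = 1.0 / frequency_hz
    return [i * period for i in range(n_flashes)]

def predict_next_flash(flash_times):
    """Estimate the period as the mean inter-flash interval and
    extrapolate the time at which the next flash would be expected."""
    if len(flash_times) < 2:
        raise ValueError("need at least two flashes to estimate a period")
    intervals = [b - a for a, b in zip(flash_times, flash_times[1:])]
    period = sum(intervals) / len(intervals)
    return flash_times[-1] + period

# Expected time of the omitted flash, measured from the last delivered flash,
# for three frequencies within the 6-20 Hz range mentioned in the text.
latencies = {}
for frequency in (6.0, 10.0, 20.0):
    times = flash_train(frequency, n_flashes=8)
    latencies[frequency] = predict_next_flash(times) - times[-1]

for frequency, latency in latencies.items():
    print(f"{frequency:4.1f} Hz: omission expected {latency * 1000:.1f} ms after the last flash")
```

In this toy picture, the moment at which the prediction is violated moves later as the flash period grows, which is the qualitative behaviour described for the OSR latency.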
Efficient coding in the retina
As we have seen in Chapter 2, in most mammals there is a thousand-fold reduction in the number of cells between the photoreceptor layer and the retinal ganglion cell output. This bottleneck makes the retina a good candidate for testing the efficient coding theory, since in such conditions it is suggested that the retina would have to compress the incoming information. Moreover, it is possible to record a representative sample of the whole retinal output, which makes validating the theoretical predictions with experimental data feasible.
Atick and Redlich were able to predict the variation of receptive field shape depending on the noise conditions by starting from the efficient coding hypothesis as a design principle. In the low-noise setting, the center-surround structure of retinal ganglion cell receptive fields is used to integrate inputs from within their RF while suppressing the stimuli in their immediate surround (Figure 3.2A). This finding is in accordance with Barlow’s original hypothesis: since he assumed a noiseless channel, being efficient in this case means that the optimal strategy is to reduce redundancy in the neural responses. In contrast, the efficient coding theory makes the opposite prediction when sensory inputs are corrupted by a high level of noise. Here, the optimal code is actually the one in which neurons respond redundantly to their inputs, so as to average out the noise, leading to an increase in the signal-to-noise ratio (SNR). As a result, the efficient coding theory predicts that the neural code should change qualitatively with varying input noise, acting as a whitening filter at low noise and a smoothing filter at high noise. Interestingly, Atick and Redlich showed that this can explain the observed changes in the RF shape of RGCs with decreasing visual contrast, which become broader and have a weaker suppressive surround at lower contrast levels (Figure 3.2B). However, this should not come as a surprise. The optimality of the code depends on the input: a code which is efficient for certain input statistics is not necessarily optimal for others [Simoncelli and Olshausen, 2001].
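The switch from whitening at low noise to smoothing at high noise can be sketched in the frequency domain. The example below is a simplified stand-in and not Atick and Redlich's derivation: it assumes a signal power spectrum falling as 1/f², white input noise of power `noise_power`, and a filter gain formed as the product of a whitening term and a Wiener-like noise-suppression term. The function names, the frequency grid and the two noise levels are hypothetical choices.

```python
import math

def efficient_filter_gain(freq, noise_power):
    """Gain of a simplified 'whitening plus noise suppression' filter at one frequency.
    Assumes signal power 1/f^2 and white input noise (both are assumptions)."""
    signal_power = 1.0 / (freq * freq)
    # Whitening: flatten the spectrum of the noisy input (signal + noise).
    whitening = 1.0 / math.sqrt(signal_power + noise_power)
    # Noise suppression: attenuate frequencies where noise dominates the signal.
    noise_suppression = signal_power / (signal_power + noise_power)
    return whitening * noise_suppression

def peak_frequency(freqs, noise_power):
    """Frequency on the grid at which the filter gain is largest."""
    return max(freqs, key=lambda f: efficient_filter_gain(f, noise_power))

freqs = [0.1 * i for i in range(1, 201)]  # 0.1 .. 20.0, arbitrary units

peak_low_noise = peak_frequency(freqs, noise_power=0.01)
peak_high_noise = peak_frequency(freqs, noise_power=100.0)

print(f"low noise:  gain peaks at f = {peak_low_noise:.1f} (band-pass, whitening-like)")
print(f"high noise: gain peaks at f = {peak_high_noise:.1f} (low-pass, smoothing-like)")
```

Under these assumptions the gain reduces to f / (1 + N f²)^(3/2), which peaks at f = 1/√(2N). At low noise the filter is band-pass and boosts higher frequencies, corresponding in space to a pronounced suppressive surround. At high noise the peak moves to the lowest frequency on the grid, corresponding to a broader, more purely integrating receptive field.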
Atick and Redlich made a number of simplifying assumptions about the nature of the neural code, in which RGCs are assumed to behave as linear deterministic filters of their inputs. Since then, a number of authors have investigated what happens in the more general case, where neural responses are noisy and non-linear. It was shown that, with more realistic neural models, efficient coding can account for many qualitative aspects of retinal organisation, such as the ratio between ON and OFF cell types [Karklin and Simoncelli, 2011, Ratliff et al., 2010], the overlap between RFs [Doi and Lewicki, 2007], and changes in RFs with varying retinal eccentricity [Doi and Lewicki, 2014, Ocko et al., 2018]. Likewise, starting from efficient coding principles it is possible to explain how having both ON and OFF cell pathways leads to a lower metabolic cost on average [Gjorgjieva et al., 2014]. In recent work, Doi et al. directly compared predictions of efficient coding with simultaneous recordings from cone photoreceptors and RGCs [Doi et al., 2012]. They found that ganglion cells exhibited high (∼ 80%) efficiency in transmitting spatial information, relative to their model. Recently, Ocko et al. found that it is possible to start from first principles (statistics of natural movies and realistic energy constraints) and reconstruct the spatial and temporal sensitivity, cell spacing, ratio of cell types, as well as how the distribution of cell types changes with eccentricity in the primate retina [Ocko et al., 2018].
Table of contents:
1.1 Thesis outline
2 Retinal processing
2.1 First step of visual processing
2.2 Computations in the retina
2.2.1 Retinal anticipation
2.2.2 Omitted stimulus response
2.3 Modelling retinal responses
3 Efficient coding
3.1 Efficient coding in the retina
3.2 Coding for predictions
3.3 Encoding surprise
4 Surprise encoding in the retina
5.1 Surprise-related responses in the sensory cortex
5.2 Future directions
5.3 General relevance
A.1 Repetitions on the mouse retina
A.2 Details on the stimulus design