Functional neuroimaging modalities
Until the advent of non-invasive neuroimaging modalities in the 1920s, most of the accumulated knowledge of the brain came from the study of lesions, post-mortem analysis and invasive experimentation. Modern, non-invasive imaging techniques now reveal many aspects of the human brain in vivo with a high degree of precision. The brain imaging techniques available today can be divided into structural (or anatomical) and functional techniques. While structural imaging provides details on the morphology and structure of tissues, functional imaging reveals physiological activity such as changes in metabolism, blood flow, regional chemical composition, and absorption. In this section we briefly discuss the main functional neuroimaging modalities available today.
• Electroencephalography – EEG is a widely used modality for functional brain imaging. EEG measures electrical activity along the scalp.
EEG activity reflects the synchronous activity of a population of neurons that share a similar spatial orientation. If the cells do not have similar spatial orientations, their ionic currents do not align and thus do not create detectable waves. Pyramidal neurons of the cortex are thought to produce most of the EEG signal because they are well aligned and fire together.
Because voltage fields fall off with the square of the distance, activity from deep sources is more difficult to detect than currents near the skull. Due to the ill-posed problem of reconstructing volumetric data from surface measurements, EEG has a poor spatial resolution compared to other modalities such as fMRI.
• Stereotactic electroencephalography – sEEG is an invasive version of EEG based on intra-cranial recording. It measures the electrical currents within specific regions of the brain using deeply implanted electrodes, localized with a stereotactic technique. This approach retains the good temporal resolution of EEG and enjoys an excellent spatial resolution. However, sEEG is very invasive, is only performed for medical purposes (e.g., localization of epilepsy foci) and has a limited coverage (only the regions with electrodes). A closely related approach is electrocorticography – ECoG – which uses electrodes placed directly on the exposed surface of the brain. Its usage is likewise restricted to medical purposes.
• Magnetoencephalography – MEG measures the magnetic field induced by neural electrical activity. The synchronized currents in neurons create magnetic fields of a few hundred femtotesla (fT) that can be detected using specific devices. Although EEG and MEG signals originate from the same neurophysiological processes, there are important differences. Magnetic fields are less distorted than electric fields by the skull and scalp, which results in a better spatial resolution for MEG. Whereas EEG is sensitive to both tangential and radial components of a current source in a spherical volume conductor, MEG detects only the tangential components. Because of this, EEG can detect activity both in the sulci and at the top of the cortical gyri, whereas MEG is most sensitive to activity originating in sulci. EEG is therefore sensitive to activity in more brain areas, but activity that is visible in MEG can be localized with more accuracy. Note that EEG and MEG can be measured simultaneously.
• Positron emission tomography – PET is an imaging modality based on the detection of a radioactive tracer introduced into the body of the subject. The tracer, as it undergoes radioactive decay, emits a positron which can in turn produce, after recombination with an electron, a pair of photons that are detected simultaneously. PET therefore provides a quantitative measurement of physiological activity. It can also be used for functional imaging by choosing a specific tracer. In particular, fluorodeoxyglucose (FDG) is used for imaging the metabolic activity of a tissue, based on the assumption that areas of high radioactivity are associated with brain activity. PET has two major limitations: the tracers it requires are produced by cyclotrons (a type of particle accelerator), which implies heavy logistics, and the use of radio-tracers is not harmless to the health of the subjects, so PET is now used for medical purposes only.
Functional MRI and BOLD signal
The primary form of fMRI measures the change in blood oxygenation associated with blood flow. This is known as the blood-oxygen-level dependent (BOLD) contrast. Another increasingly popular functional MRI method is arterial spin labeling (ASL) [Detre et al., 1994, Alsop and Detre, 1998, Williams et al., 1992], which uses arterial water as a tracer to measure cerebral blood flow. Compared to BOLD fMRI, ASL has a lower signal-to-noise ratio [Detre and Wang, 2002]. However, ASL provides reliable absolute quantification of cerebral blood flow with higher spatial and temporal resolution than other techniques [Borogovac and Asllani, 2012]. This thesis specifically considers BOLD functional MRI, and throughout the manuscript we use the name functional MRI (fMRI) to denote functional MRI based on the BOLD signal.
The BOLD contrast can be explained by considering a protein present in red blood cells, called hemoglobin. Hemoglobin binds with oxygen in order to carry it to the different cells of the organism, this bond being reversible and unstable. Thus, it can be found in two different forms: oxyhemoglobin (HbO2 – giving a bright red color to the blood), its oxygenated form, and deoxyhemoglobin (Hb – giving a blue-purple color to the blood), its deoxygenated form. When oxyhemoglobin loses its oxygen atoms and becomes deoxyhemoglobin, it becomes more affected by an externally applied magnetic field (due to the iron it contains). The presence of deoxyhemoglobin in the blood modifies the magnetic resonance signal of the protons of the water molecules surrounding the blood vessels.
Estimation of activation coefficients
In this section we present a model that extracts time-independent activation coefficients relative to a given task, given the BOLD time course and an experimental design. This model is known as the general linear model [Friston et al., 1995]. We start by describing the hemodynamic response function (Section 2.4.1) and then describe an assumption behind the general linear model, the linear-time-invariant property (Section 2.4.2) relating the BOLD signal to the neural response. The general linear model is then presented in Section 2.4.3.
The concepts presented in this section form the basis of the contribution presented in Chapter 4, where we present an extension of the general linear model that performs the joint estimation of the HRF and the activation coefficients.
Because they will not be referenced in later chapters, we do not discuss the various preprocessing steps that can be applied to the BOLD signal in order to remove artifacts that might have occurred during acquisition or to enhance the signal-to-noise ratio. These include slice-timing correction, motion correction, spatial normalization and spatial smoothing.
Hemodynamic response function (HRF)
One of the difficulties associated with fMRI studies is that the BOLD signal does not increase instantaneously after the stimulus presentation, nor does it return to baseline immediately after the stimulus ends. Instead, the BOLD signal peaks approximately 5 seconds after stimulation and is followed by an undershoot that can last as long as 30 seconds.
The hemodynamic response function (HRF) represents an ideal, noiseless response to an infinitesimally brief stimulus. Most software packages represent the HRF as a combination of two gamma probability density functions, where the first gamma density models the shape of the initial stimulus response and the second models the undershoot. Its analytical form is

h(t) = (t^{α₁−1} β₁^{α₁} e^{−β₁ t}) / Γ(α₁) − c (t^{α₂−1} β₂^{α₂} e^{−β₂ t}) / Γ(α₂),

where α₁, β₁ and α₂, β₂ are the shape and rate parameters of the two gamma densities and c controls the relative amplitude of the undershoot.
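As a concrete illustration, the double-gamma HRF above can be evaluated with SciPy's gamma density. The parameter values below (peak gamma with shape 6, undershoot gamma with shape 16, undershoot weight 1/6) are the defaults popularized by the SPM software, used here only as a plausible example, not necessarily the parameters used elsewhere in this thesis:

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(t, a1=6.0, b1=1.0, a2=16.0, b2=1.0, c=1.0 / 6):
    """Canonical double-gamma HRF: an initial peak minus a weighted
    undershoot.  (a1, b1) and (a2, b2) are the shape/rate parameters of
    the two gamma densities; c weights the undershoot.  Defaults mirror
    the widely used SPM parameters (an assumption, see lead-in)."""
    peak = gamma.pdf(t, a1, scale=1.0 / b1)
    undershoot = gamma.pdf(t, a2, scale=1.0 / b2)
    return peak - c * undershoot

t = np.arange(0.0, 32.0, 0.1)       # 32 s window at 0.1 s resolution
h = double_gamma_hrf(t)
print(round(t[np.argmax(h)], 1))    # peak near 5 s, matching the text
```

With these parameters the mode of the first gamma density is (α₁ − 1)/β₁ = 5 s, which reproduces the ≈5-second peak described above, and the negative lobe after ~15 s reproduces the undershoot.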
The linear-time-invariant assumption
In this section we present the main assumption behind the general linear model, the linear time invariance assumption.
A number of studies have reported that, in certain regimes, the relationship between the neural response and the BOLD signal exhibits linear time-invariant (LTI) properties [Boynton et al., 1996, Cohen, 1997, Dale and Buckner, 1997]. These properties can be summarized as:
• Multiplicative scaling. If a neural response is scaled by a factor α, then the BOLD response is also scaled by the same factor α.
• Additivity. If the response to two separate events is known, the signal for those events if they were to occur close in time is the sum of the independent signals.
• Time invariance. If the stimulus is shifted by t seconds, the BOLD response is also shifted by the same amount.
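Under the LTI assumption, the predicted BOLD signal is simply the convolution of the neural (stimulus) time course with the HRF, and the three properties above are inherited from the linearity of convolution. The following sketch checks them numerically with NumPy; the kernel used here is an illustrative gamma-like shape, not the HRF of this thesis:

```python
import numpy as np

# Toy haemodynamic kernel (illustrative shape only).
tk = np.arange(0, 30)
hrf = tk**5 * np.exp(-tk) / 120.0

def bold(neural):
    """Predicted BOLD: convolution of the neural response with the HRF,
    truncated to the length of the scanning session."""
    return np.convolve(neural, hrf)[: len(neural)]

n = 60
s1 = np.zeros(n); s1[5] = 1.0     # event at scan 5
s2 = np.zeros(n); s2[12] = 1.0    # event at scan 12

# Multiplicative scaling: doubling the input doubles the output.
assert np.allclose(bold(2 * s1), 2 * bold(s1))
# Additivity: response to both events = sum of individual responses.
assert np.allclose(bold(s1 + s2), bold(s1) + bold(s2))
# Time invariance: shifting the stimulus by 7 scans shifts the response.
assert np.allclose(bold(s2)[12:], bold(s1)[5 : n - 7])
```

This convolution model is exactly what the general linear model exploits when building task regressors from the experimental design.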
While the LTI assumption is commonplace in practice, there is evidence for non-linearity in the amplitude of the BOLD response. For example, it is known that there is a saturation effect for stimuli that occur less than 2 seconds apart [Wager et al., 2005]. It has also been reported that very brief stimuli exhibit a larger BOLD response than would be expected from longer stimuli [Yesilyurt et al., 2008]. However, while these non-linearities are important, there is a general consensus that in the regime in which most cognitive fMRI studies operate, they have relatively little impact.
High-pass filtering and prewhitening
The BOLD signal contains low-frequency trends that are usually removed before or during the estimation of the activation coefficients. One popular approach to high-pass filtering is to add a discrete cosine transform (DCT) basis set to the design matrix. When using this basis set, the highest frequency to be removed from the data has to be chosen so as to avoid removing the frequency of the experimental task that is also being modeled. Another approach, which is becoming increasingly popular, is to fit a local regression model to the time series and remove the estimated trend from the data. The software FSL uses LOWESS (locally weighted scatterplot smoothing) [Cleveland, 1979], while recent studies have successfully used a Savitzky-Golay filter [Barry and Gore, 2014, Çukur et al., 2013], which is available in Matlab under the name sgolayfilt and in Python's SciPy package. In the studies presented in Chapter 3 we will use this last filter. In [Çukur et al., 2013], the authors used a Savitzky-Golay filter to estimate the low-frequency drifts with a window length of 240 seconds and a polynomial of degree 3. We have found that parameters close to these work well in practice.
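The Savitzky-Golay detrending described above can be sketched with `scipy.signal.savgol_filter`. The synthetic voxel, the 2-second repetition time and the drift/task shapes below are illustrative assumptions; only the 240-second window and degree-3 polynomial come from the text:

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
tr = 2.0                                   # repetition time in seconds (assumed)
n_scans = 300
t = np.arange(n_scans) * tr

# Synthetic voxel: slow scanner drift + faster task signal + noise.
drift = 0.01 * t + 2.0 * np.sin(2 * np.pi * t / 400.0)
task = np.sin(2 * np.pi * t / 30.0)        # task-related oscillation, 30 s period
y = drift + task + 0.1 * rng.standard_normal(n_scans)

# 240 s window, degree-3 polynomial, as in Cukur et al.; the window
# must be an odd number of samples for savgol_filter.
window = int(240.0 / tr) // 2 * 2 + 1      # 121 samples
trend = savgol_filter(y, window_length=window, polyorder=3)
detrended = y - trend
```

The fitted `trend` tracks the slow drift while leaving the comparatively fast task oscillation in `detrended`, which is the behavior the window length must be chosen to preserve.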
The GLM specified in (2.3) assumes that the noise ε follows a Gaussian distribution with covariance σ²I. However, it is known that the BOLD signal is temporally autocorrelated. Several authors [Bullmore et al., 1996, Kruggel et al., 2000] model the BOLD noise as a first-order autoregressive process, AR(1). This assumes that each time point is correlated with the previous time point. The distribution of the error in this case is given by ε ∼ N(0, σ²V), where V is the symmetric correlation matrix and σ² is the variance. The correlation matrix and variance are commonly estimated from the residuals after fitting the GLM.
The most common solution to take this structure into account is to prewhiten the data, that is, to remove the temporal correlation. Since the correlation matrix V is symmetric and positive definite, the Cholesky decomposition can be used to find a matrix K such that V⁻¹ = KᵀK. To prewhiten the data, both sides of the GLM (Eq. (2.3)) are premultiplied by K to give Ky = KXβ + Kε. This makes the errors independent, i.e., Kε ∼ N(0, σ²I).
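A minimal NumPy sketch of this pipeline, under the assumptions stated in the text (AR(1) noise, ρ estimated from OLS residuals, V⁻¹ factored as KᵀK), with a toy design matrix and simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
beta_true = np.array([1.0, 2.0, -1.0])

# Simulate AR(1) noise with coefficient rho.
rho = 0.4
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = rho * eps[i - 1] + rng.standard_normal()
y = X @ beta_true + eps

# 1. Fit the GLM by ordinary least squares, estimate rho from residuals.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta_ols
rho_hat = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])

# 2. AR(1) correlation matrix V has entries rho^|i-j|; whiten with
#    K = L.T where L is the Cholesky factor of V^-1, so V^-1 = K.T @ K.
V = rho_hat ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
K = np.linalg.cholesky(np.linalg.inv(V)).T
beta_gls, *_ = np.linalg.lstsq(K @ X, K @ y, rcond=None)
```

In practice fMRI packages exploit the banded structure of V⁻¹ for AR(1) noise instead of forming and inverting the full n × n matrix, but the algebra is the same.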
Voxel-wise hypothesis testing: Statistical Parametric Maps
Statistical parametric maps (SPMs) are images whose values are, under the null hypothesis, distributed according to a known probability density function, usually the Student t or the F distribution. To create such maps, one performs a parametric test at each voxel; the resulting statistics are assembled into an image – the SPM. Given the activation coefficients β ∈ ℝᵏ for a single voxel (cf. Section 2.4.3), with k being the number of conditions, it is possible to use a t-test to test whether a given linear combination of the conditions is significantly different from zero. As in Section 3.1.1, we introduce the contrast c ∈ ℝᵏ so that cᵀβ is a linear combination of the conditions. The hypotheses can then be written as H₀: cᵀβ = 0, H₁: cᵀβ ≠ 0. Under the assumptions of the t-test for the coefficients of a linear regression model (Gaussian i.i.d. noise in the GLM), equation 3.2 gives the expression of the statistic for this test. Assigning the statistic to every voxel creates an image with the same dimensions as the input brain images; in this case the image is called a t-map because of the t-test used to generate it.
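For one voxel, the contrast t-statistic takes the standard linear-regression form t = cᵀβ̂ / √(σ̂² cᵀ(XᵀX)⁻¹c). The sketch below computes it on simulated data; the design, effect sizes and contrast are all invented for illustration, and whether this matches equation 3.2 exactly should be checked against the thesis:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 120, 2                       # scans, conditions
X = rng.standard_normal((n, k))     # toy design matrix, one column per condition
beta_true = np.array([3.0, 1.0])    # condition 1 activates more strongly
y = X @ beta_true + rng.standard_normal(n)

# GLM fit for this voxel.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - k)    # unbiased noise variance estimate

# Contrast "condition 1 minus condition 2" (H0: c^T beta = 0).
c = np.array([1.0, -1.0])
t_stat = (c @ beta_hat) / np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
```

Repeating this computation at every voxel and writing `t_stat` into the corresponding position of a volume yields the t-map described above.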
Multiple comparisons issues
One major drawback of statistical parametric maps is the multiple comparisons issue. This arises when multiple hypothesis tests are performed simultaneously and one must account for the possibility of an error occurring in each of these tests [Toothaker, 1993, Miller, 1966, Westfall, 1993]. In fMRI, due to the huge number of voxels (on the order of 4 × 10⁴ at 3 mm³ resolution), the tests can produce a large number of false positives, i.e., voxels found to be significant while in reality they are not. As a result, it is necessary to consider other types of error rates that account for the multiple comparisons issue.
A simple procedure to control the rate of false positives is the Bonferroni correction. This approach consists in dividing the threshold α by the number of tests p, which yields the new threshold α_b = α/p. The maps of voxels selected by thresholding the p-values for the object recognition task (subject 1) are given in Fig. 3.3 for different threshold values (0.05, 0.01 and 0.05 corrected by Bonferroni). We notice that the Bonferroni correction is very severe and keeps very few significant voxels.
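The severity of the correction is easy to see numerically: under the null hypothesis p-values are uniform, so an uncorrected α = 0.05 threshold flags about 5% of null voxels, while the Bonferroni threshold α/p flags essentially none. The voxel count below is taken from the order of magnitude quoted in the text:

```python
import numpy as np

rng = np.random.default_rng(3)
p_tests = 40_000                    # ~number of voxels at 3 mm resolution
alpha = 0.05

# Null p-values for every voxel: uniform under H0 (no true activation).
p_values = rng.uniform(size=p_tests)

uncorrected = (p_values < alpha).sum()           # ~5% of voxels: ~2000 false positives
bonferroni = (p_values < alpha / p_tests).sum()  # typically none survive
```

The expected number of uncorrected false positives is α·p = 2000, whereas Bonferroni bounds the probability of even one false positive across the whole map by α.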
Table of contents:
1 Organization and contributions of this thesis
2 Introduction to Functional MRI
2.1 General brain structures
2.2 Functional neuroimaging modalities
2.3 Functional MRI and BOLD signal
2.4 Estimation of activation coefficients
3 Statistical Inference in fMRI
3.1 Hypothesis testing
3.2 Machine learning in fMRI
4 Data-driven HRF estimation for encoding and decoding models
4.1 Increased sensitivity via HRF estimation
4.3 Data description
5 Decoding with Ordinal Labels
5.1 Learning from ordinal labels
5.2 Loss functions
5.3 Ranking and ordinal regression
6 Fisher Consistency of Ordinal Regression Methods
6.2 Ordinal regression models
6.3 Consistency of Surrogate Loss Functions
7 Conclusion and Perspectives
7.2 Research Perspectives
7.3 Publications by the author