Segmentation and registration
Medical image registration has become an eﬃcient tool to compare images from two diﬀerent subjects. The diﬀerence between two objects observed in images correspond to a variability in position, described by rigid transformations, size, described by aﬃne transformations, and more general transformation that change the details of the ob-jects. These diﬀerent classes of transformations are characterized by diﬀerent degrees of freedom: six for rigid transformation, twelve for aﬃne transformation, a few tens or hundred of degrees of freedom when parametric models are used (polynomials, spline) and up to thousands or even infinite when considering more generic models that induce large deformations. Rigid or aﬃne methods are widely used in clinical practice because of their simplicity and reduced computational cost. However, as the variability of the brain consists of changes in its size, shape and internal tissue organization, aﬃne de-formations are not suﬃcient to represent brain variability. Non-rigid deformations have therefore been proposed to overcome this limitation. The research on non rigid defor-mation methods has made considerable progress in the past years. In the following, we briefly describe some of them.
Non-rigid registration methods
Non-rigid registration based on aﬃne transformation and polynomials :
Farneback  proposed a non-rigid registration method based on polynomial transformation. It was further developed by Wang et al. . The disadvan-tage of this method is that it requires polynomial functions of high degree to produce a well behaved transformation.
Non-rigid registration based on physical models :
Non-rigid registration techniques based on physical models are also common. These approaches assume that the registration object is a homogeneous isotropic elastic body. They first create a physical model of the object and then deform this model under external forces to achieve the registration with the reference image. This registration method can ensure the smoothness of the deformation and the preser-vation of the object topology. Most contributions on these methods have focused on the definition of external forces, while some of them improve the method by adjusting the elastic properties of the physical model. For example, Christensen  proposed to replace the elastic model by viscous fluids model, however this method increases the computational complexity, Bro-Nielsen  improved the viscous fluids model by using fast convolution methods.
Non-rigid registration based on a smooth function :
Alternative methods based on smoothing functions, such as thin-plate spline, mul-tivariate quadratic equations or transforms of a Gaussian function have also been used in his context. These models involve smooth functions, which are regularized so that they represent a prior on relevant deformations. Meyer et al.  use a thin-plate spline interpolation to adjust the position of homologous control points and calculate the similarity of image to the reference image after transformation.
Like the thin-plate spline, B-spline is also widely used in non-rigid transforma-tion. For this transformation, the deformation field is calculated by using B-spline interpolation to adjust the position of the control points. This transformation is usually performed using mutual information similarity measure (see [116, 128], and definition in Appendix A).
Linearized deformations :
Linearized deformation methods [10, 15] define a deformation ϕ of a domain D as the displacement of each point x in D by a vector v(x), which is written as: ϕ(x) = x + v(x).
In this case, the image under deformation is modeled as an elastic body. The model is simple because it only depends on a vector field v. An important drawback is that when the source and target images are interchanged, the obtained transformation is not the inverse of the previous solution, as the invertibility is not ensured.
Large Deformation Diﬀeomorphic Metric Mapping (LDDMM) :
LDDMM  has been proposed to model large deformations and overcome the non invertibility issue. In this case, the deformation is modeled through the com-position of a velocity field that evolves over time according to the Lagrange trans-port equation. This method provides the definition of a mathematical metric on the space of images. This distance is the length of the geodesic that connects them according to this metric and can be used to study the anatomical variability [23, 137]. However, it is computationally costly because the velocity field has to be integrated over time. Recent methods [52, 53] approximate this diﬀeomorphism by a finite-dimensional vector enabling fast and accurate registration.
Demons and Diﬀeomorphic demons :
Thirion  proposed to perform image registration as a diﬀusion process inspired by Maxwell’s Demons. The diﬀusion is driven by the force called “Demons force”. The Demons algorithm is an eﬃcient algorithm that provides dense correspon-dences but lacks a sound theoretical justification. Vercauteren et al.  propose a non parametric diﬀeomorphic image registration algorithm based on the initial Thirion’s demons algorithm. They provide theoretical properties of the diﬀerent variants of this algorithm. These algorithms converge quickly, however, they are not based on a sound mathematical model that would provide a clearly defined metric on the image space. Tools can be download in Insight Segmentation and Registration Toolkit (ITK) .
A more detailed description of multiple image registration techniques are presented in . Fourteen of them are compared in .
Application of brain image registration
According to the modality and the patient, non-rigid registration can be used in diﬀerent frameworks:
Registration of the same modality of one patient (longitudinal study) :
This approach is used to study the evolution of a subject over time. If these images contain some non-rigidly evolving structures such as tumors or stroke lesions, the brain is consequently deformed between one time point and the following ones. The registration map highlights (and depending on the method, quantifies) the diﬀerences, which may be used in disease diagnosis or observation of treatment results.
Registration of images from the same modality in diﬀerent patients (intersubject):
This is the most common use of registration mappings. Indeed, it makes it possible to compare two subjects by highlighting shape diﬀerences. This is now used for statistical analysis of populations that were either compared pairwise (for example ) or mapped to a given template. In particular, the statistical analysis can be performed in the space of deformation, which appears to be a linearizing space to analyze shape diﬀerences (among many others, we can cite [37, 44, 90, 137]).
This is one aspect that we will consider for our purpose in this work.
Registration of images from diﬀerent modalities in the same patient :
In this case, the images to be registered come from diﬀerent modalities of the same patient. This is currently the most frequent use of registration in clinical settings (mostly with rigid-body deformations). It is used to fuse the information from the diﬀerent modalities into the same image so that the information can be combined in order to increase the accuracy of the analysis.
This is also an aspect that we will use in the following. In particular, considering together anatomical and functional MRIs can be used to highlight some particu-lar regions in the gray matter that are activated during specific cognitive tasks. Constraining this activation to appear only in the gray matter potentially pro-vides more accurate localization and thickness of the gray matter and a better localization of the subcortical structures. On the other hand, as the gray matter is observed in the anatomical image, this knowledge also increases the accuracy of the support of the brain activity. This coupling will be at the core of our multimodal atlas estimation in Chapter 4.
Non-rigid registration algorithms
There are currently many reports about non-rigid registration algorithms in the litera-ture. Most of the non-rigid registration problem can be seen as an optimization problem where the energy to minimize takes into account the diﬀerence between one reference image and the deformed image and a regularity term that penalizes “too large” defor-mations – which has to be defined according to the deformation framework. This can be expressed as
ϕˆ = arg max E(ϕ • B, A, ϕ) = J(ϕ • B, A) + L(ϕ) , (1.1)
where A and B are the first and the second images, J is the similarity measure, ϕ is the transformation between the two images, S the space of admissible deformations and L the regularity term. The diﬀerent approaches diﬀer from each other by considering diﬀerent S, J and L.
Diﬀerent frameworks to construct and constrain ϕ have been given above as well as the regularity term that includes the expected constraints on ϕ. For example, in LDDMM and Diﬀeomorphic demons, ϕ is a diﬀeomorpism. ϕ • B = B ◦ ϕ−1 and L is the kinetic energy of the path generated by ϕ. The similarity measure J quantifies the similarity be-tween two images and the most used are the correlation , gradient cross-correlation , mean square error , mutual information or normalized mutual information  that are described in Appendix A.
Although quite diﬀerent, all these approaches can be solved the same way as the opti-mization of the matching energy E. Currently, gradient descent optimization method -and its accelerated versions- appears to be the easiest algorithm. However, other meth-ods such as Newton-Raphson, quasi-Newton, the simplex method, simulated annealing or even genetic algorithms, may be used depending on the energy E.
Medical image segmentation is the key to performing computer image analysis. It has important significance in biomedical research, clinical diagnosis and pathological anal-ysis. The main factors aﬀecting the brain MR image segmentation are: 1) intensity inhomogeneity: due to the radio frequency magnetic field inhomogeneity, MR equip-ment itself, the diﬀerences between tissues in the brain, etc. 2) Partial Volume Eﬀect (PVE): due to the movement of the patient –given that the acquisition lasts several minutes– and the complex shape of the structures, the boundaries of the target struc-tures are not continuous and the boundaries between the soft tissue can be blurred. 3) Image noise. These issues altogether make segmentation a challenging task.
Methods of segmentation
In this section, we recall few examples of image segmentation that we have grouped into four classes based on the way they are performed.
Gray level based approaches :
This class of methods includes the threshold-based method , region growing , clustering algorithms , etc. These methods are based on the gray level image directly. The threshold-based method assumes that the distribution of the target and the background is separable in the image histogram, so one can use a threshold to distinguish between the target and the background. The method is simple, easy to implement and fast. However in brain MR images, due to the in-tensity inhomogeneity, the distribution of the tissues often overlap. It is impossible to obtain the correct segmentation result by the threshold method. The main idea of the region growing method is to set the initial seed, classify the voxel that are similar to the seeds of a given class. The disadvantage of this method is that the result of the algorithm often depend on the selected seeds. Clustering algorithms include K-means algorithm , fuzzy C-means algorithm (FCM) , Gaussian mixture model (GMM) . Wells et al.  propose a segmentation method using Expectation-Maximization (EM) algorithm, that can automatically segment the tissues, however this method requires the specification of Gaussian distribution for each class. Ahmed et al , Pham et al.  have proposed methods based on Fuzzy C-Means algorithm that segment and remove the bias field at the same time. However the initial values are sensitive for these clustering algorithm. In addition, most of the proposed models are based on the voxel information and do not consider the smoothness of the tissue. Ashburner and Friston  propose a probabilistic framework for joint nonlinear registration, intensity normalization and segmentation of a single image providing tissue probability maps.
Active contour method :
Active contour segmentation models have become an important research topic in the last decades: specifically, parametric active contour model (Snake model) and geometric active contour model (level set model) are two commonly used active contour models. Here is a brief introduction of these two models, that come from computer vision and thus may fail due to the characteristics of brain images.
(i) Parametric active contour model
The parametric active contour model (Snake) was proposed by Kass et al. , further developed by  and successfully applied to the image segmentation [41, 92]. The idea is to find a methodology for the minimization of a criterion composed of the internal energy and external energy over the entire shape. Although this model has achieved great success in medical image segmentation, there are still some disadvantages: 1) the results are often dependent on the initialization; 2) the capture range of edge energy is small; 3) the sensitivity to noise is large, the model fails in images with weak boundaries. As the topology of brain MR images are complex, Snake models cannot be used to segment several structures in the brain MR images at the same time. However, they are often used on a single tissue segmentation , for example, segment the hippocampus, ventricles or tumor.
(ii) Geometric active contour model
Geometric active contour models can be considered as an improvement of Snake model, proposed by Caselles et al.  and Malladi et al. . This model can be solved by the level set method. The level set method can be applied to image denoising and enhancement, image segmentation, image restoration. The level set model [18, 46, 101] improves upon the Snake model, as it allows topological changes (the contour can break up, merge, or disappear during the course of time evolution). However the resulting segmentation is sensitive to the initialization and it is diﬃcult to obtain global optimal solution. Chan-Vese model  is a classical model, that assumes the image is divided into two parts with diﬀerent means and segment the target from the background. Since it uses the gray level information of both the internal and external of the contour, it still works well for the images with large noise or weak boundaries. However the model assumes that the intensity is homogeneous in each class, therefore it often yields a wrong segmentation on brain MR images. Many segmentation models [142, 156] based on local information have been proposed in order to overcome the intensity inhomogeneity in the brain MR images. However the result is often dependent on the initialization.
Graph-based segmentation :
Graph-based segmentation was introduced for image analysis in 1971 . This method converts the image to a weighted graph where the voxels are handled as nodes, then it uses a minimum cut criterion, e.g. minimal spanning tree, to get the best image segmentation. It essentially transforms image segmentation problem into a discrete optimization problem with convex relaxation. This class of methods include the graph cuts , random walker , isoperimetric partitioning  and so on. Currently, the research of the graph-based segmentation focuses on the following aspects: 1) optimal cut-oﬀ criteria [28, 69]; 2) spectral methods for segmentation [45, 113]; 3) fast algorithms [29, 56].
Atlas-based segmentation :
Atlas-based segmentation can been considered as a registration problem, namely that of deforming a brain atlas into a new subject’s brain. It relies on a reference image in which some structures have been segmented. Then, a non-rigid registra-tion is applied from the reference to the new subject. The structures of the new subject are segmented by transporting the labels from the reference. Atlas-based segmentation is of particular interest when the information from the gray level intensities is not suﬃcient. For example, the gray level intensities of subcortical structures may be closer to the white matter (WM) than gray matter (GM) inten-sity mean. This makes it diﬃcult to segment these structures as GM without any prior. However, atlas-based segmentation manages to segment these structures correctly with the help of a priori information given by the atlas .
There are two problems with atlas-based segmentation of brain images. The first one is the choice of the template, as it should be representative of a population and carry an accurate segmentation. The second one is the choice of the registra-tion framework that constrains the deformation and therefore captures diﬀerent similarity while leaving residuals which may be of interest. This assumes that the template is close to the subject’s anatomy. Otherwise, large registration errors may cause important segmentation errors if large anatomical diﬀerences exist.
Aljabar et al.  proposed two methods for templates selection. The first one uses meta-information. They select the template that is closest in age to the image to be segmented. The second one uses similarity metrics to compare the images. Obviously, the quality of the results depends on the metric used.
Another solution is to use multi-atlas based segmentation. It has been proposed to better deal with the registration errors obtained when using a single template. It also better captures the anatomical variability. Klein et al.  and Wu et al.  show that using multi-atlas segmentation improves segmentation accuracy. One problem is that it is unclear how many templates should one use for the multi-atals based segmentation. Aljabar et al.  shows that it is not necessary to use all the observations in the data set as templates. First, if the number of “templates” is large, registering all “templates” to the image to be segmented increases the computation time. Second, the anatomical structure may vary across the population. For example, if a structure can be represented by two distinct shapes, using a large number of templates may give a shape that does not represent either of them. In their study, they find that 15-25 templates are suﬃcient to get a good performance. However, the choice of the 15-25 templates still depends the metric they used.
When doing atlas-based segmentation, the templates selection depend on the metric used to compare images. The choice of the registration framework assumes that the template is close to the subject’s anatomy. A statistical estimation of the atlas that contains the template and the metric is a better choice, because it avoids to fix the template or the metric which are closely linked. Now we will focus on atlas construction.
Two approaches have been used to construct templates from medical images: to take the image closest to the population mean or to estimate the true population mean.
Marsland et al.  propose a method to construct the template as the observed image that is the closest to the geometrical mean. They choose the target image that minimizes the sum of distances from this image to the rest of the images and maximizes the sum of mutual information (MI) between them. Park et al.  propose a method that deals with the same problem as Marsland et al.’s. Their selected image is chosen based on MI only, which is most robust to the noise inherent in anatomical images. These approaches gives the template with the smallest possible bias but nevertheless this template is still biased, as it is one of the image in the dataset. It is not the real mean geometry image.
Studholme  proposes a method to jointly register all images simultaneously to a common space that is very close to the mean geometry to reduce the bias inherent to the choice of one particular sample. A cost function is optimized with the aim of maximizing the similarity between images, while penalizing displacement from the average shape. However, this requires explicitly choosing a weighting parameter to specify the influence of the penalty term and thus how well the constraint is satisfied.
Bhatia et al.  propose a method where an arbitrary image is used just as an inten-sity reference, after which the similarity between images is maximized using non-rigid registration. To ensure that the image calculated in this fashion is actually the mean, they enforce the constraint that the sum of all transformations, represented by a suit-able parametrization, is equal to zero. The algorithm does not require specifying any geometric reference, however an intensity reference has to be chosen to evaluate the similarity during the registration step.
Guimond et al.  propose an automatic method to build average anatomical brain atlases. First, they choose an image as the reference and register all images to it using an aﬃne registration to correct the position and global shape diﬀerences. Then, they register the images after the aﬃne registration to the reference using an elastic registra-tion and calculate the average of the images and deformations after registration. At last, they apply the average deformation to the average images in order to compute the aver-age atlas. The advantage of this method is that it depends less on the reference image used for its construction. A similar idea has been proposed in  where the template is the mean deformation of an hyper template. However it requires a pre-segmentation of the data.
Rueckert et al.  propose a statistical deformation models that can be used to construct an average atlas of the anatomy and their variability. They use a non-rigid registration algorithm. The transformation consists of a global transformation and a local transformation. This algorithm leads to good correspondences, in particular, in the subcortical structures.
Joshi et al.  propose a method to estimate a template given a collection of observed images. The problem is that the deformation is applied to the observations whereas it should act on the template (ideal image). This makes this estimation diﬀerent from the computational anatomy model proposed by Grenander and Miller . Moreover, the iterative numerical scheme is very sensitive to noise.
Niethammer et al.  propose robust estimation methods for parametric models based on Diﬀusion Weighted Imaging (DWI). By applying the estimation method to registered DWIs, a DW-atlas is constructed. This allows for the representation of average diﬀusion information with more flexible diﬀusion models than the diﬀusion tensor.
All the methods so far produce deterministic templates, that is to say a deterministic gray level image. On the other hand, probabilistic templates providing tissue probability maps are especially attractive, as they make it possible to take into account the uncer-tainty on the underlying tissue type, which is related to partial volume eﬀect (PVE) or to perfectible registration. Moreover, most of the time, the template is estimated without the geometric variability which may be analyzed a posterior by PCA or ICA. However, the template estimation requires to choose a deformation framework which somehow freeze the geometric variability a priori. Therefore, the posterior analysis highly de-pends on the prior choice. This suggests that the template should be learned jointly with the geometric variability of the shapes represented in the population. These two quantities will in the sequel form the atlas of the corresponding population. This is the leading idea of our models.
In , a method was proposed to do the segmentation and registration jointly, while creating an average brain template. This approach combines groupwise registration us-ing the Kullback-Leibler divergence and the EM algorithm for segmentation, and thus demonstrates the benefit of their integration. However it does not learn the geomet-ric variability within the estimation procedure, which may reduce the accuracy of the template to match the observations with prior deformations.
Ribbens et al.  propose a probabilistic model to segment a heterogeneous data set of brain MR images simultaneously while constructing templates for each mode in the heterogeneous population using an EM algorithm. However, it performs clustering as an additional step, and does not learn the geometric variability of the population.
Sabuncu et al.  propose a spherical demons algorithm with geometric variability for registering images and for creating an atlas. The registration was more accurate and this registration could be used to transfer segmentation labels onto a new image. With such an approach, the segmentation is not performed during the estimation.
Glasbey and Mardia  made the first step towards the statistical estimation of a complete atlas based on a statistical model. This was improved by Allassonni`ere et al.  who proposed a model to create an atlas containing the geometric variability. As the inputs are scalar images, the template is also estimated as a scalar (gray level) image. As a consequence, the segmentation of the population is not part of the estimation process. Using both kinds of information increases the population classification accuracy, as the model better fits the observations, as for the population classification in , the segmentation (tissue classification of voxels) takes advantage of the registration to the template. Although they learn the geometric variability, their template is deterministic. In this work, we will develop these models in order to create a probabilistic atlas.
Algorithms used in this work
Optimization is a branch of applied mathematics, the main research question is to find the minimum of a function f (x). The gradient descent method is the oldest optimization method proposed by Cauchy in 1847 . The algorithm starts with an initial value x(0) and calculate the gradient vector ∇f (x(0)). The next value x(1) obtained by moving some distance from x(0) in the direction of the descent. In general, x(k+1) = x(k) − α(k)∇f (x(k)), where α(k) is the step-size which can be changed at each iteration.
The advantages of this method are that it has often low computation and storage. As an oldest method, the rate of convergence is slower than many other methods. Another limitation of this method is that sometimes it is diﬃcult to calculate the gradient and to choose α(k). However, this method can be optimized using Newton-Raphson algorithm or FISTA .
Table of contents :
1.1 The human brain and its shape
1.1.1 Brain anatomy
1.1.2 Brain imaging techniques
1.1.3 Computational neuroanatomy
1.1.4 Publicly available brain templates
1.1.5 Some application of digital brain templates
1.2 Segmentation and registration
126.96.36.199 Non-rigid registration methods
188.8.131.52 Application of brain image registration
184.108.40.206 Non-rigid registration algorithms
220.127.116.11 Methods of segmentation
1.3 Template estimation
1.4 Algorithms used in this work
1.4.1 Gradient descent
1.4.2 Stochastic algorithms
1.5 Contributions of this work
1.5.1 Generative Statistical Model
1.5.2 Statistical Learning Procedure
1.5.3 Segmentation of new individuals
2 Probabilistic Atlas and Geometric Variability Estimation
2.2 The Observation Model
2.2.1 Statistical Model
2.2.2 Parameters and likelihood
2.2.3 Bayesian Model
2.3.1 Existence of the MAP estimation
2.3.2 Consistency of the estimator on our model
2.4 Estimation Algorithm using Stochastic Approximation Expectation-Maximization
2.4.1 Model factorization
2.4.2 Estimation Algorithm
Step 1: Simulation step
Step 2: Stochastic approximation step
Step 3: Maximization step
2.4.3 Convergence analysis
2.5 Experiments and Results
2.5.1 Simulated data
2.5.2 Real data
2.6 Conclusion and discussion
2.7 Proof of Theorem 2.1
2.8 Proof of Theorem 2.3
2.8.1 Proof of assumption (A1’)
2.8.2 Proof of assumption (A2)
2.8.3 Proof of assumption (A3’)
3 Bayesian Estimation of Probabilistic Atlas for Tissue Segmentation
3.3.1 Statistical Model
3.3.2 Estimation Algorithm
3.3.3 Segmentation of new individuals
3.4 Experiments and Results
3.5 Conclusion and Discussion
4 Bayesian Estimation of Probabilistic Atlas for Anatomically-Informed Functional MRI Group Analyses
4.2.1 Statistical Model
4.2.2 Estimation Algorithm
4.3 Experiments and Results
4.3.1 Simulated data
4.3.2 In-vivo data
5 Including Shared Peptides for Estimating Protein Abundances: A Significant Improvement for Quantitative Proteomics
6 Conclusion and Discussion
6.2 Large deformations for deformable template estimation
6.3 Multicomponent generalization of the models
6.4 Other remarks
6.4.1 Extension of the multi-modal atlas
6.4.2 Kernel choice
6.4.3 Algorithm implementation optimization
6.4.4 Bias field correction
A Definition of the most used similarity measures