## Generative Adversarial Networks (GANs)

The original GAN was proposed by Goodfellow et al. [Goodfellow, 2014] for natural image synthesis. Unlike CNN-based models, a GAN consists of two components: a generator G and a discriminator D. The generator G is trained to generate samples that are as realistic as possible, while the discriminator D is trained to maximize the probability of assigning the correct label both to training examples from the real dataset and to samples from G. This adversarial training strategy can make the synthesized images indistinguishable from real ones. To constrain the outputs of the generator G, the conditional GAN (cGAN) [Mirza, 2014] was proposed, in which the generator and the discriminator both receive a conditional variable.

More recently, many studies using GAN-based methods have further improved medical image prediction results, such as PET-to-MRI prediction for the quantification of cortical amyloid load [Choi, 2018] and CT-to-PET synthesis [Bi, 2017]. Several studies have also achieved state-of-the-art results with GANs on the synthesis of other modalities, for instance retinal images [Costa, 2018; Zhao, 2018], ultrasound images [Hu, 2017] and endoscopy images [Mahmood, 2018]. Unlike standard convolutional neural networks, which optimize a single loss function, both the generator and the discriminator in a GAN have cost functions that are defined in terms of both players’ parameters. Because each player’s cost depends on the other player’s parameters, but each player cannot control the other player’s parameters, this scenario is most straightforward to describe as a game rather than as an optimization problem. The generator and the discriminator are trained simultaneously until their losses converge to roughly constant values, indicating that the GAN has found a Nash equilibrium between the generator and discriminator networks.

### 3D Fully Convolutional Neural Networks

Our goal is to predict FLAIR pulse sequences by finding a non-linear function $s$ that maps multi-pulse-sequence source images $I_{source} = (I_{T1}, I_{T2}, I_{PD}, I_{T1SE}, I_{DIR})$ to the corresponding target pulse sequence $I_{target}$. Given a set of source images $I_{source}$ and the corresponding target pulse sequence $I_{target}$, our method finds the non-linear function by solving the following optimization problem:

$$\hat{s} = \arg\min_{s \in S} \frac{1}{N} \sum_{i=1}^{N} \left\| I_{target}^{i} - s(I_{source}^{i}) \right\|^{2} \qquad (2.1)$$

where S denotes the set of candidate mapping functions, N is the number of subjects, and the mean squared error (MSE) is used as the loss function, measuring the discrepancy between the predicted images and the ground truth.
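As a minimal sketch of the objective in Eq. (2.1), the following Python snippet evaluates the average squared error of two candidate mappings on toy data. The function names (`mse`, `dataset_objective`), the toy values, and the extra division by the voxel count are assumptions made for illustration; they are not taken from the authors' implementation.

```python
def mse(target, prediction):
    """Mean squared error between two equally sized flat voxel lists."""
    if len(target) != len(prediction):
        raise ValueError("target and prediction must have the same length")
    return sum((t - p) ** 2 for t, p in zip(target, prediction)) / len(target)


def dataset_objective(targets, sources, s):
    """Average over subjects of the per-subject MSE for a mapping s."""
    return sum(mse(t, s(x)) for t, x in zip(targets, sources)) / len(targets)


def double(x):
    """Hypothetical candidate mapping: multiply every voxel by 2."""
    return [2.0 * v for v in x]


def identity(x):
    """Hypothetical candidate mapping: return the input unchanged."""
    return list(x)


# Toy data: two subjects, three voxels each (flattened images).
sources = [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]]
targets = [[2.0, 4.0, 6.0], [0.0, 2.0, 4.0]]

loss_double = dataset_objective(targets, sources, double)
loss_identity = dataset_objective(targets, sources, identity)
print(loss_double, loss_identity)
```

On this toy data the doubling map reproduces the targets exactly, so it attains the smaller objective; selecting the mapping with the lowest value is what the arg min over S expresses.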

In order to learn the non-linear function, we propose the 3D fully convolutional neural network architecture shown in Fig. 2.2. The input layer is composed of the multi-pulse-sequence source images $I_{source}$, which are arranged as channels and fed together into the network. Our network architecture consists of three convolutional layers (L = 3) followed by rectified linear units (relu(x) = max(x, 0)). If we denote the m-th feature map at a given layer as $h_m$, whose filters are determined by the weights $k_m$ and bias $b_m$, then the feature map $h_m$ is obtained as follows:

$$h_m = \max(k_m * x + b_m, 0) \qquad (2.2)$$

where the size of the input x is H × W × D × M. Here, H, W and D denote the height, width and depth of each pulse sequence or feature map, and M is the number of pulse sequences or feature maps. To form a richer representation of the data, each layer is composed of multiple feature maps $\{h_m : m = 1, \dots, F\}$, also referred to as channels. Note that the kernel k has dimensions $H_k \times W_k \times D_k \times M \times F$, where $H_k$, $W_k$ and $D_k$ are the height, width and depth of the kernel, respectively. The kernel k operates on x with M channels, generating h with F channels. The parameters k and b of our model can be learned efficiently by minimizing the objective in Eq. (2.1) using stochastic gradient descent (SGD).
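To make the shapes concrete, here is a small dependency-free sketch of one layer: an input of size H × W × D × M, a kernel of size Hk × Wk × Dk × M × F, and a ReLU applied to each of the F output channels. The function name `conv3d_relu`, the "valid" border handling (no padding), and the toy values are assumptions for illustration. Like most deep learning libraries, the sketch does not flip the kernel.

```python
def conv3d_relu(x, k, b):
    """3D convolution without padding, followed by ReLU.

    x: nested list indexed [h][w][d][m]        (H x W x D x M)
    k: nested list indexed [i][j][n][m][f]     (Hk x Wk x Dk x M x F)
    b: list of F biases, one per output feature map
    Returns a nested list indexed [h][w][d][f].
    """
    H, W, D, M = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    Hk, Wk, Dk, F = len(k), len(k[0]), len(k[0][0]), len(k[0][0][0][0])
    out = []
    for h in range(H - Hk + 1):
        plane = []
        for w in range(W - Wk + 1):
            row = []
            for d in range(D - Dk + 1):
                feats = []
                for f in range(F):
                    acc = b[f]
                    for i in range(Hk):
                        for j in range(Wk):
                            for n in range(Dk):
                                for m in range(M):
                                    acc += k[i][j][n][m][f] * x[h + i][w + j][d + n][m]
                    feats.append(max(acc, 0.0))  # ReLU
                row.append(feats)
            plane.append(row)
        out.append(plane)
    return out


# Toy input: 3 x 3 x 3 volume with M = 2 channels, all ones.
x = [[[[1.0] * 2 for _ in range(3)] for _ in range(3)] for _ in range(3)]
# Toy kernel: 3 x 3 x 3 x 2 x 2; feature 0 sums the input, feature 1 negates it.
k = [[[[[1.0, -1.0] for _ in range(2)] for _ in range(3)]
      for _ in range(3)] for _ in range(3)]
b = [0.5, 0.5]

h = conv3d_relu(x, k, b)
print(h[0][0][0])
```

With no padding a 3 × 3 × 3 kernel reduces the 3 × 3 × 3 input to a single output position with F = 2 channels; the second channel is negative before the ReLU and is therefore clipped to zero.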

#### Pulse-sequence-specific Saliency Map (P3S Map)

Multiple MRI pulse sequences are used as inputs to predict FLAIR. Given a set of input pulse sequences and a target pulse sequence, we would like to assess the contribution of each pulse sequence to the prediction result. One method is the class saliency visualization proposed by Simonyan et al. [Simonyan, 2013], which is used in image classification to show which pixels most influence the class score. Such pixels can be used to locate the object in the image. We call the method presented here the pulse-sequence-specific saliency map (P3S map); it visually measures the impact of each pulse sequence on the prediction result. Our P3S map is the absolute partial derivative of the difference between the predicted image and the ground truth with respect to the input pulse sequence of subject i, calculated by standard backpropagation:

$$M^{i} = \left| \frac{\partial \left\| I_{target}^{i} - \hat{I}_{target}^{i} \right\|}{\partial I_{source}^{i}} \right| \qquad (2.3)$$

where i denotes the subject, and $I_{target}$ and $\hat{I}_{target}$ are the ground truth and the predicted image, respectively.
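The idea behind the P3S map can be sketched on a toy predictor: take the norm of the difference between ground truth and prediction, differentiate it with respect to each input, and keep the absolute value. The snippet below is only an illustration. It swaps backpropagation for central finite differences so that it needs no deep learning library, and the linear predictor `toy_predict`, its coefficients, and the input values are hypothetical.

```python
def error_norm(target, prediction):
    """Euclidean norm of the difference between ground truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction)) ** 0.5


def saliency_map(predict, source, target, eps=1e-6):
    """Absolute partial derivative of the error norm w.r.t. each input value,
    estimated by central finite differences."""
    saliency = []
    for idx in range(len(source)):
        plus = list(source)
        minus = list(source)
        plus[idx] += eps
        minus[idx] -= eps
        grad = (error_norm(target, predict(plus))
                - error_norm(target, predict(minus))) / (2 * eps)
        saliency.append(abs(grad))
    return saliency


def toy_predict(source):
    """Hypothetical one-voxel predictor that mixes two input sequences."""
    return [3.0 * source[0] + 0.5 * source[1]]


source = [1.0, 2.0]   # one voxel from two input pulse sequences
target = [10.0]       # ground-truth value for the same voxel

p3s = saliency_map(toy_predict, source, target)
print(p3s)
```

Here the first input has the larger saliency because the toy predictor weights it more heavily, which is the kind of per-sequence comparison the P3S map is meant to support.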

**Materials and Implementation Details**

Our dataset contains 24 subjects including 20 MS patients (8 women, mean age 35.1, sd 7.7) and 4 age- and gender-matched healthy volunteers (2 women, mean age 33, sd 5.6). Each subject underwent the following pulse sequences:

a) T1-w (1 × 1 × 1.1 mm³).

b) T2-w and Proton Density (PD) (0.9 × 0.9 × 3 mm³).

c) FLAIR (0.9 × 0.9 × 3 mm³).

d) T1 spin-echo (T1SE, 1 × 1 × 3 mm³).

e) Double Inversion Recovery (DIR, 1 × 1 × 1 mm³).

All subjects signed written informed consent to participate in a clinical imaging protocol approved by the local ethics committee. The preprocessing steps include intensity inhomogeneity correction [Tustison, 2010] and intra-subject affine registration [Greve, 2009] to FLAIR space. After preprocessing, each image has a size of 208 × 256 × 40 and a resolution of 0.9 × 0.9 × 3 mm³.

Our networks have three convolutional layers (L = 3). The filter size is 3 × 3 × 3 and every layer has 64 filters, a choice based on empirical knowledge from widely used FCN architectures, such as ResNet [He, 2016]. We used the Theano [Theano, 2016] and Keras [Chollet, 2015] libraries for both training and testing. All data are first normalized using $\bar{x} = (x - \text{mean}) / \text{std}$, where mean and std are calculated over all the voxels of all the images in each sequence. We do not use any data augmentation. Our networks were then trained using the standard SGD optimizer with a learning rate of 0.0005 and a batch size of 1. The stopping criterion used in our work is early stopping: we stopped the training when the generalization error increased in p successive q-length strips:

• $STOP_p$: stop after epoch t iff $STOP_{p-1}$ stops after epoch t − q and $E_{ge}(t) > E_{ge}(t - q)$.

• $STOP_1$: stop after the first end-of-strip epoch t with $E_{ge}(t) > E_{ge}(t - q)$.

where q = 5, p = 3 and $E_{ge}(t)$ is the generalization error at epoch t. Training takes 1.5 days, and predicting one image takes less than 2 seconds, on an NVIDIA GeForce GTX TITAN X.
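The stopping rule above can be read as "stop at the first end-of-strip epoch at which the generalization error has risen over each of the last p strips". The following sketch implements that reading; the function name `stopping_epoch` and the toy error curve are assumptions for illustration, not the authors' training code.

```python
def stopping_epoch(val_errors, strip_length=5, successive_strips=3):
    """Return the 1-based epoch at which training stops, or None.

    val_errors[t - 1] is the generalization error after epoch t.
    The error is compared only at end-of-strip epochs q, 2q, 3q, ...
    Training stops once it has increased over `successive_strips`
    consecutive strips.
    """
    q, p = strip_length, successive_strips
    increases = 0
    for t in range(q, len(val_errors) + 1, q):
        # The first strip has no earlier strip end to compare with.
        if t - q >= 1 and val_errors[t - 1] > val_errors[t - q - 1]:
            increases += 1
        else:
            increases = 0
        if increases >= p:
            return t
    return None


# Toy curve: the error falls for 10 epochs, then rises steadily.
falling = [1.0 - 0.05 * t for t in range(10)]
rising = [0.55 + 0.01 * t for t in range(15)]

stop_at = stopping_epoch(falling + rising, strip_length=5, successive_strips=3)
never = stopping_epoch(falling, strip_length=5, successive_strips=3)
print(stop_at, never)
```

With q = 5 and p = 3, the toy curve triggers a stop only after the error has increased at three consecutive strip ends, which tolerates short-lived fluctuations in the validation error.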

**Sketcher-Refiner Generative Adversarial Networks**

We propose Sketcher-Refiner Generative Adversarial Networks (GANs) with specifically designed adversarial loss functions to generate the [11C]PIB PET distribution volume ratio (DVR) parametric map, which can be used to quantify demyelination, using multimodal MRI as input. Our method is based on the adversarial learning strategy because of its outstanding performance in generating perceptually high-quality images. We introduce a sketch-refinement process in which the Sketcher generates the preliminary anatomical and physiological information and the Refiner refines it and generates images reflecting the tissue myelin content in the human brain. We describe the details in the following.

**3D Conditional GANs**

Generative adversarial networks (GANs) [Goodfellow, 2014] are generative models which consist of two components: a generator G and a discriminator D. Given a dataset y, the generator G, defined with parameters $\theta_g$, aims to learn the mapping from a random noise vector z to data space, denoted as $G(z; \theta_g)$. The discriminator $D(y; \theta_d)$, defined with parameters $\theta_d$, represents the probability that y comes from the dataset y rather than from $G(z; \theta_g)$. On the whole, the generator G is trained to generate samples which are as realistic as possible, while the discriminator D is trained to maximize the probability of assigning the correct label both to training examples from y and to samples from G. In order to constrain the outputs of the generator G, the conditional GAN (cGAN) [Mirza, 2014] was proposed, in which the generator and the discriminator both receive a conditional variable x. More precisely, D and G play a two-player conditional minimax game with the following cross-entropy loss function:

$$\min_G \max_D L(D, G) = \mathbb{E}_{x, y \sim p_{data}(x, y)}[\log D(x, y)] + \mathbb{E}_{x \sim p_{data}(x),\, z \sim p_z(z)}[\log(1 - D(x, G(x, z)))] \qquad (3.1)$$

where $p_{data}$ and $p_z$ are the distributions of the real data and the input noise, respectively. Both the generator G and the discriminator D are trained simultaneously, with G trying to generate an image as realistic as possible, and D trying to distinguish the generated image from real images.
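To illustrate how the two players pull the conditional GAN objective in opposite directions, the sketch below estimates the standard cGAN value function, E[log D(x, y)] + E[log(1 − D(x, G(x, z)))], from a handful of discriminator outputs. The function name `cgan_objective` and the probability values are hypothetical; no network is trained here.

```python
import math


def cgan_objective(d_real, d_fake):
    """Sample estimate of the conditional GAN value function L(D, G).

    d_real: discriminator outputs D(x, y) on real pairs, each in (0, 1)
    d_fake: discriminator outputs D(x, G(x, z)) on generated pairs
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from generated pairs well.
confident = cgan_objective(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
# A discriminator that cannot tell them apart and outputs 0.5 everywhere.
undecided = cgan_objective(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(confident, undecided)
```

The discriminator tries to push this value up (the confident case), while the generator tries to push it down by producing samples for which D outputs values close to those of real data (the undecided case, where the value equals −log 4).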

**Adversarial Loss with Adaptive Regularization**

Here, we propose specific adversarial losses that produce the desired behaviors for the Sketcher and the Refiner. Previous work by Isola et al. [Isola, 2016] has shown that it can be useful to combine the GAN objective function with a traditional constraint, such as an L1 or L2 loss. They further suggested using the L1 loss rather than the L2 loss to encourage less blurring. We hence combined the GAN loss function with the following L1 loss for the Sketcher:

$$L_{L1}(G_S) = \frac{1}{N} \sum_{i=1}^{N} \left| I_P^{i} - G_S(I_M^{i}, z^{i}) \right| \qquad (3.4)$$
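A minimal sketch of the L1 term for the Sketcher, assuming each image is flattened to a list of voxel values and the absolute differences are summed per subject before averaging over the N subjects. The function name `l1_loss` and the toy values are illustrative only; how this term is weighted against the adversarial loss is not shown here.

```python
def l1_loss(targets, predictions):
    """Average over subjects of the summed absolute voxel differences."""
    if len(targets) != len(predictions):
        raise ValueError("need one prediction per target")
    per_subject = [
        sum(abs(t - p) for t, p in zip(target, prediction))
        for target, prediction in zip(targets, predictions)
    ]
    return sum(per_subject) / len(per_subject)


# Toy data: two subjects, two voxels each.
true_images = [[1.0, 2.0], [3.0, 4.0]]
sketched_images = [[1.5, 2.0], [2.0, 5.0]]

loss = l1_loss(true_images, sketched_images)
print(loss)
```

Because the penalty grows linearly with the error, large deviations are punished less severely than under an L2 loss, which is the usual explanation for why L1 tends to produce less blurred outputs.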

In the CNS, myelin constitutes most of the white matter (WM). Since demyelinated voxels are mainly found within MS lesions, we want the Refiner network to pay more attention to MS lesions than to other regions during the prediction process. Most other methods [Roy, 2010; Burgos, 2014; Ye, 2013; Xiang, 2018] synthesize the whole image without any specific focus on regions of interest. Unlike these methods, to focus the Refiner generator on MS lesions, where demyelination happens, the whole image is divided into three regions of interest (ROIs): lesions, normal-appearing white matter (NAWM) and “other”. We thus defined for the Refiner a weighted L1 loss in which the weights are adapted to the number of voxels in each ROI, denoted $N_{Les}$, $N_{NAWM}$ and $N_{other}$. Given the masks of the three ROIs, $R_{Les}$, $R_{NAWM}$ and $R_{other}$, the weighted L1 loss for the Refiner is defined as follows:

$$L_{L1}(G_R) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{N_{Les}} \sum_{j \in R_{Les}} \left| I_P^{i,j} - \hat{I}_P^{i,j} \right| + \dots$$
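The effect of an ROI-weighted L1 loss can be illustrated with a small sketch for a single subject. The exact weights of the Refiner loss are not reproduced here: dividing each ROI's absolute error by that ROI's voxel count is an assumption made for this example, and the function name `roi_weighted_l1`, the mask layout, and the toy values are hypothetical.

```python
def roi_weighted_l1(target, prediction, masks):
    """Sum over ROIs of the mean absolute error inside each ROI.

    target, prediction: flat lists of voxel values for one subject
    masks: dict mapping an ROI name to a list of 0/1 flags, one per voxel
    ASSUMPTION: each ROI is weighted by 1 / (its voxel count), so a small
    ROI such as the lesions is not swamped by the much larger ones.
    """
    total = 0.0
    for mask in masks.values():
        n_voxels = sum(mask)
        if n_voxels == 0:
            continue  # an empty ROI contributes nothing
        roi_error = sum(m * abs(t - p)
                        for m, t, p in zip(mask, target, prediction))
        total += roi_error / n_voxels
    return total


# Toy subject with six voxels: 1 lesion voxel, 2 NAWM voxels, 3 other voxels.
masks = {
    "lesions": [1, 0, 0, 0, 0, 0],
    "nawm":    [0, 1, 1, 0, 0, 0],
    "other":   [0, 0, 0, 1, 1, 1],
}
target = [1.0] * 6

error_in_lesion = roi_weighted_l1(target, [0.0, 1.0, 1.0, 1.0, 1.0, 1.0], masks)
error_in_other = roi_weighted_l1(target, [1.0, 1.0, 1.0, 0.0, 1.0, 1.0], masks)
print(error_in_lesion, error_in_other)
```

Under this assumed weighting, the same one-voxel error costs three times more inside the single-voxel lesion than inside the three-voxel "other" region, which is the kind of emphasis on lesions the text describes.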

**Table of contents:**

**1 Introduction**

1.1 Context

1.1.1 Multiple Sclerosis

1.1.2 Multimodal Neuroimaging in Multiple Sclerosis

1.2 Deep Learning for Medical Image Prediction

1.2.1 Convolutional Neural Networks (CNNs)

1.2.2 Generative Adversarial Networks (GANs)

1.3 Thesis overview

**2 FLAIR MR Image synthesis from Multisequence MRI using 3D Fully Convolutional Networks for Multiple Sclerosis**

2.1 Introduction

2.2 Method

2.2.1 3D Fully Convolutional Neural Networks

2.2.2 Pulse-sequence-specific Saliency Map (P3S Map)

2.2.3 Materials and Implementation Details

2.3 Experiments and Results

2.3.1 Model Parameters and Performance Trade-offs

2.3.2 Evaluation of Predicted Images

2.3.3 Pulse-Sequence-Specific Saliency Map (P3S Map)

2.4 Discussion and Conclusion

**3 Predicting PET-derived Demyelination from Multisequence MRI using Sketcher-Refiner Adversarial Training for Multiple Sclerosis**

3.1 Introduction

3.1.1 Related Work

3.1.2 Contributions

3.2 Method

3.2.1 Sketcher-Refiner Generative Adversarial Networks

3.2.2 Adversarial Loss with Adaptive Regularization

3.2.3 Visual Attention Saliency Map

3.2.4 Network architectures

3.3 Experiments and Evaluations

3.3.1 Overview

3.3.2 Comparisons with state-of-the-art methods

3.3.3 Refinement Iteration Effect

3.3.4 Global Evaluation of Myelin Prediction

3.3.5 Voxel-wise Evaluation of Myelin Prediction

3.3.6 Attention in Neural Networks

3.3.7 Contribution of Multimodal MRI Images

3.4 Discussion

3.5 Conclusion

**4 Predicting PET-derived Myelin Content from Multisequence MRI for Individual Longitudinal Analysis in Multiple Sclerosis**

4.1 Introduction

4.1.1 Related work

4.1.2 Contributions

4.2 Method

4.2.1 Overview

4.2.2 Conditional Flexible Self-Attention GAN (CF-SAGAN)

4.2.3 Adaptive Attention Regularization for MS Lesions

4.2.4 Clinical Longitudinal Dataset

4.2.5 Indices of Myelin Content Change

4.2.6 Network Architectures

4.3 Experiments and Evaluation

4.3.1 Implementation and Training Details

4.3.2 Evaluation of Global Image Quality

4.3.3 Evaluation of Adaptive Attention Regularization

4.3.4 Evaluation of Static Demyelination Prediction

4.3.5 Evaluation of Dynamic Demyelination and Remyelination Prediction

4.3.6 Clinical Correlation

4.4 Discussion

4.5 Conclusion

**5 Conclusion and Perspectives**

5.1 Main Contributions

5.1.1 Predicting FLAIR MR Image from Multisequence MRI

5.1.2 Predicting PET-derived Demyelination from Multisequence MRI

5.1.3 Predicting PET-derived Dynamic Myelin Changes from Multisequence MRI

5.2 Publications

5.3 Perspectives

5.3.1 Deep Learning for Medical Imaging Synthesis

5.3.2 Synthesized Data for Deep Learning

5.3.3 Interpretable Deep Learning for Clinical Usage

**Bibliography**