Deep learning with unbalanced data and misclassified regions


Survey on MRI image segmentation

Most research on pattern recognition, computer vision and machine learning from the 1970s to the 1990s was based on low-level image operations and mathematical modeling [Litjens et al., 2017]. At that time, researchers generally used a few example data to build a base of knowledge and rules, i.e., a set of if-then-else statements; such systems later became known as expert systems, for example MYCIN [Shortliffe et al., 1975; Gordon and Shortliffe, 1984]. After the 1990s, research moved to systems based on supervised learning algorithms, feature extraction (i.e., vectors of binary or real values) and statistical models [Litjens et al., 2017]. In this era, researchers used feature engineering techniques (i.e., hand-designed features) to extract discriminant features from training MRI images, in order to represent and classify each image (for example, as malignant or benign) or to detect each object or region inside the image. Thus, discriminative models such as SVM [Cortes and Vapnik, 1995], combined with feature engineering, became popular, especially in medical image analysis. A typical pipeline uses a large number of training data to extract several features, then applies a classifier or a model at the end of the pipeline to distinguish between the different classes or groups. The issue with feature engineering is that it requires deep domain knowledge of the given task, so each task needs people with specialized knowledge. Besides, in many cases gathering data is expensive, as with medical data: annotating medical data requires collaboration between people from different backgrounds, such as radiologists and oncologists. All these reasons pushed academic research and industry to explore more practical and less expensive methods, which is where the era of machine learning and deep learning comes in.
Thus, research has transitioned from feature engineering to designing networks with many layers that extract features. From this perspective, machine learning and deep learning methods can be seen as an automation of feature engineering, intended to advance research and to reduce the computational cost. The following sections give an overview of classical and modern approaches. The first section presents five classical approaches and discusses their differences and challenges. The second section discusses modern approaches, in particular deep learning-based methods. The last section gives a summary and presents the most challenging issues of deep learning methods for GBM brain tumor segmentation.

Classical approaches

Here, we present the most popular classical MRI image segmentation approaches:
— Threshold-based methods [Gibbs et al., 1996; Stadlbauer et al., 2004]: these methods use one type of MRI image, usually an MRI image with more contrast (e.g., T1 or post-contrast T1 weighted). They rely on a high-intensity signal as a threshold value in MRI images to extract relevant tissue, which is then classified as tumor or healthy tissue. These methods are iterative: over many iterations, they apply a global, local or adaptive threshold to distinguish between the different tissues. These methods are simple and computationally efficient, but they have many drawbacks and generally fail in real applications. Among these drawbacks: they are sensitive to noise, they need user interaction, and they apply mostly to binary segmentation problems, so they do not scale to more complicated multi-class problems.
— Region-based methods [Kaus et al., 2001; Letteboer et al., 2004; Cates et al., 2005]: these methods are based on subdividing or composing an image into homogeneous regions, where each region is a set of connected pixels. They operate at the pixel level and apply two kinds of metrics: first, metrics based on homogeneity criteria, to connect all candidate pixels; second, metrics based on discontinuity criteria, to find the boundaries between different regions. These methods repeat a composition or split-and-merge procedure until a uniform region (e.g., the tumor region) is obtained.
— Edge-based methods [Caselles et al., 1993; Lefohn et al., 2003; Cates et al., 2004]: edge detection is a very important step in image processing and computer vision. In an MRI image, edge detection helps to extract the useful information and reduce the amount of data, which in turn aids subsequent image analysis techniques. Edges correspond to object or region boundaries. At the pixel level, an edge is where there is a significant change between two or more neighboring pixel values. Edge detection techniques can be grouped into two categories: gradient-based methods and Laplacian-based methods. The first category relies on the maxima and minima of the first derivative, as in the Prewitt, Sobel and Roberts operators, while the second category relies on zero-crossings of the second derivative.
— Atlas-based methods [Moon et al., 2002; Prastawa et al., 2003; Menze et al., 2010]: these methods deal with more global information extracted from an image; they attempt to segment an MRI image when there is no well-known relation between regions and pixels’ intensities. Atlas-based segmentation usually involves many stages, the most important of which are image registration and Atlas construction. An Atlas is a template or a model, and a different template is constructed for each application. Some simple applications use a single template, while others need multiple templates: an Atlas is constructed for each population of images (e.g., an Atlas for a healthy population and an Atlas for a diseased population).
— Classification and Clustering methods [Ozkan et al., 1993; Clark et al., 1998; Bhandarkar and Nammalwar, 2001b; Fletcher-Heath et al., 2001; Geremia et al., 2011]: these methods are a subset of machine learning methods. Classification is supervised learning, while clustering is unsupervised learning. Classification methods use a training dataset (images and labels) to minimize or maximize an objective function (loss function). For example, the SVM algorithm with a radial basis function kernel uses an objective function that maximizes the margin between two different classes; the SVM then assigns each pixel to one of a predefined number of classes. Clustering methods are iterative algorithms that subdivide training images into several disjoint clusters. For example, the K-means algorithm partitions an image into a number of clusters, each represented by a centroid at its center. K-means labels each pixel by assigning it to one of the predefined centroids, i.e., the number of centroids is the number of classes.
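As an illustration only, three of the ideas above (global thresholding, gradient-based edge detection with the Sobel operator, and K-means clustering of pixel intensities) can be sketched on a toy 2D array. The function names, the toy image and the centroid initialization are assumptions of this sketch and are not taken from the cited methods.

```python
import numpy as np

def global_threshold(image, threshold):
    """Binary mask: True where the intensity exceeds the threshold."""
    return image > threshold

def sobel_gradient_magnitude(image):
    """Gradient magnitude from the 3x3 Sobel kernels (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = image.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Cross-correlate the image with each kernel, one kernel entry at a time.
    for i in range(3):
        for j in range(3):
            window = image[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)

def kmeans_intensity(image, k, n_iter=20):
    """Cluster pixel intensities with K-means; returns (labels, centroids)."""
    pixels = image.reshape(-1).astype(float)
    # Assumption: centroids start evenly spaced between min and max intensity.
    centroids = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(n_iter):
        # Assign each pixel to its nearest centroid, then update the centroids.
        labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = pixels[labels == c].mean()
    return labels.reshape(image.shape), centroids

# Toy image: dark background with a bright 4x4 square standing in for a lesion.
toy = np.zeros((8, 8))
toy[2:6, 2:6] = 1.0
mask = global_threshold(toy, 0.5)
edges = sobel_gradient_magnitude(toy)
labels, centroids = kmeans_intensity(toy, k=2)
```

On this noise-free toy image the thresholded mask and the two-cluster K-means labelling select the same square, while the gradient magnitude is non-zero only around its boundary. Real MRI data is noisy, which is where the drawbacks listed above appear.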

Modern approaches

In 2006, a new type of learning algorithm appeared, called Deep Neural Networks (DNNs) [Hinton and Salakhutdinov, 2006; Hinton et al., 2006; Bengio et al., 2007]. DNNs use a large amount of data to extract many lower-level features, such as lines and edges with different orientations, then combine them hierarchically to obtain higher-level features such as shapes, objects and faces. In this era, two DNN algorithms became increasingly popular: Stacked Auto-Encoders and Deep Belief Networks [Hinton et al., 2006; Lee et al., 2009]. These algorithms solved the issue of training large and deep DNN architectures, but the training is relatively slow [Zeiler et al., 2011]. In 2014, research started to use different variants of DNN architectures in medical image analysis, in particular for tumor and lesion segmentation.
[Zikic et al., 2014] proposed a CNN model for the segmentation of GBM brain tumors using MRI images. The input of their algorithm is a (4*19*19) 2D patch (i.e., four channels: T1, T2, T1c, and FLAIR). Their CNN is used to segment the MRI images into five classes: non-tumor, necrosis, edema, non-enhancing tumor and enhancing tumor. Their CNN model is a sequential network with 5 layers.
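To make the patch-based input concrete, the following minimal sketch extracts a (4*19*19) patch centred on a pixel of a four-channel slice. The function name, the 240*240 slice size, the random data and the zero-padding at the borders are assumptions of this illustration, not details reported by [Zikic et al., 2014].

```python
import numpy as np

def extract_patch(volume_slice, row, col, size=19):
    """Return a (channels, size, size) patch centred on (row, col).

    volume_slice has shape (channels, height, width), one 2D slice per MRI
    sequence. The slice is zero-padded so that border pixels also get a full
    patch (the padding strategy is an assumption of this sketch).
    """
    half = size // 2
    padded = np.pad(volume_slice, ((0, 0), (half, half), (half, half)), mode="constant")
    # After padding, original pixel (row, col) sits at (row + half, col + half),
    # so the window starting at (row, col) is centred on it.
    return padded[:, row:row + size, col:col + size]

rng = np.random.default_rng(0)
slice_4ch = rng.random((4, 240, 240))          # stand-in for T1, T2, T1c, FLAIR
patch = extract_patch(slice_4ch, row=120, col=100)
corner_patch = extract_patch(slice_4ch, row=0, col=0)
```

The network then predicts the class of the central pixel of each patch, so segmenting a slice amounts to classifying one patch per pixel.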
Another CNN-based work was introduced by [Urban et al., 2014] for GBM brain tumor segmentation, with a different approach. They used a sequential 3D-CNN model, in which the input data are 3D patches of size (9*9*9) voxels with four channels (i.e., T1, T1c, T2, FLAIR). They did not use pooling layers, which several CNN models use to reduce the size of feature maps and therefore the computational cost. They also added a post-processing step that removes all regions of less than 3000 voxels. In addition, this method takes one minute to segment the whole brain using a GPU implementation.
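The post-processing step, removing connected regions below a size threshold, can be sketched as follows. This is a plain breadth-first search written for illustration; the 6-connectivity, the function name and the toy volume are assumptions, and the original implementation is not described at this level of detail.

```python
from collections import deque
import numpy as np

def remove_small_components(mask, min_size):
    """Drop 6-connected foreground components with fewer than min_size voxels."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros_like(mask)
    cleaned = np.zeros_like(mask)
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for seed in zip(*np.nonzero(mask)):
        if visited[seed]:
            continue
        # Breadth-first search collects one connected component.
        component = [seed]
        visited[seed] = True
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in neighbours:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= n[i] < mask.shape[i] for i in range(3))
                if inside and mask[n] and not visited[n]:
                    visited[n] = True
                    component.append(n)
                    queue.append(n)
        # Keep the component only if it is large enough.
        if len(component) >= min_size:
            for voxel in component:
                cleaned[voxel] = True
    return cleaned

toy = np.zeros((10, 10, 10), dtype=bool)
toy[1:5, 1:5, 1:5] = True   # 64-voxel component, kept
toy[8, 8, 8] = True         # isolated voxel, removed
cleaned = remove_small_components(toy, min_size=10)
```

With `min_size=3000` on a full-resolution segmentation, the same function would reproduce the threshold quoted above; the toy example uses a smaller value so that it runs instantly.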
[Axel et al., 2014] proposed a CNN architecture for the segmentation of GBM brain images. The input data of their network are 2D patches of size (32*32) pixels. The network is divided into two parallel pathways: the first pathway is a CNN architecture (i.e., two Maxout convolutional layers, where Maxout is the result of merging two or more feature maps), and the second pathway is a fully connected Maxout layer. These two pathways are then concatenated at the end into a softmax layer. This architecture takes 20 minutes to segment the whole brain using a GPU implementation.
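The merging performed by a Maxout unit is an element-wise maximum over a group of feature maps. A minimal sketch, assuming the feature maps are grouped consecutively along the channel axis (the function name and the grouping layout are choices of this illustration):

```python
import numpy as np

def maxout(feature_maps, group_size):
    """Element-wise maximum over consecutive groups of feature maps.

    feature_maps has shape (channels, height, width), with channels divisible
    by group_size. The result has shape (channels // group_size, height, width).
    """
    channels, height, width = feature_maps.shape
    if channels % group_size != 0:
        raise ValueError("channels must be divisible by group_size")
    grouped = feature_maps.reshape(channels // group_size, group_size, height, width)
    return grouped.max(axis=1)

maps = np.array([
    [[1.0, -2.0], [0.5, 3.0]],
    [[0.0, 4.0], [2.5, -1.0]],
])
merged = maxout(maps, group_size=2)   # two maps merged into one
```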
[Pereira et al., 2015] developed a CNN-based method for GBM brain tumor segmentation. They used two CNN architectures, one for each tumor grade (i.e., High-Grade HGG and Low-Grade LGG). Their method takes as input 2D patches of size (33*33) pixels with four channels (i.e., four MRI sequences: T1, T2, T1c, and FLAIR). The last step of their algorithm is a post-processing method: they applied a morphological filter to delete isolated regions. In addition, this model takes 10 minutes to segment the complete brain using a GPU implementation.
[Havaei et al., 2017] proposed a fully automatic brain tumor segmentation method based on Cascaded Convolutional Neural Networks, an extended version of [Axel et al., 2014]. These cascaded networks use two pathways that are trained in different phases to capture local and global features. They also take as input 2D Axial patches with four MR sequences as channels (i.e., T1, T2, T1c, and FLAIR), where each pathway has a different input patch size. To address class imbalance, they proposed two stages of training, where the second stage corrects the patches that are biased toward the wrong class. They then applied a thresholding technique as a post-processing method to remove the connected components near the skull. In addition, this cascaded architecture takes 180 seconds to segment the complete brain using a GPU implementation.
[Chang et al., 2016] developed a CNN model based on two concepts: (1) a fully convolutional architecture that predicts a dense output matrix of the same size as the original input [Long et al., 2015]; (2) hyperlocal feature concatenation, in which the input MR images are re-introduced in the concatenation layer before the output. This technique was first used by [Yang and Ramanan, 2015] in their Directed-Acyclic-Graph architecture, which is a variant of standard CNNs [LeCun et al., 1998]. The architecture of [Chang et al., 2016] has 7 convolution layers in addition to upsampling and concatenation layers. It takes 4 channels of MRI images as input (i.e., FLAIR, T2, T1c and T1). This architecture takes 0.93 seconds to segment the entire brain using a GPU implementation.
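The concatenation step can be illustrated with a short sketch: the raw input channels are stacked onto the feature maps along the channel axis, so the layers that follow see both learned features and the original intensities. The number of feature maps (16), the 240*240 spatial size and the function name are hypothetical values chosen for this illustration.

```python
import numpy as np

def concat_hyperlocal(feature_maps, input_image):
    """Stack the raw input channels onto the feature maps along the channel axis."""
    if feature_maps.shape[1:] != input_image.shape[1:]:
        raise ValueError("spatial sizes must match before concatenation")
    return np.concatenate([feature_maps, input_image], axis=0)

features = np.zeros((16, 240, 240))   # hypothetical number of feature maps
mri_input = np.ones((4, 240, 240))    # stand-in for FLAIR, T2, T1c, T1
combined = concat_hyperlocal(features, mri_input)
```

This only works when the feature maps have the same spatial size as the input, which is why the upsampling layers precede the concatenation in a fully convolutional design.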
[Ellwaa et al., 2016] proposed an iterative method based on a random forest with 100 trees, each of depth 45. Their method extracts 328 features from MRI images: gradient features, appearance features, and context-aware features. The input to their method is 4-channel MRI images (i.e., FLAIR, T1, T1c and T2). In each iteration, the method chooses the MRI images of 5 patients and adds them to the training set, then continues training the random forest. The iterations stop when the training set reaches 50 patients.
[Kamnitsas et al., 2016, 2017] developed a 3D-CNN model for GBM brain tumor segmentation, based on the performance of the model of [Urban et al., 2014]. This 3D-CNN is composed of a dual pathway with 11 layers, and its input is 3D MRI images (i.e., FLAIR, T1, T1c and T2). Each pathway has a different input patch size (i.e., 4 * 25^3 and 4 * 19^3). They then added a conditional random field as a post-processing operation, both to remove misclassified regions and as a spatial regularization. They also extended their CNN with residual connections [He et al., 2016], but this extended network [Kamnitsas et al., 2016] did not obtain a big improvement over the original model [Kamnitsas et al., 2017]. Their 3D-CNN model takes 24 hours to train using a GPU implementation, and at test time it takes 35 seconds to segment the entire brain.
[Zhao et al., 2018] developed a GBM brain tumor segmentation method that integrates CNNs and a conditional random field in one network, as opposed to [Kamnitsas et al., 2017], who used the conditional random field as a post-processing step. The authors of [Zhao et al., 2018] developed 3 CNNs that take as input 3 types of MRI images (i.e., FLAIR, T1c and T2). Each of these 3 networks uses two pathways similar to [Havaei et al., 2017; Kamnitsas et al., 2017], where these pathways are trained on 2D image patches (i.e., 33 * 33 and 65 * 65) and slices (i.e., 240 * 240) from the Axial, Coronal and Sagittal views. At the testing step, the prediction results from the 3 views are fused using a voting strategy. These 3 networks took 12 days to train using a GPU implementation. At test time, each model took on average 3 minutes per view (i.e., Axial, Sagittal or Coronal) to segment the entire brain, i.e., 3 networks * 3 minutes equals 9 minutes, in addition to the fusion time, which is not reported in the original paper.
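A voting fusion over the three views can be sketched as a per-pixel majority vote. The exact voting rule of [Zhao et al., 2018] is not detailed here, so the majority rule, the tie-breaking toward the lowest class index, and the toy label maps below are assumptions of this illustration.

```python
import numpy as np

def majority_vote(predictions, n_classes):
    """Fuse per-voxel label maps by majority vote.

    predictions is an integer array of shape (n_views, ...) with labels in
    [0, n_classes). Ties go to the lowest class index (an assumption).
    """
    predictions = np.asarray(predictions)
    # votes[c] counts, for every voxel, how many views predicted class c.
    votes = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)

axial = np.array([[0, 1], [2, 2]])
coronal = np.array([[0, 1], [1, 0]])
sagittal = np.array([[1, 1], [2, 1]])
fused = majority_vote([axial, coronal, sagittal], n_classes=3)
```

In the toy example, the bottom-right pixel receives three different labels, so the tie-breaking rule decides its class; in practice such ties would need a documented policy.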
[Mlynarski et al., 2019] developed a fully automatic GBM brain tumor segmentation method based on a combination of six 3D-CNN models. Each of these models is composed of three (or five) CNN architectures, which are trained independently and each dedicated to one MRI view (e.g., Axial, Coronal or Sagittal slices of the input image), and one architecture based on 3D-CNNs. The proposed 3D-CNN segmentation model is trained on the channel-wise concatenation of the feature maps extracted by the Axial, Coronal and Sagittal dedicated architectures and two (T2, T1c) or four (T2, FLAIR, T1, T1c) 3D multisequence MR images. The technique of using feature maps as an additional input to another CNN is also used by [Havaei et al., 2017]. Moreover, [Mlynarski et al., 2019] addressed two issues. The first is long-range context: to solve it, they used 2D CNNs (whose input is one of the three views) to capture rich information by increasing the size of the receptive field. The second is unbalanced data, which they addressed by using weighted cross-entropy as the loss function.
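A weighted cross-entropy loss multiplies each sample's log-loss by a weight attached to its true class, so that rare classes contribute more. The sketch below is a generic formulation; the inverse-frequency weighting, the function names and the toy values are assumptions, not the specific weights used by [Mlynarski et al., 2019].

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Mean of -w[y] * log(p[y]) over samples.

    probs: (n_samples, n_classes) predicted probabilities (rows sum to 1).
    labels: (n_samples,) integer class indices.
    class_weights: (n_classes,) per-class weights.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(class_weights, dtype=float)[labels]
    picked = probs[np.arange(len(labels)), labels]
    eps = 1e-12  # guards against log(0)
    return float(np.mean(-weights * np.log(picked + eps)))

def inverse_frequency_weights(labels, n_classes):
    """Weights proportional to 1 / class frequency (one common heuristic)."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    total = counts.sum()
    counts[counts == 0] = 1.0  # avoid division by zero for absent classes
    return total / (n_classes * counts)

labels = np.array([0, 0, 0, 1])   # class 1 (e.g., tumor) is the rare class
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]])
weights = inverse_frequency_weights(labels, n_classes=2)
loss_weighted = weighted_cross_entropy(probs, labels, weights)
loss_plain = weighted_cross_entropy(probs, labels, np.ones(2))
```

With these toy values the rare class receives a weight three times larger than the frequent class, so errors on it dominate the loss.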


Table of contents :

1 Introduction
1.1 Thesis Context
1.2 Thesis Motivation and Objectives
1.3 Thesis Contributions
1.4 Thesis Overview
2 Brain Tumor Segmentation
2.1 Introduction
2.2 Survey on MRI image segmentation
2.2.1 Classical approaches
2.2.2 Modern approaches
2.2.3 Discussion
2.3 Glioblastoma brain tumors
2.4 BRATS datasets
2.5 MRI image quality limitations
2.6 Evaluation metrics
2.7 Discussion and Conclusion
3 Background Theory: Convolutional Neural Networks
3.1 Introduction
3.2 Convolutional Neural Networks
3.2.1 CNNs operations
3.2.2 Forward Propagation
3.2.3 Backward propagation
3.2.4 Regularization Dropout
3.3 Discussion and Conclusion
4 Brain Tumor Segmentation with Deep Neural Networks
4.1 Introduction
4.2 End-to-End incremental Deep learning
4.2.1 Problem statement
4.2.2 Incremental XCNet Algorithm
4.2.3 ELOBA Algorithm
4.2.4 Experiments and Results
4.2.5 Discussion and conclusion
4.3 Deep Learning-based selective attention
4.3.1 Problem statement
4.3.2 Visual areas-based interconnected modules
4.3.3 Overlapping Patches
4.3.4 Class-Weighting technique
4.3.5 Experiments and Results
4.3.6 Discussion and Conclusion
5 Deep learning with unbalanced data and misclassified regions
5.1 Introduction
5.2 Deep learning with Online class-weighting
5.2.1 Problem statement
5.2.2 Online class-weighting approach
5.2.3 Experiments and Results
5.2.4 Discussion and Conclusion
5.3 Boosting performance using deep transfer learning approach
5.3.1 Problem statement
5.3.2 Methods
5.3.3 Experiments and Results
5.3.4 Discussion and Conclusion
6 Conclusion and perspectives
6.1 Summary of contributions
6.2 Perspectives
6.3 List of publications