Remote sensing from space
Satellites have been capturing images of the Earth since NASA’s TIROS (Television Infrared Observation Satellite) was used to monitor weather patterns in 1960 [Tatem et al., 2008]. Today, hundreds of Earth Observation satellites orbit our planet, gathering a variety of information regarding many aspects of the Earth’s surface and atmosphere.
The most common Earth Observation satellites bear passive sensors, which measure the light reflected or emitted by the surface. Among these are optical sensors, which are very often designed for the visualization of images, by capturing either the overall luminance (gray-scale image), or the Red, Green and Blue wavelengths of the electromagnetic (EM) spectrum for color images. These sensors are in fact able to measure many different wavelengths, ranging from the near ultra-violet to the infrared portions of the spectrum. Optical sensors measure the sunlight reflected by the surface; their images can therefore only be acquired during the day, and are often obstructed by clouds. Famous optical Earth Observation satellites include Landsats 1-8 [Markham and Helder, 2012], Sentinels 2 and 3 [Drusch et al., 2012], shown in Figure 1.2, and the Système Probatoire d’Observation de la Terre family, more commonly known as SPOT (1-7) [Chevrel et al., 1981]. Chapter 2 explains how these sensors are able to capture images at a global scale.
Other passive sensors measure the light emitted by the Earth in thermal infrared wavelengths (9-14 µm), rather than the sunlight reflected by the surface. This is useful for estimating surface temperatures, which are linked to the amount of radiation in this portion of the spectrum. An example of such a sensor is the ASTER imager [Abrams, 2000].
Another technology involves actively scanning the surface of the Earth using a Radio Detection And Ranging system, more commonly known as a Radar. These satellites emit signals in the radio wavelengths of the EM spectrum, which pass through clouds, but scatter off of hard surfaces and return to the sensor a brief moment later. By measuring the time difference between emission and reception, and the polarization of the received wave, information regarding the nature of the surface can be derived, such as roughness, topography, and the presence of tall, vertical structures. Images from sensors like ASAR aboard ENVISAT [Arnaud et al., 2003] have been used in tandem with optical images for characterizing the presence of irrigation in fields [Hadria et al., 2009].
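The ranging principle described above reduces to a simple time-of-flight relation, which can be sketched numerically. This is a toy illustration only; the 700 km altitude is an arbitrary example, not a property of any specific sensor:

```python
# Toy illustration of radar ranging: the round-trip delay between the
# emitted pulse and its echo gives the sensor-to-surface distance.
C = 299_792_458.0  # speed of light, m/s

def range_from_delay(delay_s: float) -> float:
    """One-way distance, given the round-trip echo delay in seconds."""
    return C * delay_s / 2.0

# A hypothetical sensor at 700 km altitude receives its echo ~4.7 ms later.
delay = 2 * 700_000.0 / C
print(f"{range_from_delay(delay) / 1000:.0f} km")  # → 700 km
```

Real SAR processing is of course far more involved, combining many such echoes coherently to form an image.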
Optical images have four main characteristics: spatial resolution, spectral resolution, temporal resolution, and covered area.
• Spatial resolution defines the physical distance between two points that are distinguishable in the image, not to be confused with the ground sampling distance, which defines the area covered by one pixel in the image. If the image is visualized at full resolution, these two have the same value, but images are often sampled with larger pixel sizes to avoid aliasing effects, in particular when applying transformations such as rotations. High Spatial Resolution (HSR) images, like those of Landsat-8 and Sentinel-2, have a spatial resolution of 10-100m, and are not to be confused with Very High Spatial Resolution (VHSR) images, which have spatial resolutions ranging from 0.1m to 10m, like the SPOT-7 images. In general, the different wavelengths of an optical image are captured at different spatial resolutions, due to the technical constraints linked to the use of infrared sensors.
• Spectral resolution or spectral range specifies the wavelengths of light which are acquired by the imaging sensor. Each band is defined by one central wavelength and a spectral width, which represents how fine the band is. In order to distinguish the previously mentioned agricultural, natural, and artificial classes, many land cover maps are based on images that have been captured in several spectral wavelengths. These contain rich information beyond the visible light that our eyes perceive. For example, vegetation reflects strongly in the infrared, unlike artificial surfaces, shadows, and water. Figure 1.3 shows SPOT-7 imagery over an area in Brittany to illustrate the strong contrasts between the different land cover types that can be seen in multi-spectral images. By mapping the Near-Infrared, Red, and Green bands onto the RGB channels of the image, vegetation appears in shades of red. Various materials also exhibit unique spectral patterns, which can provide extremely valuable information for characterizing them. Sensors like Landsat, SPOT, and Sentinel-2 capture multi-spectral images, both in the visible and infrared (450-2200nm). Hyper-spectral sensors like AVIRIS and Hymap capture images with a higher spectral resolution, sampling several hundreds of wavelengths [Teke et al., 2013].
• The revisit time or temporal resolution defines the rate at which the images are acquired through time. Imaging satellites orbit the Earth many times per day, and by precisely adjusting their trajectory, Earth Observation constellations like Sentinel-2A&B are able to capture an image of any given place every 5 days. This allows us to follow the temporal evolution of the land over a yearly cycle. We can see crops being planted and harvested, trees losing their leaves in the autumn, and snow falling on the mountainous areas. Such seasonal behavior is characteristic of certain land cover classes, and allows us to identify them by using observations throughout the year. An example of Sentinel-2 images over an area near Toulouse is provided in Figure 1.4, which illustrates the seasonal cycle of winter and summer crops.
• The covered area describes the location and footprint of the surface covered by the image or time series. Satellite optical images used for the production of land cover maps are often mosaics of images acquired at different dates, blended together to create a coherent image covering a large area. For instance, the sensor aboard the Sentinel-2 satellites has a swath width of 290km (the width of the area covered at each passage of the satellite), yet it produces time series covering the emerged surfaces of all latitudes between 56°S and 84°N, with a 10 day revisit time, or a 5 day revisit time with both Sentinel-2A and 2B. Various resampling techniques are used to create homogeneous images or time series. This process, known as temporal resampling, is described in detail in Part II, Section 3.2. The differently colored area near the top left corner of Figure 1.4c is linked to the stitching of images acquired at different dates.
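The infrared contrast of vegetation mentioned above is commonly summarized by derived indices such as the Normalized Difference Vegetation Index (NDVI). NDVI is a standard index but is not named in this section, and the reflectance values below are hypothetical; this is only a sketch of the idea:

```python
import numpy as np

# NDVI (Normalized Difference Vegetation Index) exploits the contrast
# between the strong near-infrared reflectance of vegetation and its
# low red reflectance: values near +1 indicate dense vegetation, values
# near 0 or below indicate bare soil, built surfaces, or water.
def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + 1e-12)  # epsilon avoids division by zero

# Hypothetical reflectances for three pixels:
# dense vegetation, bare soil, and water, respectively.
nir = np.array([0.45, 0.30, 0.02])
red = np.array([0.05, 0.25, 0.03])
print(ndvi(nir, red))  # roughly 0.80, 0.09, -0.20
```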
The requirements on these four characteristics depend on the target land cover classes. For example, a 10-100m spatial resolution is considered sufficient for most agricultural and natural classes. However, urban classes certainly benefit from higher spatial resolution, as the smallest objects that make them up range from 1-10m (narrow streets, canals, individual trees). On the other hand, agricultural classes require a frequent revisit time, as the different stages of the phenological cycle are all necessary to discriminate different crop types, as is shown in Figure 1.4.
Optical images are the main focus of the research work conducted during this Ph.D., as they are well known to provide valuable information for land cover mapping [Inglada et al., 2017, Gómez et al., 2016]. Moreover, the high quality, open availability, and mission lifetime of Sentinel-2 encourage the use of optical information. However, it is important to mention that none of the methods designed and proposed during this Ph.D. are in any way limited to optical imagery.
A classification task
The task of assigning categorical labels to each pixel in an image is called classification, image classification or dense classification in the Remote Sensing community. Interestingly, in the Computer Vision community, this same task is called semantic segmentation, as it consists in splitting (segmenting) the image into areas with semantic labels. There, the term classification or image classification is used when one label is assigned to an entire image to describe its content, for example saying "this is a picture of a house". Dense classification is quite different, as it involves determining precisely which pixels belong to the house and which belong to the sky, road, and background. To avoid confusion, the Remote Sensing terminology is used in the rest of the manuscript.
Creating a land cover map based on satellite imagery involves assigning a class label to each pixel in the image. This is done according to the image features, which depend on the type of imagery used. For example, they can be the multi-spectral channels of an optical image, the backscatter coefficients of a Synthetic Aperture Radar (SAR) image, a time series describing the temporal behavior of the pixel, altitude or height information coming from topographical measurements, or a combination of several of these. If multiple heterogeneous sources of information are used, the term multi-modal or multi-source data is employed.
Many land cover maps are made by human photo-interpretation, in other words, by an expert visualizing the images, and distributing the different classes across the area by hand. For instance, the Corine Land Cover (CLC) map [Bossard et al., 2000], illustrated in Figure 1.5, is produced in this way every 6 years. This map covers the entire European continent, and contains a very detailed nomenclature, particularly in the vegetation classes.
(a) Sentinel-2 image of January 2016. At the beginning of the year, crops and natural grasslands can easily be confused, as most agricultural land contains small amounts of vegetation during this period.
(b) Sentinel-2 image of May 2016. In the spring, winter crops reach the height of their growth phase, while the land meant for summer crops is ploughed to bare soil.
(c) Sentinel-2 image taken in August 2016. Between May and August, the winter crops are harvested and the fields are prepared for the next cropping year, while summer crops are starting to grow.
(d) Sentinel-2 image of November 2016. By autumn, the summer crops have been harvested as well, and the only visible vegetation is in forests, which usually contain a mix of broad-leaved and coniferous species.
Photo-interpreters make use of satellite images as a base for mapping land cover, but they also incorporate a degree of knowledge from external sources, in other words, information that is not present in the pixels themselves. Factors like the yearly climate, the geographical area, prior knowledge of the classes and their geometric layouts, and many more guide the expert decision process.
Photo-interpretation of a satellite image over a wide area is a slow and costly process, as it involves assigning a unique class label to a very large number of image elements. If the pixels of a high spatial resolution (10m) image were to be labeled one by one, mapping an area the size of France would mean labeling over 5 billion pixels. This illustrates the scale of the problem at hand, and is part of the reason why the CLC map is made with a Minimum Mapping Unit (MMU) of 25ha (500m×500m). While this MMU is sufficient for many land cover classes, in particular agricultural ones, other classes are impossible to describe at such a rough scale. Thin or narrow elements like streets, streams, canals and hedges are absent from maps with this MMU, because they split larger objects into small areas (smaller than the MMU), and occupy a limited area themselves. The same can be said about isolated elements like lone houses in rural landscapes, which are absent from the CLC map, shown in Figure 1.5.
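The 5 billion figure quoted above can be checked with a back-of-envelope computation, assuming a metropolitan-France area of roughly 550,000 km² (an approximation, not a value taken from the text):

```python
# Back-of-envelope check: number of 10m pixels needed to cover an area
# of roughly 550,000 km2 (approximate area of metropolitan France).
area_km2 = 550_000
pixel_size_m = 10
pixels_per_km2 = (1000 // pixel_size_m) ** 2  # 100 x 100 = 10,000
total_pixels = area_km2 * pixels_per_km2
print(f"{total_pixels:.1e} pixels")  # → 5.5e+09 pixels
```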
A set of decision rules
One way to classify the pixels of an image is to manually construct a set of decision rules, in an approach known as ontology-based classification [Comber et al., 2005]. This translates preconceived knowledge of the different classes into conditional statements regarding the features of a pixel and the class it should be attributed to. For instance, the statement "if a pixel is dark all year long, it must be water" can be converted into a rule separating the water and non-water classes, based on a threshold on the feature values (dark or not) across the time series. Then, the non-water class can be separated again, by a rule like "if the pixel contains vegetation in the summer, and looks like bare soil in the winter, it must be a summer crop", and so on. This forms what is known as a decision tree.
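A minimal sketch of such hand-written rules is given below. The features (mean brightness, summer and winter vegetation indices) and all thresholds are purely illustrative, not taken from any operational system:

```python
# Toy version of the hand-crafted decision rules described above.
# All feature names and thresholds are hypothetical.
def classify(mean_brightness: float, summer_vi: float, winter_vi: float) -> str:
    if mean_brightness < 0.05:                 # "dark all year long"
        return "water"
    if summer_vi > 0.6 and winter_vi < 0.2:    # green in summer, bare in winter
        return "summer crop"
    return "other"

print(classify(0.02, 0.10, 0.10))  # → water
print(classify(0.30, 0.80, 0.10))  # → summer crop
print(classify(0.30, 0.80, 0.70))  # → other
```

Each `if` statement plays the role of one branch of the decision tree, which is why adding classes quickly multiplies the number of rules and thresholds to maintain.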
By basing the decision tree on a hierarchical system of classes, called the ontology, our natural interpretation of the main land cover elements can be made into an automatic procedure, which can be applied to any pixel. Then, the simple programming of such a tree allows a machine to rapidly process very large amounts of data. A few examples of applications of such methods in land cover mapping include the works of [Comber et al., 2004] and [Belgiu et al., 2013]. Figure 1.6 shows a possible tree-like structure describing the different hierarchical relations between land cover classes, based on the Corine Land Cover nomenclature. As this figure shows, providing an extensive description of the numerous land cover elements requires a significant number of branches. To convert such a hierarchical description into a decision tree, each branch needs to be associated with one decision rule, with manually defined thresholds based on the image features.
While this may be a good first approach for a simple two- or three-class classification problem, the complexity of land cover mapping with rich nomenclatures makes such decision trees very difficult to design. Indeed, time series contain a large amount of spectral, spatial, and temporal variability, which makes decision rules based on feature thresholds difficult to determine. First, the behavior of land cover classes can be quite different depending on the geographical region. For vegetation classes, this is linked to the ecology and the climate of the area, which vary greatly across a large territory. In the case of urban classes, the aspect of different cities is rarely the same, as it usually depends on the availability of local construction materials. Second, the weather differs every year, which causes variability in the agricultural and vegetation classes, and would be difficult to take into account in such decision rules. For these reasons, a unique decision tree would need to be constructed for each region across the country, and reconstructed every year.
Another reason that makes the ontology-based classification approach unsuited to such a problem is the requirement for a large number of manually set thresholds. For simple problems, the precise value of the thresholds has a relatively low importance, but for complex problems, tuning the value of the thresholds can undoubtedly allow for a more precise classification. Manually finding the ideal thresholds is nearly impossible, yet even untrained people are able to recognize many of the most challenging target classes. This raises an interesting question: how is it that we can easily perform certain tasks while having no idea of how we are actually doing them?
In many cases, understanding how we make decisions is more difficult than actually making the decisions themselves. We can easily perform classification tasks that we are not truly able to describe in words, based on what might be called "experience" by some or "instinct" by others: for instance, distinguishing between different smells, or between the voices of people we know. Naturally, these all have physical origins which differentiate them, but our ability to classify them is instantaneous and relatively reliable, while our knowledge of the physics behind these phenomena may be limited or nonexistent.
We do not exactly understand (or need to understand) the decision process involved in each classification task we perform in our everyday lives. Nonetheless, it is possible to use our knowledge of the way we learn how to perform these tasks efficiently to inspire the design of an automatic classification system. Most of what we learn comes from our past experiences, which can be seen as a series of examples from which to optimize our decision process. By smelling hundreds of different roses and lavender flowers, or by listening to other people’s voices for enough time, one can learn to tell them apart, as each one produces a unique sensation registered somewhere in the brain. In the same way, machines can be taught to recognize different things, by learning from not hundreds but millions of different examples, in a process known as machine learning.
Supervised classification with machine learning
One very efficient way of generating land cover maps is to use a supervised classification method. This involves teaching an algorithm to automatically classify the various elements present in a data set. The basic idea is to automatically devise a decision process based on a set of already labeled examples, using a learning algorithm. These algorithms are also commonly known as classifiers. Their lifetime is divided into three main stages: training, testing, and prediction, which are represented in Figure 1.7.
During the training stage, the machine learns to recognize the classes on its own by observing data from a certain number of examples: the so-called training data set. These labeled samples are the basis from which the algorithm can categorize previously unseen data points. By analyzing the common points and differences between the features of tens of thousands of examples, the algorithm is able to establish a model, which aims to assign class labels to points described by these same features. In other words, training involves dividing the feature space into as many regions as there are classes, in order to later make a decision regarding the class label of an unknown pixel. The training data set is a fundamental aspect of the classification process, as it contains observed instances of the natural phenomena that link the target classes, which are basically categorical concepts, to the features, which are specific measurements with real values.
The underlying objective of a supervised classification method is to generalize, in other words, to accurately predict the class of samples that are not present in the training data set. In a way, training a classifier involves transferring the information that is present in the labels of the training data set into a decision process, often called the model, which can assign labels to unknown samples. This is why the process is often named learning, as it involves first accumulating a large amount of individual observations, and transforming these observations into a decision process.
The term supervised in supervised classification comes from the fact that labeled training data points are used to define the target classes, and to train the model. Inversely, unsupervised classification, also known as clustering, attempts to create groups of similar points in the data set, called clusters, without using prior knowledge of their class labels. Clustering can be useful for many image analysis purposes, for example for identifying outliers (points which are very dissimilar from the others), or for compressing images. Such methods are often used as a data analysis tool when no labeled data is available. Unsupervised classification alone is not sufficient for producing a land cover map, as it only provides a cluster label for each pixel, which does not tell us its class label, in other words, whether it is a forest, a road, a field, etc. On the other hand, if an insufficient amount of training data is available, it is possible to perform clustering on the data and then label the clusters in a successive step.
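The cluster-then-label idea can be sketched with a minimal k-means implementation; the one-dimensional "vegetation-index-like" feature values below are hypothetical, and the cluster names are attached by hand after the fact, as described above:

```python
import numpy as np

# Minimal k-means sketch of the "cluster, then label" idea: pixels are
# grouped by feature similarity without any class labels, and a name is
# attached to each cluster afterwards.
def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Update step: each center moves to the mean of its points.
        centers = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    return labels, centers

# Two well-separated groups of one-dimensional, hypothetical features.
X = np.array([[0.10], [0.12], [0.08], [0.85], [0.90], [0.88]])
labels, centers = kmeans(X, k=2)

# A human can now name the clusters in a successive step.
names = {int(np.argmin(centers[:, 0])): "bare soil",
         int(np.argmax(centers[:, 0])): "vegetation"}
print([names[int(l)] for l in labels])
```

The printed list assigns the three low-valued pixels to one named cluster and the three high-valued pixels to the other, regardless of which cluster index each group received.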
Naturally, the quantity and quality of the training data given to the algorithm are very important factors for its success. The training data should provide a realistic representation of the various classes. It should be as comprehensive as possible, in order to account for variability within the data set. In land cover mapping, training data from several hundreds of different areas across the territory is used to account for geographical variability. Moreover, training labels should be as up-to-date as possible, in order to avoid falsely labeled points due to land cover changes.
The second stage involves evaluating how well the classifier has learned, and is called the testing stage. In practice, rather than training the classifier with all of the available labeled samples, a portion of them is set aside in order to evaluate its performance. These form the so-called test data set, and provide valuable insight into how well the classifier is able to recognize the different classes.
In multi-class classification problems, there are several ways of evaluating the performance of a classifier. The most commonly used tool is the confusion matrix. Each element c_ij of this matrix contains the number of samples attributed to class j by the classifier that truly belong to class i. The diagonal elements show the number of correct classifications, and the off-diagonal elements represent confusions between two classes. In classification problems with a large number of classes, the confusion matrix becomes difficult to analyze efficiently, so a number of average performance scores are derived from the matrix. The most commonly used scores are the Overall Accuracy, which measures the proportion of correctly classified samples over all classes, and the class F-scores, which provide an indication of the recognition rate of each class. The precise definition of these scores is given in Part II, Section 6, page 87.
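These scores can be computed directly from the matrix. The following sketch uses a small hypothetical 3-class confusion matrix (the counts are invented for illustration):

```python
import numpy as np

# Hypothetical 3-class confusion matrix: entry c[i, j] counts samples
# of true class i that the classifier assigned to class j.
c = np.array([[50,  2,  3],
              [ 4, 40,  6],
              [ 1,  5, 39]])

# Overall Accuracy: proportion of correctly classified samples.
oa = np.trace(c) / c.sum()

# Per-class F-score: harmonic mean of precision and recall.
recall = np.diag(c) / c.sum(axis=1)      # correct / total of each true class
precision = np.diag(c) / c.sum(axis=0)   # correct / total of each predicted class
f_scores = 2 * precision * recall / (precision + recall)

print(f"OA = {oa:.3f}")   # → OA = 0.860
print(f_scores.round(3))  # roughly 0.909, 0.825, 0.839
```

Note how class 0, with both high precision and high recall, obtains the highest F-score, while the confusions in the other two rows and columns pull their scores down.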
In practice, classifiers are trained a great number of times with different parameters, and in some cases, several different classifiers are trained on the same data set and participate in a voting system, which often increases the likelihood of a correct decision. This is known as an ensemble classification system [Rokach, 2009].
It is a commonly accepted notion that the training and test sets should be as independent as possible, in order to provide an unbiased evaluation of the performance of the classifier. Unfortunately, obtaining labeled data is one of the major difficulties in using a supervised classification approach, and the same data sources are often used for training and testing. This implies that biases in data collection can be reflected in the performance scores.
However, if the validation data set is representative of reality, in the sense that it encompasses the vast majority of cases that the classifier is likely to encounter during its prediction phase, the validation scores do provide accurate estimates of the final performance of the classifier. In other words, the completeness of the test data set is more important than how correlated it may be to the training data set. The implications of this point regarding the validation of land cover maps are discussed further in Part II, Section 3.1.2.
Once testing has shown sufficient performance scores, the classifier enters the third and final stage of its lifetime: the prediction phase. This involves classifying a very large number of unlabeled samples, and is the moment the classifier truly becomes useful, as it is able to perform such operations at great speed. The following section describes a land cover map of France known as the OSO map, which has been produced every year since 2016, and the issues that it faces in classifying certain land cover types.
Definition of the problem and main research objectives
The OSO map
Many land cover applications use time series of multi-spectral satellite images covering a period of approximately one year, as this covers a phenological cycle for many of the land cover classes. The Occupation des SOls (OSO) [Inglada et al., 2017] land cover map describes the land cover of France with a yearly update. The most recent maps (2017 and 2018) have been made using time series of Sentinel-2 images, at a 10m target spatial resolution (MMU of 0.1ha). Figure 1.8 provides a view of the evolution of the OSO products from 2014 to 2019. The colors corresponding to the different classes can be found in Figure 1.9, along with a close-up of the 2017 map. Note the fine-grained detail brought by a 10m target spatial resolution in Figure 1.9b. The production of this map is used as a baseline case for many of the experiments in Part IV.
The OSO Land Cover map uses a combination of the previously mentioned Corine Land Cover (CLC) [Bossard et al., 2000], for the vegetation and urban classes, as well as the Land Parcel Information System (Registre Parcellaire Graphique or RPG) [Cantelaube and Carles, 2014], which describes the main crop classes [Inglada et al., 2017], and the Randolph Glacier Inventory (RGI) [Pfeffer et al., 2014]. The detailed description of the OSO classes and their data sources is given in Chapter 3, Section 3.1.1.
Practically speaking, these databases come in the form of labeled polygons, which each represent a geographic entity, like a field, a road, or a neighborhood. Figure 1.9a provides an illustration of the training data used by the OSO map, over an area of 10km × 10km, along with the OSO map of the corresponding area.
The map of 2018 was produced with an extended nomenclature containing 23 classes. The most recent extensions provide a more detailed description of the annual agricultural classes. Annual Summer Crops (ASC) are divided into 5 classes that describe the plant species: Soybean, Sunflower, Corn, Rice, and Root/tuber. Annual Winter Crops (AWC) are split into 3 classes: Rapeseed, Straw cereals, and Protein crops. This extension is also planned for the map of 2019.
The OSO Land Cover Map uses a supervised classification method known as Random Forest [Breiman, 2001], which is described in detail in Section 3.4, in order to assign a class label to each pixel. For this purpose, time series of Sentinel-2 images are used to provide a multi-spectral description of each pixel, at various dates throughout the year. This allows the major natural, agricultural, and artificial classes to be distinguished. The detailed OSO class nomenclature is given in Table 1.1, along with the source used for the production of the 2016 OSO land cover map.
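The training, testing, and prediction stages can be sketched with a Random Forest from scikit-learn, on synthetic data standing in for per-pixel features. This is an illustration of the general supervised workflow, not the actual OSO production chain; the feature dimensions and labeling rule are invented:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for per-pixel features (e.g. multi-spectral
# values at several dates); the "land cover" label is an invented rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Training and testing stages: learn from labeled samples, then
# evaluate on a held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")

# Prediction stage: label previously unseen samples.
new_pixels = rng.normal(size=(5, 8))
print(clf.predict(new_pixels))
```

The Random Forest is itself an ensemble of decision trees with learned, rather than hand-set, thresholds, which is precisely what makes it suited to problems where manual threshold tuning is intractable.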
It can be noted that the CLC classes, which are only produced every 6 years, are used every year as labeled data for the production. These classes are considered to be relatively perennial, in the sense that major updates take place on a scale of several years, and they are not subject to yearly changes or rotations like crops. That being said, this practice introduces some mislabeled samples into the database, the impacts of which are studied in detail in [Pelletier et al., 2017, Tardy et al., 2017]. In order to produce an updated land cover map, every single pixel of the image is therefore classified, including the ones which are already labeled in the reference data set. If the classifier has learned sufficiently well, it is even able to correct a certain number of the mislabeled samples. By analyzing disagreements between the map and the reference data, it is possible to detect and locate certain land cover changes.
Figure 1.10 shows the confusion matrix of the OSO map that was produced in 2016. The OSO method allows for high recognition rates of the major Annual Crops (AC) and Intensive Grasslands (IGL). Indeed, the multi-temporal information allows these classes to be distinguished relatively easily, due to the differences in their periods of seeding, growth, and harvest. Moreover, Broad-Leaved Forests, Coniferous Forests, Bare Rocks, Water and Glaciers are recognized quite well.
However, there are high rates of confusion between the different artificial classes (CUF, DUF, ICU and RSF). Other poorly recognized classes include the Woody Moorlands, Natural Grasslands, and Orchards. Many of these confusions can be linked to out-of-date or ill-defined training data, with large polygons containing a mix of different classes. However, this should not be the case for the four urban classes. As was mentioned earlier, cities evolve relatively slowly over time, and a sufficient number of labeled samples is available for training in the CLC, which was used as a data source for the 2016 OSO map.
In fact, the Urban Atlas (UA) database [Montero et al., 2014] provides a geometrically accurate description of the various artificial cover types, and in some cases their density, for all cities with over 100,000 inhabitants. Unlike agricultural classes, urban classes are mostly perennial, and can be used on images from different dates with a limited amount of error, as the construction or destruction of urban cover usually happens over several years. This implies that from one year to the next, the majority of built-up classes do not change. However, this mapping is only made for major cities, and is not updated every year. The impact of the integration of UA classes in the OSO production scheme is discussed in Part II, Chapter 1.3, but changing the database unfortunately does not entirely solve the issue of confusions between urban classes in the land cover maps.
In a similar way, forests and shrublands are often confused with one another, as the difference between the two classes lies more in the density of tree cover than in the appearance of each tree, especially seen from above. Table 1.2 shows a few examples of classes that describe the density of tree cover.
The amount or quality of training data for these classes is not the reason behind the poor recognition of density-based classes.
Table of contents:
I Introduction to land cover maps
1 The operational production of land cover maps
1.1 Land cover maps
1.2 Remote sensing from space
1.3 A classification task
1.3.2 A set of decision rules
1.3.3 Supervised classification with machine learning
1.4 Definition of the problem and main research objectives
1.4.1 The OSO map
1.4.2 The importance of context in high-resolution image classification
1.4.3 Challenges of a large scale production
1.4.4 Objectives and scope
2 Operational optical imaging systems for land cover mapping at a global scale
2.1 Properties of the Sentinel-2 constellation
2.2 Properties of SPOT-7
3 Production of the OSO land cover map
3.1 Reference data and sample selection
3.1.1 Data sources
3.1.2 Split of training and evaluation sets
3.2 Cloud-filling and temporal interpolation
3.3 Feature extraction
3.4 Details of the supervised learning algorithm: Random Forest
3.4.1 Purity criteria
3.4.2 Ensemble methods
3.5 The final prediction phase
3.5.1 Eco-climatic stratification
3.5.2 Tile-based classification and mosaicking
II Basics of contextual classification
4 Defining the spatial support
4.1 Sliding windows
4.2 Objects from an image segmentation
4.2.1 Object Based Image Analysis (OBIA)
4.2.2 Mean Shift segmentation algorithm
4.4 Multi-scale representations
4.5 Overview of the spatial supports
5 Contextual features
5.1 Isotropic features
5.1.1 Local statistics: the sample mean and variance
5.1.2 Structured texture filters
5.2 Oriented texture filters
5.2.1 Describing oriented repeatability
5.2.2 Local binary patterns
5.3 Key-point based methods
5.4 Level set methods
5.5 Shape features
6 Evaluation of land cover maps
6.1 Class accuracy metrics
6.2 Standard geometric quality metrics
6.3 Pixel Based Corner Match
6.3.1 Corner detection
6.3.2 Corner matching
6.3.3 Impact of regularization
6.3.4 Calibration of the metric
6.3.5 Further validation with dense reference data
III Advanced contextual classification
7 Scaling the spatial supports
7.1 Application of Mean Shift to large images
7.2 Scaling the SLIC superpixel algorithm
7.2.1 Segmentation quality criteria
7.2.2 Tile-wise processing procedure
7.2.3 Parallel processing
7.2.4 Estimating the optimal tiling parameters
7.2.5 Experimental results
7.2.6 Overview and validation
8 Stacked contextual classification methods
8.1 Using the prediction of nearby pixels
8.1.1 Bag of Visual Words
8.1.2 Random Fields
8.1.3 Stacked classifiers
8.1.4 Semantic Texton Forests
8.1.6 Summary of the literature
8.2 Histogram of Auto-Context Classes in Superpixels
8.2.1 Principle of HACCS
8.3 Basic Semantic Texton Forest
8.4 Overview with regards to operational land cover mapping
9 Deep Learning on images with Convolutional Neural Networks
9.1 What is Deep Learning?
9.1.1 The Neural Network, a connected group of simple neurons
9.1.2 Convolutional Neural Networks
9.2 Deep Learning for land cover mapping
9.2.1 Patch-based network
9.2.2 Fully Convolutional Networks
9.2.3 Issues with sparse data
10 Multispectral time series experiments on Sentinel-2 images
10.1 Experimental setup
10.2 Results of image-based contextual features
10.2.1 Experiments on T31TCJ
10.2.2 Experiments on the 11 tiles
10.2.3 Overview of the results
10.3 Results of semantic contextual features
10.3.1 Experiments on T31TCJ
10.3.2 Experiments on the 11 tiles
10.3.3 Overview of the results
11 Mono-date VHSR experiments
11.1 Experimental setup
12.1 The importance of contextual information
12.2 Different ways of including context
12.3 Overview of the experimental results