Discriminative models learn the boundaries between classes without estimating class likelihood as illustrated in Figure 1.7b.
K-Nearest Neighbors (KNN) [Cover and Hart, 1967] is one of the simplest non-parametric discriminative classifiers. KNN approximates the posterior class probability $\Pr(Y_c \mid \mathbf{x})$ without making any statistical assumption about the class distributions. It finds the K closest neighbors of a given vector $\mathbf{x}$ among the training samples and assigns the class label by majority vote. K is a positive integer that is usually small; values between 1 and 7 are typical. A cross-validation procedure can help to choose the optimal value, which depends on the required complexity of the classification frontier. Note that the term ‘closest’ depends on the chosen distance, which is usually either the Euclidean distance or, more frequently with spectral data, the Spectral Angle [Yuhas et al., 1992].
KNN naturally handles non-convex and non-linearly separable classes, but it is relatively slow and requires storing the whole training set for classification.
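The KNN rule with the Spectral Angle as distance can be sketched as follows. This is a minimal illustration in plain Python, not an optimized implementation; the function names and the toy spectra are hypothetical and chosen only for the example.

```python
import math
from collections import Counter

def spectral_angle(x, y):
    """Angle (in radians) between two spectra; unchanged by a global scaling."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)

def knn_predict(train_x, train_y, x, k=3, distance=spectral_angle):
    """Majority vote among the k training samples closest to x."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy spectra (hypothetical values): two classes with different spectral shapes.
train_x = [[1.0, 2.0, 3.0], [2.0, 4.1, 6.0], [0.9, 2.1, 2.9],
           [3.0, 2.0, 1.0], [6.1, 4.0, 2.0], [2.9, 2.1, 1.1]]
train_y = ["a", "a", "a", "b", "b", "b"]

# A brighter spectrum with the same shape as class "a".
label = knn_predict(train_x, train_y, [10.0, 20.0, 30.0], k=3)
```

Note that every prediction sorts the whole training set, which reflects the storage and speed cost mentioned above.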
The Support Vector Machine (SVM) is a linear binary classifier that seeks, directly in the feature space, the separating hyperplane that lies furthest from the closest points of both classes [Vapnik, 1998]. In SVM, class labels are noted $Y_i = \pm 1$. The separating hyperplane $H_P \subset \mathbb{R}^P$ is defined by its normal vector $\mathbf{w} \in \mathbb{R}^P$ and its bias $b \in \mathbb{R}$:
$$\mathbf{w}^T \mathbf{x} + b = 0, \quad \forall \mathbf{x} \in H_P \qquad (1.23)$$
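To make the role of the normal vector and the bias concrete, the sketch below evaluates a given hyperplane on a few labeled points. It does not train an SVM: the values of `w` and `b` and the toy samples are assumptions picked for illustration. The quantity an SVM maximizes is the smallest signed distance computed by `geometric_margin`.

```python
import math

def decision_value(w, b, x):
    """Value of w^T x + b; it is zero for points lying on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Assign the label +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def geometric_margin(w, b, samples, labels):
    """Smallest signed distance from the samples to the hyperplane."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision_value(w, b, x) / norm_w
               for x, y in zip(samples, labels))

# Hypothetical hyperplane and toy data in two dimensions.
w, b = [1.0, 1.0], -3.0
samples = [[1.0, 1.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
labels = [-1, -1, 1, 1]

margin = geometric_margin(w, b, samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

A positive margin means that every sample lies on the correct side of the hyperplane.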
The Kernel Trick
Linear classifiers cannot, by definition, properly classify non-linearly separable classes directly in the feature space. In such cases, if the classifier only depends on dot products, it can benefit from the so-called Kernel Trick [Vapnik, 1998]. It consists of mapping the vectors from the feature space to a higher-dimensional space in which the classes become linearly separable. The mapping is performed through a kernel function that has to satisfy Mercer's conditions [Vapnik, 1998]. The two most used kernels are:
• Polynomial: $\kappa(\mathbf{x}, \mathbf{x}') = (\mathbf{x} \cdot \mathbf{x}' + c)^a$.
• Gaussian: $\kappa(\mathbf{x}, \mathbf{x}') = \exp\left(-\|\mathbf{x} - \mathbf{x}'\|^2 / (2\sigma^2)\right)$.
where $c$, $a$, and $\sigma$ correspond to the kernel parameters that have to be tuned. The great idea behind this Kernel Trick is that computations do not have to be made explicitly in the high-dimensional space [Schölkopf and Smola, 2002].
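The claim that no explicit computation is needed in the high-dimensional space can be checked on a small case. The sketch below assumes the usual forms of the polynomial kernel, (x · x' + c)^a, and of the Gaussian kernel, exp(-||x - x'||² / (2σ²)). The helper `explicit_quadratic_map` is a hypothetical name for the textbook feature map that corresponds to c = 1 and a = 2 in two dimensions.

```python
import math

def polynomial_kernel(x, y, c=1.0, a=2):
    """Polynomial kernel (x . y + c)^a."""
    return (sum(xi * yi for xi, yi in zip(x, y)) + c) ** a

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def explicit_quadratic_map(x):
    """Explicit 6-D feature map matching the polynomial kernel with c=1, a=2."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]

x, y = [1.0, 2.0], [3.0, 0.5]

# Kernel evaluated in the original 2-D space.
k_implicit = polynomial_kernel(x, y, c=1.0, a=2)

# Same quantity obtained as a dot product in the explicit 6-D space.
k_explicit = sum(p * q for p, q in zip(explicit_quadratic_map(x),
                                       explicit_quadratic_map(y)))
```

Both values agree, but the kernel version never builds the six-dimensional vectors. For the Gaussian kernel the corresponding space is infinite-dimensional, so only the implicit evaluation is practical.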
Training and assessing a classifier performance
When setting up a classification model, the question of complexity is a topic of major interest that has been discussed thoroughly in [Esbensen and Geladi, 2010]. The core idea is that increasing the complexity of a model by only observing the error made on the training data is prone to overfitting and has to be avoided. A good model should be complex enough to fit the training set well, yet generic enough to also classify accurately an independent test set acquired in the same experimental conditions. This is illustrated in Figure 1.8. As the model complexity increases, better performance is obtained on both the training and the test sets until a certain trade-off complexity region is reached. In this region, a minimum error is obtained for the test set while the training set error keeps decreasing. This region corresponds to the best trade-off between fitting the training set and generalizing to unseen data. A more complex model generalizes worse because it learns the ‘noise’ of the training set instead of real discriminatory features. The figure also gives examples of classification boundaries for a low-complexity, an optimal, and a too complex model.
As it is often difficult to get independent test samples, internal cross-validation (CV) is often used to assess the model complexity using only the training samples. The simplest CV consists of setting part of the training set aside and using it as a validation set. Another technique, called leave-p-out, leaves p observations out to test the model that is built on the $N - p$ remaining training observations. This process is repeated with p other observations until every observation has been left out exactly once.
The CV procedure is useful to find the optimal model complexity and to tune its parameters. However, in order to estimate the predictive ability of the trained classifier on future samples acquired in the same experimental conditions, it is highly recommended to use an independent test set that has not been used before.
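The leave-p-out procedure as described here, where successive groups of p observations are held out until each observation has been left out once, can be sketched as below. The function names, the 1-nearest-neighbor rule used as the model, and the toy data are assumptions made for the example. In practice the observations would usually be shuffled first.

```python
def leave_p_out_splits(n_samples, p):
    """Yield (training, held_out) index lists; each index is held out once."""
    indices = list(range(n_samples))
    for start in range(0, n_samples, p):
        held_out = indices[start:start + p]
        training = indices[:start] + indices[start + p:]
        yield training, held_out

def nearest_neighbor_predict(train_x, train_y, x):
    """1-NN rule on scalar features, used here only as a stand-in model."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

def cross_validated_error(samples, labels, p, predict):
    """Fraction of held-out observations that are misclassified."""
    errors = 0
    for training, held_out in leave_p_out_splits(len(samples), p):
        train_x = [samples[i] for i in training]
        train_y = [labels[i] for i in training]
        errors += sum(predict(train_x, train_y, samples[i]) != labels[i]
                      for i in held_out)
    return errors / len(samples)

# Toy one-dimensional data set (hypothetical values).
samples = [0.0, 0.2, 0.4, 1.0, 1.2, 1.4]
labels = [0, 0, 0, 1, 1, 1]

cv_error = cross_validated_error(samples, labels, p=2,
                                 predict=nearest_neighbor_predict)
```

Running the same loop for several candidate complexities (for instance several values of K in KNN) and keeping the one with the lowest error is one common way to use this estimate.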
Problems with the spectral dimension
There are several problems related to the use of spectral data for classification purposes, which are due to the fact that we try to model a low-dimensional ‘structure’ embedded in a high-dimensional space using only a few observations. In practical applications, it is usually impossible to use generative classifiers because of the difficulty of statistically estimating $\Pr(\mathbf{x} \mid Y_c)$ as the dimension of $\mathbf{x}$ increases. Discriminative classifiers, although generally directly applicable in high dimensions, are also affected by the high dimensionality in terms of robustness, because the emptiness of the space makes the class boundaries difficult to learn.
The high dimensionality of spectral data is subject to the so-called curse of dimensionality, first named by Bellman and Kalaba in the context of their dynamic search strategies for the estimation of multivariate functions. Bellman stated that, as the number of dimensions ($P$) increases, the number of evaluations needed to estimate a function on a regular grid grows exponentially, as $2^P$. A figure from Bishop's book on pattern recognition [Bishop, 2007] illustrates this phenomenon for one to three dimensions (Figure 1.9). A trivial example, in which $\mathbf{x}$ is a Boolean vector of dimension 30, requires the estimation of more than 3 billion parameters [Tom M., 2005]. Typical HS classification problems involve vectors of more than a hundred dimensions. The estimation thus requires a number of observations that is unmanageable for any practical application.
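The exponential growth behind the curse of dimensionality is easy to reproduce with a few lines of arithmetic. The sketch below assumes a regular grid with the same number of points along every axis; `grid_evaluations` is a hypothetical helper name.

```python
def grid_evaluations(points_per_axis, n_dims):
    """Number of grid nodes when each of n_dims axes holds points_per_axis points."""
    return points_per_axis ** n_dims

# Even the coarsest grid (2 points per axis) explodes with the dimension P.
counts = {p: grid_evaluations(2, p) for p in (1, 2, 3, 30, 100)}

for p, count in counts.items():
    print(f"P = {p:3d}: {count} evaluations")
```

With one hundred spectral bands, the coarsest possible grid already contains 2^100 nodes, far more than any data set could populate.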
Using spatial information
Until now, we have described classification methods applied directly to spectral data. A so-called pixel-based or spectral classifier treats the HS data only as a list of spectral measurements, without considering the spatial relations between adjacent pixels, thus discarding important information. However, the classification results could be improved by using the contextual spatial information provided in the HS data in addition to the spectral information. As illustrated in Figure 1.14, depending on the acquisition scale, different sources of spectral variability are present within objects, which could be managed through spatial information. To this end, starting from the famous Extraction and Classification of Homogeneous Objects (ECHO) method developed by Kettig and Landgrebe, a great deal of research has been carried out to find effective spectral-spatial classifiers [Fauvel et al., 2013].
These methods, depending on what type of information is more discriminatory for the objects to classify, fall into three categories:
(1) If the objects to classify have strong spatial discriminatory features, these spatial features are extracted and then used to feed a classifier.
(2) If objects to classify have strong spectral and spatial discriminatory features, both are extracted and then used simultaneously in a classifier through kernel techniques.
(3) If objects to classify have strong spectral discriminatory features, spectral information is first processed and the information from spatially neighboring pixels is then used to enhance the classification results.
The first two approaches are usually employed to discriminate classes with a priori information on object shapes or textures, e.g., buildings, houses, roads, row fields. On the contrary, the third approach only assumes a certain homogeneity in the spatial neighborhoods of pixels. In the next chapter, some successfully developed spectral-spatial approaches of these three categories are reviewed.
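As an illustration of the third category, the sketch below smooths a pixel-wise classification map with a majority vote in a small spatial window. This is a deliberately simple stand-in chosen for the example, not one of the methods reviewed in the next chapter; the function name and the toy label map are hypothetical.

```python
from collections import Counter

def majority_filter(label_map, radius=1):
    """Replace each label by the most frequent label in its square neighborhood."""
    rows, cols = len(label_map), len(label_map[0])
    smoothed = [row[:] for row in label_map]
    for r in range(rows):
        for c in range(cols):
            window = [label_map[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            counts = Counter(window)
            best_count = max(counts.values())
            # Keep the current label when it is tied for the majority.
            if counts[label_map[r][c]] < best_count:
                smoothed[r][c] = counts.most_common(1)[0][0]
    return smoothed

# Pixel-wise result with one isolated, probably misclassified, pixel.
label_map = [[0, 0, 0, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]

smoothed_map = majority_filter(label_map, radius=1)
```

The only assumption used is the spatial homogeneity of neighboring pixels, which is why such a filter also erases small objects that are genuinely different from their surroundings.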
Dealing with the high-dimensionality of spectral data
Due to their ability to perform accurate and non-destructive measurements, hyperspectral imaging devices have been increasingly used in many scientific and industrial fields over the last decades. Spectral data acquired by these devices are often composed of more than a hundred narrow bands, which makes the classical classification techniques fail. In practice, because spectral variables are also highly correlated (which can be observed by looking at the smoothness of the spectra as a function of the wavelength), their dimension can be reduced without losing important information [Geladi, 2003, Wold et al., 2001]. Therefore, most methods include a dimension reduction as a first processing step, which is usually followed by a classical multivariate statistical method such as Multiple Linear Regression (MLR) when the responses are quantitative (concentrations) or Linear Discriminant Analysis when the responses are qualitative (classes) [Naes et al., 2002, Nocairi et al., 2005]. For classification, the data are hoped to be well separated in the lower-dimensional space, i.e., with a small class spread and a large distance between classes, as represented in Figure 2.1.
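The effect of band correlation on dimension reduction can be illustrated with a principal component computed by power iteration. PCA is used here only as a familiar unsupervised example and is an assumption of this sketch; the toy spectra and all function names are hypothetical, and the starting vector of the iteration is an arbitrary choice that must not be orthogonal to the leading axis.

```python
import math

def mean_center(data):
    """Subtract the mean spectrum from every observation."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]

def covariance(centered):
    """Sample covariance matrix (P x P) of mean-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(p)] for i in range(p)]

def first_principal_axis(cov, n_iter=200):
    """Leading eigenvector of cov, estimated by power iteration."""
    p = len(cov)
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero start
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def explained_variance_ratio(cov, axis):
    """Share of the total variance carried by the given unit axis."""
    p = len(cov)
    projected = sum(axis[i] * cov[i][j] * axis[j]
                    for i in range(p) for j in range(p))
    return projected / sum(cov[i][i] for i in range(p))

# Toy spectra with three strongly correlated bands (hypothetical values).
spectra = [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9], [3.0, 2.9, 3.1], [4.0, 4.1, 3.9]]

centered = mean_center(spectra)
cov = covariance(centered)
axis = first_principal_axis(cov)
scores = [sum(a * x for a, x in zip(axis, row)) for row in centered]
ratio = explained_variance_ratio(cov, axis)
```

Here a single score per spectrum retains almost all of the variance of the three bands, which is the property that makes a reduction step followed by a classical classifier workable.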
Using spatial information: Spectral-spatial approaches
Every pixel-based classification method described in the first chapter usually performs well when the training set is representative enough and when the classes to be discriminated are different enough in terms of spectral information. In other cases, in order to compensate for the lack of available spectral information, using the spatial information provided by hyperspectral images has proved to be an important improvement [Dalla Mura et al., 2011, Gorretta et al., 2012, Tarabalka et al., 2010a]. Spectral-spatial methods for classification have had a short but intense history, and many papers have been published in the last decade, most of them by the remote sensing community [Bioucas-Dias et al., 2013, Fauvel et al., 2013]. These methods were originally classified into two families by Gorretta as:
(1) Pixel-based classification with spatial constraints.
(2) Extension of classical image processing techniques to HS images: the main difficulty with this kind of method is to define a metric that makes sense in this high-dimensional space and to create an ordering.
With the rapid development of new methods, this separation is now less obvious. It is preferred to define categories depending on the place where the spatial information is introduced in the classification chain, leading to three main categories [Bioucas-Dias et al., 2013]. A schematic view of these categories due to Valero is represented in Figure 2.5.
Spatial information as an input parameter
In this approach, a feature vector that contains spatial information is constructed for each pixel. It can contain any contextual information such as: shape, texture, orientation, size… These features are usually extracted from the image using classical image processing techniques that have either been adapted to work in higher dimensions or applied on a spectrally reduced image.
The first spectral-spatial classification method, originally developed for multispectral images, is the well-known ECHO (Extraction and Classification of Homogeneous Objects) [Kettig and Landgrebe, 1976, Landgrebe, 1980]. With this method, the image is first segmented into homogeneous regions that are found using a recursive partitioning: 1) the image is partitioned into small rectangular regions of pre-defined size; 2) adjacent regions that are similar enough according to a homogeneity criterion are merged; 3) step 2 is repeated until no more merging is possible. Each segmented region is finally classified using a classical maximum likelihood classifier.
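The three segmentation steps can be sketched on a single-band toy image. This is only an illustration of the partition-and-merge idea: the homogeneity criterion used here (absolute difference of region means below a threshold) is an assumption of the example and not the statistical criterion of the original ECHO method, and the final maximum likelihood classification of each region is left out. All names and values are hypothetical.

```python
def block_partition(rows, cols, block):
    """Step 1: label each pixel with the index of its block x block cell."""
    blocks_per_row = (cols + block - 1) // block
    return [[(r // block) * blocks_per_row + (c // block) for c in range(cols)]
            for r in range(rows)]

def region_means(image, labels):
    """Mean pixel value of every region."""
    sums, counts = {}, {}
    for image_row, label_row in zip(image, labels):
        for value, label in zip(image_row, label_row):
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def adjacent_pairs(labels):
    """Pairs of distinct region labels that share a horizontal or vertical edge."""
    rows, cols = len(labels), len(labels[0])
    pairs = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and labels[r][c] != labels[rr][cc]:
                    pairs.add(tuple(sorted((labels[r][c], labels[rr][cc]))))
    return pairs

def merge_regions(image, block=2, threshold=0.5):
    """Steps 2 and 3: merge similar adjacent regions until none can be merged."""
    labels = block_partition(len(image), len(image[0]), block)
    merged = True
    while merged:
        merged = False
        means = region_means(image, labels)
        for a, b in sorted(adjacent_pairs(labels)):
            if abs(means[a] - means[b]) < threshold:
                # Relabel region b as region a, then recompute the means.
                labels = [[a if label == b else label for label in row]
                          for row in labels]
                merged = True
                break
    return labels

# Toy image: a dark left half and a bright right half.
image = [[1.0, 1.1, 5.0, 5.1],
         [0.9, 1.0, 5.2, 4.9],
         [1.1, 1.0, 5.0, 5.0],
         [1.0, 0.9, 4.9, 5.1]]

segmentation = merge_regions(image, block=2, threshold=0.5)
```

On this image the four initial blocks collapse into two regions, one per half, and each region would then be classified as a whole.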
Table of contents:
1 Introduction to hyperspectral image classification
1.1 Hyperspectral imaging
1.1.1 Light-matter interaction
1.2 Supervised classification
1.2.1 Definitions and hypotheses
1.2.2 Generative classifiers
1.2.3 Discriminative model
1.2.5 The Kernel Trick
1.2.6 Training and assessing a classifier performance
1.3 Classification issues with HS data
1.3.1 Problems with the spectral dimension
1.3.2 Using spatial information
1.3.3 Obtaining reflectance images
2.2 Dealing with the high-dimensionality of spectral data
2.2.1 Unsupervised approaches
2.2.2 PLS-like approaches
2.2.3 FDA-like approaches
2.3 Using spatial information: Spectral-spatial approaches
2.3.1 Spatial information as an input parameter
2.3.2 Spatial information at the classification decision stage
2.3.3 Spatial information as a post-processing stage
2.4 Reflectance correction
2.4.2 Physics-based transfer (model-based) correction
2.4.3 Scene-based correction
2.4.4 Image-based correction
3 Proposed approaches
3.2 Dimension reduction
3.2.2 Subspace decomposition: problem statement
3.2.3 Variability decomposition in $\mathbb{R}^N$ and $\mathbb{R}^P$
3.2.5 DROP-D algorithm
3.3.1 Construction of the score image
3.3.2 Anisotropic regularization
3.3.3 Score image regularization
3.4 Reflectance correction
3.4.2 Lambertian hypothesis
3.4.3 Discrimination model hypothesis
3.4.4 Problem statement
3.4.5 Translation estimation
4 Experimental Results
4.1 Data sets
4.1.1 Data set A: Proximal detection
4.1.2 Data set B: Remote-sensing
4.1.3 Performance measurements
4.2 Dimension reduction
4.2.1 Collinearity in $\mathbb{R}^P$
4.2.2 Effect of removing W on the class separability
4.2.3 Model calibration
4.2.4 Classification performances
4.3 Spatial regularization
4.3.1 Validation of the approach
4.3.1.1 On ‘what’ to apply the regularization
4.3.1.2 ‘When’ to apply regularization
4.3.2 Tuning robustness
4.3.3 Effect on score versus spatial
4.3.4 Classification results
4.3.5 Comparison with other approaches
4.4 Reflectance correction
4.4.1 Reflectance correction effect on the reduced scores
4.4.2 Using log-radiance image for classification
4.4.3 Translation estimation
5 Conclusions and future work
5.2 Future work
B Extended Summary (Résumé Étendu)
B.1 Classification in hyperspectral imaging
B.2 Issues and state of the art in HS data classification
B.2.1 Problems with the spectral dimension
B.2.2 Using spatial information
B.2.3 Obtaining reflectance images
B.3 Proposed approaches
B.3.2 Dimension reduction
B.3.3 Spectral-spatial approach
B.3.4 Reflectance correction
B.4.1 Dimension reduction
B.4.2 Spatial regularization
B.4.3 Reflectance correction