Leave-One-Out Cross-Validation versus k-fold Cross-Validation 

Get Complete Project Material File(s) Now! »

Recursive Feature Elimination (RFE)

The L2 nonlinear SVM are good classifiers but do not allow to select variables. However, a method has been proposed [12] to use the nonlinear SVM to select variables. The objective is to remove the variables that will change the less the orientation of the hyperplane while preserving good accuracy rate (see figure 6). Removing variables makes the problem easier to solve, however, this choice is delicate because it can lead to poor results if the wrong variable is removed as it is shown in figure 6.

Projection in a polynomial space of degree d

As it has been explained previously, it could be interesting to project the data into a space where a linear separation correspond to a nonlinear separation in the original fea-ture space. An interesting projective space is a polynomial space because the projection is easy to calculate. When the data are projected, a L1-regularized linear SVM is used. It is more interesting than a L2-regularized SVM with polynomial kernel because the vector w, which gives the variables or combination of variables characterizing the hyperplane, will be explicitly available. The projection function is such that RL!H x 7! (x).
where H is the projected space. The dimension of H can be easily found; dim( H) = L+Ld . When L is big (it is the case for hyperspectral images), the dimension of H is huge. Therefore, we will use here the projection in a polynomial space of degree 2 to reduce the number of variables in the projected space. The pixel x is such as x = [x1; x2; : : : ; xL]T .

Nonlinear parsimonious feature selection (NPFS)

One limitation of the SVM is that it can not do directly multiclass separation and its computation time. Instead, a separation is made between one class and the others for each class. Thus, the interpretation of the selected features is difficult. Furthermore, the feature selection from the SVM are not very efficient (see section 3).
In this section, we propose to use multiclass separation using Gaussian Mixture Model. Therefore, the yi 2 [1; : : : ; C]i=1;:::;n, are now the label of the class, C the number of classes and nc the number of training pixels in class c (PCc=1 nc = n).
The forward feature selection works as follow. It starts with an empty pool of selected features. At each step, the feature that improves the most the k-fold cross validation (k-CV) estimation of the correct classification rate is added to the pool. The algorithm stops either if the increase of the k-CV is too low or if the maximum number of features is reached. The main contribution of this method is the fast update of the GMM and the fast update using the marginalization of the GMM.

Computation time

The main difference between the two algorithms is on the computation time; the k-CV is faster than the LOOCV. This is due to the number of time the model need to be updated; for the LOOCV, the model need to be updated n times for one feature removal (where n is the number of samples) while it must be updated only k times for one feature removal for the k-CV (in general, k is a small integer). For the University of Pavia data set with nc = 200, the k-CV takes on average 2.9 seconds to select features while the LOOCV takes on average 34.3 seconds to select features. For the Hekla data set with nc = 200, the k-CV takes on average 8.6 seconds to select features while the LOOCV takes on average 100 seconds to select features.
During this internship, traditional sparse classification methods such as SVM have been applied on hyperspectral images. Based on these methods, some algorithms have been in-vestigated to select features, without great success. A new method of feature selection has been developed; the LOOCV NPFS. It has been tested, improved and established itself as a good sparse method to classify non linearly hyperspectral images and select features. In-deed, the linear SVM has poor results and select too many features, even when the data are projected into a polynomial space. The linear SVM has reasonable computation time when it is applied on the original data but the computation time explode when the linear SVM is applied to projected data. The nonlinear SVM has the greatest classification accuracy but does not allow to select features and has important computation time. When a RFE is ap-plied to the nonlinear SVM, the number of remaining feature is still big and computation time are bigger. The simple forward algorithms are the most adapted to the problematic of the internship; is has a good classification rate, select few features and has small com-putation time. The extension of the LOOCV to k-CV also allows to treat data with a lot of samples with better accuracy. For better results, the LOOCV can be used when few samples are available and the k-CV can be used when a lot of samples are available.
Further work could be the improvement of the simple forward algorithms with a back-ward step to ensure that the selected features are are the most pertinent. A method to remove the variability due to the high correlation between adjacent bands. A continuous interval selection strategy, such as in [8], could be implemented. Other study could be done to take into account the spatial information. A further work can be done on the study of the extracted bands; indeed, we can see in figure 14 that the red-edge (between 700nm and 750nm) is extracted. It is a marker of plant activity [23]. It would be also interesting to apply the algorithms presented in this internship to multi-temporal data, and interpret the extracted dates.
For me, this internship has been a good experience; it has allowed me to improve my knowledge on classification methods and how to apply them on hyperspectral images. I also learned a lot about sparse models and sparse methods for feature selection. This in-ternship strengthened my liking for the research in remote sensing and will be helpful for the doctoral thesis about the extraction of forest characteristics by joint analysis of multi-spectral or hyperspectral imaging and 3D LIDAR data I will make for the next three years.


Table of contents :

1 State of the art 
2 Sparse classification methods 
2.1 Support Vector Machine (SVM)
2.1.1 L2 nonlinear
2.1.2 Recursive Feature Elimination (RFE)
2.1.3 L1 linear
2.1.4 Projection in a polynomial space of degree d
2.2 Nonlinear parsimonious feature selection (NPFS)
2.2.1 Gaussian mixture model (GMM)
2.2.2 k-fold cross-validation
2.2.3 Updating the model
2.2.4 Marginalization of Gaussian distribution
2.2.5 Leave-one-out cross-validation
3 Experimental results 
3.1 Data set
3.2 Results on synthetic data set
3.3 Results on real data set
3.3.1 Accuracy
3.3.2 Selected features
3.3.3 Computation time
3.3.4 Classification maps
4 Leave-One-Out Cross-Validation versus k-fold Cross-Validation 
4.1 Accuracy
4.2 Selected features
4.3 Computation time
A Data resizing 
A.1 No rescale
A.2 Rescale between 0 and 1
A.3 Rescale between 􀀀1 and 1
A.4 Standardize
A.5 Effect of the resizing
B Simple forward feature selection 
B.1 Notations
B.2 Estimators
B.2.1 Proportion
B.2.2 Mean vector
B.2.3 Covariance matrix
B.2.4 Decision rule
B.3 Updating formulas
B.3.1 Proportion
B.3.2 Mean estimation
B.3.3 Covariance matrix estimation
B.4 LOOCV algorithm
C Figures


Related Posts