SVM for multi-class classification
We place ourselves in the context of discrimination in C categories with C > 3 .
The principle of the SVM explained in the previous section is summarised in solving of binary classification problems, but the most classification problems are a multi-class problem. Hence the importance of extending the principle of SVM to the problems of more than two classes, there have been several attempts to combine binary classifiers to identify this problem (multi classes) , there are also attempts to incorporate the classification of several classes into the SVM process so that all classes are treated simultaneously .
In this section, we will discuss on strategies based on reducing the multi-class problem to multiple binary classification problems. We will then briefly explain some of the most widely used methods.
The combination of SVM bi-classes: a first step towards multi-class SVM
Decomposition methods can be used to address a multi-category discrimination problem (C > 3) as a combination of dichotomous calculation problems. We are dealing here only with the two main decomposition methods.
One versus All
We are given a training dataset of n points of the form (x1; y1); ; (xn; yn) where xi, i = 1; ; n is a vector of length d and yi 2 Y = f1; ; C g representing the class of the sample. A first approach is to create a classifier for each class, which separates the points of this class from all the other points. This method is called one-versus-rest (OVR abbreviated) or one-versus-all (OVA abbreviated). It consists in using a binary classifier (with real values) by category. The kth classifier is intended to distinguish the index category k from all the others. To assign an example, it is thus presented to C classifiers, and the decision is obtained according to the principle winner-takes-all: the label chosen is that associated with the classifier who returned the highest value. It is commonly quoted as older works evoking the use of this strategy with SVM  (see also ). In , the authors support the thesis that this approach, as sim-ple as it is, when implemented with correctly parameterized SVM, obtains performances that are not significantly lower than those of the other methods. It should be emphasized, however, that it involves learning to allocate to very unbalanced categories, which often raises practical diﬃculties.
Characterization of kernels
In this section, we characterize the kernel function using another way, which also allows us to build new kernels.
Theorem 2.1.  : X X ! R is finitely positive semi-definite (FPSD) if and only
there exists a Hilbert space H with feature map : X ! H such that
(x; y) = h (x); (y)iH :
Theorem 2.2. (Moore-Aronszajn Theorem)
If is symmetric and positive definite kernel on a set X . Then there is a unique Hilbert space of functions on X for which is a reproducing kernel.
Theorem 2.3. (Mercer Theorem) Let is a continuous kernel function that takes two variables x and y and map them to a real value such that (x; y) = (y; x). A kernel is non-negative definite if and only if:
Z Z f(x) (x; y)f(y)dxdy > 0:
In association with a kernel , we can define an integral operator T , which, when applied to a function f(x), generates another function: Z T (f(x)) = (x; y)f(y)dy = [T f](x):
Table of contents :
List of figures
List of tables
1.2 Data classification methods
1.3 Support Vector Machines
1.3.1 General operating principle of the SVMs
188.8.131.52 Basic knowledge: Hyperplane, margin and support vectors
184.108.40.206 Why maximize the margin?
220.127.116.11 Linearity and non-linearity
1.3.2 SVM for binary classification
18.104.22.168 Soft-SVM with kernels
1.3.3 SVM for multi-class classification
22.214.171.124 The combination of SVM bi-classes: a first step towards multi-class SVM
126.96.36.199 One versus All
188.8.131.52 One versus one
184.108.40.206 Weston and Watkins model
1.4 Hyperbolic kernel machine
1.4.1 Capacity measures
1.4.2 Useful properties of the (empirical) Rademacher complexity
1.4.3 Empirical Rademacher complexity of the kernel classes
2 Background: an Overview of Kernel Methods
2.2 Fundamental elements of the theory of kernel functions
2.2.1 Hilbert space
2.2.2 Reproducing Kernel Hilbert Space
2.2.3 Characterization of kernels
2.2.4 Kernel matrix
2.3 Kernel constructions
2.4 Basic kernels
2.4.1 Polynomial kernel
2.4.2 Translation-invariant kernels
3 Hyperbolic Kernel Machine
3.2 Hyperbolic kernel machine
3.2.1 Theoretical framework
3.2.2 Function class and decision boundaries
3.2.3 Function selection
3.3 Statistical properties
3.3.1 Fisher consistency
3.3.2 Guaranteed risk
3.5 Proof of Lemma 3.1
3.6 Technical lemmas
4 Consolidation Kernel
4.2 State of the art on the transformation invariant kernels
4.3 Consolidation kernel
4.4.1 Experimental Setup
4.4.2 XOR Problem
220.127.116.11 Data Set Description