Credal classification rule for uncertain data using prototype of each class

Get Complete Project Material File(s) Now! »

Basics of belief function theory

The belief functions have been introduced in 1976 by Shafer in his mathematical theory of evidence, known also as belief function theory or Dempster-Shafer theory (DST) [3–7] because Shafer uses Dempster’s fusion rule for combining belief basic assignments. We consider a finite discrete set = {w1,w2, . . . ,wc}. of c > 1 mutually exclusive and exhaustive hypotheses, which is called the frame of discernment (FoD) of the problem under consideration. The power-set of denoted by 2 contains all the subsets of . The singleton elements in 2 represent the specific classes, and the set of elements (i.e. disjunction of elements) can be considered as meta-class. For example, if = {w1,w2,w3}, then 2 = {;,w1,w2,w3,w1 [ w2,w1 [ w3,w2 [ w3, }. The union wi [ wj = {wi,wj} is interpreted as the proposition « the truth value of unknown solution of the problem under concern is either in wi or in wj « . So that represents the full ignorance (uncertainty).
Glenn Shafer [3] considers the subsets as propositions in the case we are concerned with the true value of some quantity ! taking its possible values in . Then the propositions Pw(A) of interest are those of the form1: Pw(A) , The true value of w is in a subset A of . Any proposition Pw(A) 1We use the symbol , to mean equals by definition; the right-hand side of the equation is the definition of the is thus in one-to-one correspondence with the subset A of . Such correspondence is very useful since it translates the logical notions of conjunction ^, disjunction _, implication ) and negation ¬ into the set-theoretic notions of intersection \, union [, inclusion and complementation c(.). Indeed, if Pw(A) and Pw(B) are two propositions corresponding to subsets A and B of , then the conjunction Pw(A) ^ Pw(B) corresponds to the intersection A \ B and the disjunction Pw(A) _ Pw(B) corresponds to the union A[B. A is a subset of B if and only if Pw(A) ) Pw(B) and A is the set-theoretic complement of B with respect to (written A = cw(B)) if and only if Pw(A) = ¬Pw(B). In other words, the following equivalences are then used between the operations on the subsets and on the propositions: (intersectionconjunction), (union disjunction), (inclusion implication) and (complementation negation). A basic belief assignment (BBA) is a function m(.) from 2 to [0, 1] satisfying X A22 m(A) = 1 (2.1) The subsets A of such that m(A) > 0 are called the focal elements of m(.), and the set of all its focal elements is called the core of m(.). If A is a singleton element corresponding to specific class, the quantity m(A) can be interpreted as the exact belief committed to the class A. m(A [ B) reflects the imprecision (non-specificity or ambiguity) degree between the class A and B for the classification of the object. A Credal partition [23, 24] for a data clustering over the frame is defined as n-tuple M = (m1, . . . ,mn), where mi is the basic belief assignment of the sample xi 2 X, i = 1, . . . , n associated with the different elements of the power-set 2 . So the The belief function Bel(.) and the plausibility function Pl(.) [3], are usually interpreted as lower and upper probabilities of the hypothesis. They are mathematically defined for any A 2 2 from a given BBA m(.) by Bel(A) = X B22 |BA m(B) (2.2) Pl(A) = X B22 |A\B6.

SUPERVISED CLASSIFICATION OF DATA

For the supervised classification, the classifiers broadly belong to two families: model-based classifiers and case-based classifiers [13]. Bayes theorem is usually applied in the model based classifiers to calculate the posterior probability estimates according to the class-conditional densities and prior probabilities. However, the complete statistical knowledge of the conditional density functions of each class is hard to obtain in many cases, and this precludes the application of model-based classification procedure. When the model-based classifiers become unavailable, one can choose the case-based classifier computing the class probabilities using the correctly labeled training data set without considering the density estimation.
There exist many well known case-based classifiers, typically the support vector machine (SVM) [59, 60], the artificial neuron network (ANN) [61], decision trees [62] and the K-nearest neighbor (K-NN) classifier [63], etc. Support vector machine allows to construct a hyperplane or set of hyperplanes in a high-dimensional space using the training data set for classification, regression, or other tasks. A good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class (so-called functional margin), because the larger the margin generally leads to the lower error rate of the classifier. Artificial neuron network (ANN) is defined by a set of input neurons which may be activated by the input data. After being weighted and transformed by a function determined by the user, the activations of these neurons are then passed on to other neurons. This process is repeated until an output neuron is activated. Decision trees are used as a predictive model mapping observations about an item to conclusions about target value, and they consist of two main types: classification and regression tree (CART) as an umbrella term. In the tree structure, each internal (non-leaf) node denotes a test on an attribute, each branch represents the outcome, and each leaf node represents a class label. A tree can be learned by splitting the source set into subsets based on an attribute value test, and the process is repeated on each derived subset in a recursive manner until the splitting cannot add value to the predictions any more. K-nearest neighbor (K-NN) rule is a simple but efficient non-parametric procedure, and it remains an interesting topic of research. In K-NN, an unclassified sample is classified into the class which the majority of its K nearest neighbors (KNNs)3 in the training set belong to.

Evidential classification

These traditional classifiers generally work with the probability measure, like ANN, and the imprecise (ignorance) information is not considered in the classification process. The belief function
theory has been already applied for the data classification [12–17, 19–23, 38, 64–70], data clustering [23,24,71,72], and imprecise decision-making support [18,30,57,73–75] to model the uncertainty and imprecision.
Some data classifiers have already been developed based on belief functions in the past. For instance, Smets [76] and Appriou [77] have proposed the model-based classifier based on the Generalized Bayes Theorem (GBT) [76] which is an extension of Bayes theorem in Smets transferable belief model (TBM) [5–7]. There are some other case-based evidential classifiers based on neural network [38], K-nearest neighbors [12], decision trees [78], and SVM [64]. Particularly, the evidential version of K-nearest neighbors method (EK-NN) has been proposed in [12] based on DST, for working only with the specific classes and the extra ignorant class defined by the union of all the specific classes. In EK-NN, K BBA’s can be determined according to the distance between the object and the selected K nearest neighbors, and each BBA has two focal elements: singleton class (i.e. the label of the corresponding neighbor) and ignorance class (i.e. the frame of discernment).
The combination of the K BBA’s by DS rule is used for the classification of the object. A fuzzy version of EK-NN, denoted FEK-NN, has been also developed in [79]. An ensemble technique for the combination of evidential K-NN classifier based on DST has been proposed in [80] to improve the accuracy. A neural network classifier has also been developed in [38] under the belief functions
3In this thesis, K-NN denotes the K-nearest neighbor classifier, whereas KNNs represents the K nearest neighbors.
framework that allows one extra ignorant class as possible output of this classifier, and it can reduce the computation burden with respect to EK-NN but it requires more complicate training process. The relationship between the case-based and model-based approaches working with belief functions have also been studied, and it shows that both methods actually proceed from the same underlying GBT principle, and that they essentially differ by the nature of the assumed available information. This is helpful for us to choose the most appropriate method for the particular application depending on the nature of the information.

Brief recall and comments on EK-NN

Let us consider a group of objects X = {x1, . . . , xn} to classify over the frame of the classes = {w1, · · · ,wh}. In EK-NN [12], the imperfect information is modeled by the ignorant focal element w1 [ . . . [ wi [ . . . [ wh denoted also by , and the BBA of the object xi associated to its close neighbor xj labeled by ws 2 is defined as: mxi j (ws) = e− sd ij (2.12) mxi j ( ) = 1 − e− sd ij (2.13). As recommended in [12], the default value of the discounting factor can be taken to 0.95, and the parameter s must be chosen inversely proportional to the mean distance between two training data belonging to class ws. The parameter usually takes a small value because it has in fact a very little influence on the performance of EK-NN, one takes = 1 as its default value. Generally, dij corresponds to the Euclidean distance between the object xi and the training data xj . K BBA’s corresponding to the K nearest neighbors of the object are constructed according to Eqs. (2.12)-(2.13). From Eq. (2.12), one can see that the bigger distance dij will yield smaller masses of belief on the corresponding class ws. It means that if the object xi is far from the neighbor labeled by class ws, this neighbor can provide little support to xi belonging to the corresponding class ws. From Eq. (2.12) and (2.13), one sees that only the two focal elements ws and are involved in a given BBA mxi j (.), i = 1, . . . , n. The classification result of each object xi is obtained by the fusion of the K BBA’s using DS rule given in Eq. (2.4). Because of the very particular structure of the BBA’s, DS rule produces as focal elements of the fusion only a specific class ws and the total ignorance class
. Therefore, no partial ignorance (meta-classes), say wi [· · ·[wj {wi, · · · ,wj}, can become a focal element in EK-NN method. It is worth to note that DS rule for the fusion of the K bba’s with such particular structure is not very effective for the outlier detection. Indeed, for any outlier very far from its KNNs having same class label, the most mass of belief will be committed to the specific class after the fusion process using DS rule (when K is big enough). This behavior is abnormal since the ignorance class should normally take a larger mass of belief when the object corresponds to a true outlier.
Because of this DS behavior, this object will be incorrectly considered as belonging to a specific class rather than to the outlier class. This behavior is not satisfactory in some real applications, like in target tracking in cluttered environments. This behavior is clearly illustrated in the following simple example.

Table of contents :

Acknowledgement
Abstract
Acronyms
I French Abstract
Résumé étendu
F-1.1Classification par K plus proches voisins crédibiliste (BCKN)
F-1.1.1 Affectation du jeu de masse (BBA)
F-1.1.2Fusion des assignements de masse
F-1.1.3 Applications
F-1.2Règle de classification crédibiliste (CCR)
F-1.2.1 Détermination des centres de classe
F-1.2.2 Assignation des masses
F-1.2.3 Evaluation du CCR sur des données simulées de grande dimension
F-1.3Classification cédibiliste par prototype (PCC) pour les données incomplètes
F-1.3.1 Classification de données incomplètes avec c estimations
F-1.3.2Fusion globale des c classificateurs affaiblis
F-1.3.3 Application de la PCC
F-1.4 K-moyennes floues et évidentielles (CCM)
F-1.4.1Fonction de coût du CCM
F-1.4.2 Evaluation des performances de l’algorithme CCM
II Contributions
1 Introduction
2 Literature overview
2.1 Introduction
2.2 Brief introduction of belief function theory
2.2.1 Basics of belief function theory
2.2.2 Several alternative combination rules
2.2.3 A brief review of DSmT
2.2.4 Decision making support
2.3 Supervised classification of data
2.3.1 Evidential classification
2.3.2 Brief recall and comments on EK-NN
2.4 Data clustering
2.4.1 Credal partition using belief functions
2.4.2 Brief review of Evidential C-Means (ECM)
2.5 Classification of incomplete data with missing values
2.6 Conclusion
3 Credal classification of uncertain data using close neighbors
3.1 Introduction
3.2 Belief C × K neighbors classifier
3.2.1 Principle of BCKN
3.2.2 The determination of basic belief assignments
3.2.3 The fusion of the basic belief assignments
3.2.4 Guidelines for choosing the threshold parameter t
3.2.5 Expressive power of BCKN
3.3 Experiments
3.3.1 Experiment 3.1 (with artificial data sets)
3.3.2 Experiment 3.2 (with 4-class data set)
3.3.3 Experiment 3.3 (with real data sets)
3.4 Conclusion
4 Credal classification rule for uncertain data using prototype of each class
4.1 Introduction
4.2 Credal classification rule (CCR)
4.2.1 Determination of the centers of classes
4.2.2 Construction of BBA’s
4.3 Evaluation of CCR on artificial and real data sets
4.3.1 Experiment 4.1 (with artificial data sets)
4.3.2 Experiment 4.2 (with artificial data sets)
4.3.3 Experiment 4.3 (with large scale artificial data sets)
4.3.4 Experiment 4.4 (with real data sets)
4.4 Conclusions
5 Credal classification of incomplete patterns
5.1 Introduction
5.2 Prototype-based Credal classification method
5.2.1 Classification of incomplete patterns with c estimations
5.2.2 Global fusion of the c discounted classification results
5.3 Experiments
5.3.1 Experiment 5.1 (with 2D 3-class data set)
5.3.2 Experiment 5.2 (with 4-class data set)
5.3.3 Experiment 5.3 (with 4D 3-class data sets)
5.3.4 Experiment 5.4 (with real data sets)
5.4 Conclusion
6 Credal c-means clustering method
6.1 Introduction
6.2 Credal c-Means (CCM) approach
6.2.1 The objective function of CCM
6.2.2 Minimization of the objective function JCCM
6.3 Experiments
6.3.1 Experiment 6.1 (with 3-class artificial data set)
6.3.2 Experiment 6.2 (with 4-class simulated data set)
6.3.3 Experiment 6.3 (with real remote sensing data)
6.3.4 Experiment 6.4 (with real data sets)
6.4 Conclusion
7 Conclusion and perspectives
7.1 Conclusion
7.1.1 Belief c × K neighbors classifier (BCKN)
7.1.2 Credal classification rule (CCR)
7.1.3 Credal classification of incomplete data with missing values
7.1.4 Credal c-means (CCM) clustering method
7.2 Perspectives
7.2.1 Credal classification of sequential data with few training samples
7.2.2 Classification of incomplete pattern using imprecise probability
7.2.3 Credal c-means clustering with some constraints
7.2.4 Unified evaluation criteria for performance of credal classifier
Bibliography