Unfortunately, one kind of descriptor is not usually enough to achieve satisfactory results in general pattern recognition problems. Thus we must combine the informa-tion from diﬀerent sources to improve the performance of single descriptors. In this sense, there are several starting points to face this problem. We can find strategies consisting of merging features from diﬀerent descriptors into a single descriptor and then, training high performance classifiers like neural networks [Lippmann, 1988], boosting-based classifiers or Support Vector Machine [Burges, 1998]. However, for general purpose problems in which the number of classes begins to be high and the shapes to be recognized can be counted by thousands, these expert classifiers begin to fail. Moreover, when the classes to recognize may change depending on the problem handled or according to user requirements, neural networks, boosting algorithms and support vector machine become rigid methods because they should be trained again, which is usually time-expensive. Therefore, we have to find strategies that easily per-mit us to adapt to the user needs, avoiding to train again the system each time the problem conditions change. One possibility, the one we have followed in this work, is to use diﬀerent classifiers, one for each descriptor, and combine them in such a way that the global performance increases —see figure 1.4 for an example of this strategy compared to the combination of features, explained before. This strategy is known in the community under some diﬀerent names: aggregation operators, combination rules or classifier fusion, depending on the context and the characteristics of classi-fiers. We like the name of classifier fusion and we will use it in most of the times, despite sometimes we will also use indiﬀerently the terms of aggregation operator and combination rule to refer to the same concept.
Objectives and contribution of the present dissertation
Then, this dissertation is also about classifier fusion. Classifier fusion methods started to be developed in the 90s. A raised question was whether it is reasonable to combine the output of several classifiers to improve global classification rates and, if plausible, which is the best way to combine these outputs. The first question has af-firmatively answered through the research done in machine learning and experimental results obtained during these last years [Kittler, 2000, Kittler et al., 1998, Schapire, 1990, Tax et al., 2000]. We find that fusion methods improve individual classifiers performance. However, the question concerning which of fusion method is the best one among all of them remains open. Besides, experimental evidence leads to think that there is not a “best” fusion method, that it will depend on the application. In this sense, for example, a discussion on when it is better to average or to multiply the classifiers is given in [Tax et al., 2000] under strict probabilistic conditions but the final behavior will depend on the application. Works like those by Stejic et al.  propose a genetic algorithm that learns which aggregation operator should be used every time. In general, classifier fusion methods have been grouped in two classes according to the architecture used to combine them: serial and parallel. Of course, there is a third architecture, an hybrid architecture, which combines serial and parallel classifier configurations [Jain et al., 2000, Zouri, 2004].
In Serial combination, classifiers are organized into ordered stages. Each stage is composed of a single classifier that takes into account the output of the prece-dent classifier to confirm, or reject, its response. Each classifier filters some candidates to be the model reducing in such way recognition errors at each stage.
In the Parallel architecture, on the contrary, we evaluate all classifiers at the same time and, afterwards, we combine classifier outputs to return a single candidate (or a ranked list of candidates). In these systems, the choice of the aggregation operator is critical.
Hybrid architecture combines both parallel and serial configurations to take advan-tage of the goodness of each one. The diﬃculty relies on the design of such an architecture.
We have chosen a parallel architecture to fusion classifiers because we are interested in the aggregation operators used to combine them. In particular, the chapter 4 has been devoted to find linear combination rules minimizing the misclassification rates.
Objectives and contribution of the present dis-sertation
The original proposal of the research project was related to the search of graphic en-tities in technical documents using content-based information. Thus, we had to find suitable descriptors for this kind of shapes, which could be rotated and scaled in any part of the document. Besides, the number of symbols could be big so that descrip-tors should find an agreement between the size of the descriptor and its description capabilities to facilitate later indexing tasks.
The global problem required the help of a wide range of methods from the fields of shape recognition, segmentation, vectorization, classifiers and indexing. We had to focus our work on some of these parts and we decided to intensify our research in the shape recognition field to find out suitable descriptors for graphic symbols. However, the lack of a general theory of shape description required a review and a test of exist-ing techniques in feature extraction methods. In addition, existing descriptors do not seem convenient when the number of symbols in the database to recognize increase dramatically and we have explored new descriptors like multiresolution descriptors based on ridgelets transform. Multiresolution descriptors oﬀer a hierarchical descrip-tion of shapes according to their resolution. It seemed that this kind of descriptors could be useful to define an indexing structure, which facilitates the task of recog-nition when the number of symbols increase. Furthermore, the capacity of ridgelets transform to detect linear singularities made this transform more suitable than other multiresolution transforms. Nevertheless, this descriptor took with him some intrinsic problems, namely the size of descriptor and the representation of symbols at diﬀerent levels of resolution. On the one hand, we have studied diﬀerent strategies to reduce the size of descriptors. On the other hand, it was not evident how to combine all resolutions of ridgelets descriptors and therefore, we have decided to explore classifier fusion. We have studied linear rules as a way to combine all resolutions and after-wards, we have compared them with other popular combination rules like max and median in a framework for the general problem of combination of classifiers.
As a result, this dissertation treats about two diﬀerent issues in shape recognition: shape representation using multiresolution descriptors and the fusion of classifiers to combine the information of multiple descriptors to improve single performance and increase recognition rates.
Goals of the work
In this section, we will pinpoint the main goals achieved in this dissertation. These contributions turn around two axes: shape descriptors and classifier fusion.
Shape descriptors When we have reviewed the bibliography, we have realized that there is not an unified terminology. Due to the complexity of the shape recogni-tion process, the division into three stages done above, namely shape description, shape comparison and shape classification is not standard and depends on each author. Thereby, there are several terms to denote similar concepts but be-ing slightly diﬀerent. Some feature extraction methods consider the similarity measure as a part of the method, meanwhile others not. Hence, our review of the state of art starts by the proposal of a set of definitions consistent with
Objectives and contribution of the present dissertation
the decomposition of the shape recognition process in these three stages. More specifically:
• We have formalized and unified the concept of feature vectors, signatures and other terms used to refer to the results of feature extraction methods under the name of descriptor.
• We have generalized the notion of primitive, specific for structural descrip-tors, proposing a definition that can be applied to any descriptor.
• We have introduced the notion of primary feature extraction method in order to decompose a descriptor in more elemental entities. Such decom-position should not be seen like the decomposition of natural numbers in prime numbers, in the sense that a descriptor can be decomposed in a unique sequence of primary feature extraction methods. On the contrary, it must be seen as a flexible manner to decompose a descriptor identifying common elements among them for their classification.
After, we have reviewed existing methods and organized this part of the chap-ter according to these definitions. In particular we have defined a taxonomy based on the characteristics of primitives, on the properties of feature extraction methods and on the characteristics of descriptors.
Ridgelets descriptors are particular multiresolution descriptors. We have consid-ered ridgelets transform because this transform is suitable to detect linear sin-gularities in multivariate functions. Thereby, in the context of Graphic Recog-nition, this kind of singularities correspond to lines. Concerning other graphic entities, like arcs, theoretical results assert that ridgelets transform is not worse than other multiresolution transforms like wavelets [Cand`es, 1998]. The study of ridgelets transform has involved:
• The study of Radon transform. Ridgelets transform requires the use of Radon transform. Thus we have implemented Radon transform seeking to reduce the time complexity of this transform. A modified version of the Fast Slant Stack algorithm [Averbuch et al., 2001] that corrects some geometrical distortions when the shape is rotated has been proposed.
• Definition of a ridgelets-based descriptor. We have constructed a shape descriptor based on the ridgelets coeﬃcients. In this sense, we have mod-ified the ridgelets representation to make descriptors robust to noise and vectorial distortion. Besides, we have normalized it to make it invariant to shape shifts and scale.
• Multiresolution descriptors have diﬀerent number of scales depending on the shape resolution. In order to compare shapes at diﬀerent resolutions we have introduced the notion of Decomposition Level, DL, grouping in the same DL all scales that appear when we increase the resolution of the shape.
• Ridgelets descriptors are not invariant to symbol rotation. So, we have proposed a similarity measure achieving invariance for each scale.
• We have explored diﬀerent techniques to reduce the size of ridgelets descrip-tors. Some proposals consist of applying R-Signature descriptor [Tabbone et al., 2006, 2001] to each ridgelet scale and defining Local Norm descrip-tors [Costa and Cesar Jr., 2001] based on ridgelets transform Classifier Fusion: Although the original problem was raised in the context of com-bination of multiresolution descriptors, we have attacked it in the general frame-work the combination of classifiers. Hence, the work in this area has involved:
• Raising the problem of combining ridgelets descriptors as a general prob-lem of classifiers fusion. In this context, we have proposed a probabilistic framework where we have modeled two-class classifiers as random variables.
• Focusing on linear combination rules, we found the optimal weight for each classifier that minimizes the classification error under some constraints about classifier distribution and dependence/independence of classifiers. For independent classifiers we have found an explicit formula for the cases where classifier distribution can be approximated by Normal and Dirac distributions. For dependent classifiers, we have raised an optimization problem to be solved for classifiers whose linear combination follows a a normal distribution.
• Extending the optimal solution for binary classifiers to multi-class classi-fiers through the operator arg max.
Combination of ridgelets descriptors: We have finished our dissertation apply-ing the scheme of classifier fusion to the ridgelets descriptor. We have con-structed two parallel classifiers. The first classifier uses the weights obtained under the assumption of Independence of classifiers whereas in the second one the weights have been obtained solving the optimization problem corresponding to the Dependence assumption.
Framework of the thesis
We had to make some practical decisions in order to pass from the abstract shape recognition problem, raised at the beginning of this chapter, to a practical shape recognition problem. In this sense, some of the decisions taken were related to the following issues: the application domain, the theoretical framework and the experi-mental framework.
Application Domain: We have mainly worked with graphic symbols, specially when we have studied shape descriptors, because the origin of this work was to look for suitable shape descriptors for Graphic symbols. However, in the context of classifier fusion we have extended our domain to datasets of handwritten numerals.
Theoretical framewok: Our mathematical background has influenced the theoret-ical approach. Multiresolution descriptors are based on wavelets theory which is an active interest of research in functional analysis community. In this con-text, the shape image is taken as a bivariate function and all the mathematical machinery of this field has been applied. Furthermore, a probabilistic approach has been adopted in the classifier fusion approach, and as for multiresolution analysis we have applied all the probabilistic terminology we have needed.
Experimental framework: Two slightly diﬀerent experimental frameworks have been developed according to the proposals for ridgelets descriptors and classifier fusion. Ridgelets descriptors are applied to segmented images without any need of symbol vectorization. For this reason, we have compared ridgelets descriptors with other descriptors that do not need to be vectorized and have to be applied on segmented shapes, too. The main measure to evaluate their performance is the recognition rate or, alternatively, the classification error. Concerning classifier fusion we have continued working with segmented, and non vectorized, shapes. Furthermore, recognition rate has been the measure used to evaluate classifier performance. However, we have compared our proposal of classifier fusion with other classifier fusion approaches. For that reason, several classifiers have been trained. Finally, we have complemented the set of experiments using synthetic data to simulate classifiers satisfying the theoretical assumptions done in the theoretical framework and other classifiers not verifying them.
This dissertation has been organized similarly to this chapter, which more or less follows the temporal evolution of the work with two exceptions: The review of shape descriptors, that has been done during all the time but have been summarized in the next chapter, and the experimental evaluation that has been grouped at the end of this work, before the conclusions chapter. More specifically, in each chapter we will find the following;
• In chapter 2, we review the diﬀerent shape descriptors proposed in the liter-ature. First, we introduce some definitions related to descriptors and feature extraction methods, permitting us to fix the vocabulary that we will use along this work. Later, we introduce the most used shape descriptors in the graphics recognition domain, grouping them according to the properties of the descrip-tors. We have centered our attention in descriptor properties being the case that we have introduced several times the same descriptor depending on the property discussed at each moment. This discussion has been oriented to introduce, in the next chapter, a ridgelets descriptor.
• Chapter 3 is devoted to ridgelets descriptors. In the first part we introduce the mathematical framework that we need to explain the ridgelets descriptor, i.e. Radon transform and Multiresolution Analysis theory. Then, we continue with the implementation of ridgelets transform and the definition of a descriptor based on this transform. We finalize this chapter by enumerating some of the problems of ridgelets descriptors, namely the size of the ridgelets descriptor and the combination of descriptors from all resolutions. In this chapter, we also propose some solutions for reducing the size of ridgelets descriptors, through the definition of Local Norm descriptors.
• In chapter 4, we face the classifier fusion problem. We start with a short review of classifiers and classifier fusion techniques and then we develop a probabilis-tic framework to propose optimal solutions for linear combination of two-class classifiers.
Table of contents :
1.1 Overview of Document Analysis
1.1.1 Shape Recognition
1.1.2 Classifier Fusion
1.2 Objectives and contribution of the present dissertation
1.2.1 Goals of the work
1.2.2 Framework of the thesis
2 Shape Descriptors
2.3 Taxonomy of Descriptors
2.3.1 Primitives of Descriptors
2.3.2 Multiresolution methods
2.3.3 Structural descriptors
3 Ridgelets descriptors
3.2 Theoretical Framework
3.2.1 Radon Transform
3.2.2 Multiresolution Analysis
3.3 Ridgelets Transform
3.3.1 Computation of Ridgelets coefficients
3.4 Image representation
3.4.1 Definition of a shape model
3.4.2 Definition of a similarity measure
3.4.3 Definition of a combination rule
3.5 Local Norm descriptors based on ridgelets transform
4 Classifier Fusion
4.2 Classifier Fusion approaches
4.2.1 Bayesian approach
4.2.2 Additive models: logistic regression approach
4.3 The problem of classifier fusion: definitions
4.4 Optimal Linear Combination Rules: IN and DN
4.4.1 IN method
4.4.2 DN method
5 Experimental Evaluation
5.3 Ridgelets descriptors
5.3.1 Robustness to resolution
5.3.2 Invariance to similarity transforms
5.3.3 Robustness to degradation and vectorial distortion
5.4 Combining Classifiers
5.4.1 Using synthetic data
5.4.2 Using Shape Databases
5.5 Combining ridgelets descriptors
5.5.1 Invariance to similarity transforms
5.5.2 Robustness to degradation and vectorial distortion
6.1 Summary of the Contributions
6.2 Discussion and Conclusions
6.2.1 Ridgelets descriptors
6.2.2 Classifier Fusion
6.3 Open issues
6.4 Final Conclusion