Optimality and Performance measures in multipartite ranking

Get Complete Project Material File(s) Now! »

Uniform approximation of generalized U-statistics through sampling

As will be seen below, the statistics considered in the previous section are (generalized) U-statistics, which can be uniformly approximated by Monte-Carlo versions whose computation cost is drastically reduced. This will be next proved to be an essential tool for investigating the performance of decision rules learnt through optimization of such empirical quantities.

Maximization of the VUS

We now discuss the consequences of Theorem 3.2.1 through the problem of maximizing the VUS introduced in section 3.1 (notice that, in this case, we have MH = 1). Beyond theoretical guarantees, the performance of algorithms based on incomplete versions of the empirical counterpart of the functional of interest is illustrated by numerical results, supporting the efficiency of the sampling approach promoted in this chapter in the machine-learning context.

Pairwise aggregation: from bipartite to K-partite ranking

In the present section, we propose a practical strategy for building scoring functions which approximate optimal scoring functions for multipartite ranking based on a set of labeled observations. The principle of this strategy is the aggregation of scoring functions obtained for the pairwise subproblems. We emphasize the fact that the situation is very different from multiclass classification where aggregation boils down to linear combination, or majority voting, over binary classifiers (for « one against one » and « one versus all « , we refer to [Allwein et al., 2001], [Hastie & Tibshirani, 1998], [Venkatesan & Amit, 1999], [Debnath et al., 2004], [Dietterich & Bakiri, 1995], [Beygelzimer et al., 2005b], [Beygelzimer et al., 2005a] and the references therein for instance). We propose here, in the K-partite ranking setup, a metric-based barycentric approach to build the aggregate scoring function from the collection of scoring functions estimated for the bipartite subproblems.

From AUC consistency to VUS consistency

In this section, we introduce auxiliary results which contribute to the proof of the main theorem (details are provided in the last section of this chapter). Key arguments rely on the relationship between the solutions of the bipartite ranking subproblems and those of the K-partite problem. In particular, a sequence of scoring functions that is simultaneously AUC-consistent for the bipartite ranking problems related to the two pairs of distributions (φ1, φ2) and (φ2, φ3) is VUS-consistent. Indeed, we have the following corollary.

Multipartite Ranking Algorithms

In contrast to the bipartite situation (see [Clémençon & Vayatis, 2010], [Clémençon & Vayatis, 2009b]), no algorithm optimizing the ROC surface directly and producing a scoring function bsn for which d∞(bsn, s∗) → 0 in probability has been documented in the literature. Beyond theoretical results guaranteeing the validity of empirical maximization of the VUS criterion (see [Rajaram & Agarwal, 2005]), most methods proposed rely on the optimization of an alternative (pairwise) criterion ([Freund et al., 2003] and [Pahikkala et al., 2007]), or on the decomposition of the original multipartite problem into bipartite subproblems combined with a final aggregation/consensus stage ([Hüllermeier et al., 2008] and [Clémençon et al., 2013b]) or still on plug-in approaches based on ordinal regression ([Waegeman et al., 2008c]). In addition, it is far from straightforward to extend the TreeRank algorithm recalled above because, when K ≥ 3, as a straightforward computation based on Eq. 1.2.9 may show, the splitting step cannot be interpreted as a learning problem which can be solved by means of off-the-shelf techniques, unlike the bipartite case. Indeed, taking s(x) = I{x ∈ C} for some measurable set C ⊂ X, we have..

Table of contents :

Introduction (English)
I Performance measures for multipartite ranking
1 Optimality and Performance measures in multipartite ranking
1.1 Optimal elements in multipartite ranking
1.1.1 Probabilistic setup and notations
1.1.2 Optimal scoring functions
1.1.3 Existence and characterization of optimal scoring functions .
1.1.4 Examples and counterexamples
1.1.5 Connection with ordinal regression
1.2 Performance measures for multipartite ranking
1.2.1 Reminder on ROC curves
1.2.2 ROC surface
1.2.3 ROC-optimality and optimal scoring functions
1.2.4 Reminder on the AUC criterion
1.2.5 Volume under the ROC surface (VUS)
1.2.6 VUS-optimality
1.2.7 Approaches to K-partite ranking
1.3 Conclusion
1.4 Proofs
1.5 Annex – Fast computation of the VUS in case of ties
2 Confidence regions for the ROC surface via smoothed bootstrap
2.1 Estimation and distances
2.1.1 Metrics on the ROC space
2.1.2 Statistical estimation of the ROC surface
2.2 Assessment
2.2.1 Asymptotic normality
2.2.2 Smooth bootstrap
2.3 Numerical experiments
2.3.1 Simulated data
2.3.2 Real data
2.4 Discussion
2.5 Proofs
3 Subsampling the VUS criterion
3.1 Motivation
3.2 Uniform approximation of generalized U-statistics through sampling
3.2.1 Definitions and key properties
3.2.2 Uniform approximation of U-statistic
3.3 Maximization of the VUS
3.3.1 Sampling the risk in K-partite ranking
3.3.2 Illustrations
3.4 Conclusion
3.5 Proofs
II Algorithms for K-partite ranking
4 Aggregation of scoring functions
4.1 Pairwise aggregation: from bipartite to K-partite ranking
4.1.1 Decomposition step
4.1.2 Median scoring functions and optimal aggregation
4.1.3 A practical aggregation procedure
4.2 Consistency of pairwise aggregation
4.2.1 Definition of VUS-consistency and main result
4.2.2 From AUC consistency to VUS consistency
4.3 How to solve the supports issue
4.3.1 The supports issue
4.3.2 Consistency even with supports issue
4.4 Conclusion
4.5 Proofs
5 TreeRank Tournament
5.1 Background and Preliminaries
5.1.1 Bipartite Ranking and the TreeRank Algorithm
5.1.2 Multipartite Ranking Algorithms
5.1.3 Further Notations and Preliminaries
5.2 Adaptive Piecewise Planar Approximation of ROC∗
5.2.1 An Implicit Tree-Structured Recursive Interpolation Scheme .
5.3 Analysis of TreeRank Tournament
5.3.1 The TreeRank Tournament algorithm
5.3.2 A consistency result
5.3.3 Learning rate bounds
5.4 Conclusion
5.5 Proofs
6 Numerical experiments
6.1 Data description
6.1.1 Simulated data
6.1.2 Real dataset
6.2 Criteria
6.3 TreeRank methods
6.3.1 Our algorithms in action
6.3.2 Discussion
6.4 Comparison with competitors
6.4.1 Description of the competitors
6.4.2 Results and discussion
6.5 Annex – Numerical results
III Minimaxity and Ranking
7 Minimax rates in bipartite ranking
7.1 Theoretical background
7.1.1 Probabilistic setup and first notations
7.1.2 Bipartite ranking
7.1.3 Additional assumptions
7.2 Comparison inequalities
7.3 Oracle inequalities for the aggregation procedure
7.3.1 Aggregation via exponential weights
7.3.2 An oracle inequality
7.4 Minimax rates
7.4.1 The « mild » case
7.4.2 The « strong » case
7.5 A lower bound
7.6 Conclusion
7.7 Proofs
7.8 Annex – Discussion on the lower bounds
Bibliography