Get Complete Project Material File(s) Now! »

## Analysis of Full Rankings

We now turn to the specific case of full rankings, which will be at the core of this thesis. A full ranking a1 a2 : : : an is usually described as the permutation on JnK that maps an item to its rank, i.e. such that (ai) = i; 8 i 2 JnK. Item i is thus preferred over item j (denoted by i j) according to if and only if i is ranked lower than j: (i) < (j). A permutation can be seen as a bijection on the set JnK onto itself: JnK ! JnK i 7! (i)

For each i 2 JnK, (i) represents the rank of the i-th element, whereas 1(i) represents the i-th ranked element. We denote by Sn the set of all permutations over n items. Endowed with the composition operation 0(i) = (0(i)) for all ; 0 2 Sn, Sn is a group, called the symmetric group and we denote e its identity element which maps each item j to position j. Statistical analysis of full rankings thus relies on this group, and the variability of observations is represented by a discrete probability distribution P on the set Sn: Sn ! [0; 1] 7! P(). Though empirical estimation of P may appear as a simple problem at first glance, it is actually a great statistical challenge since the number of possible rankings (i.e. Sn’s cardinality) explodes as n! with the number of instances to be ranked. Moreover, applying techniques from multivariate analysis is arduous for two reasons. First, for a given random permutation 2 Sn, the random variables ((1); : : : ; (n)) are highly dependent: each of them takes its values in JnK and their values must be different. Then, the sum of two random permutations vectors = ((1); : : : ; (n)) and 0 = (0(1); : : : ;0(n)) does not correspond to another permutation vector. Hence, to represent probability distributions over permutations, several approaches exist but we can divide them between parametric versus « non parametric » ones. The term « nonparametric » is not correct since Sn is finite. What we call a “parametric” approach consists in fitting a predefined generative model on the data, analyzing the data through that model and inferring knowledge with respect to that model. In contrast, what we call “non-parametric” approach consists in choosing a structure on the symmetric group, analyzing the data with respect to that structure, and inferring knowledge through a “regularity” assumption.

### Parametric Approaches

Most-known statistical models can be categorized into five classes:

distance-based models: the probability of a ranking decreases as the distance from a central ranking increases. Example: the Mallows model (see Mallows (1957)).

order statistics or random utility models: the ranking reflects the ordering of latent scores given to each object. Example: the Thurstone model (see Thurstone (1927)).

multistage ranking models: the ranking is modeled as a sequential process of selecting the next most preferred object. Example: the Plackett model (see Plackett (1975)).

paired-comparison models: the probability expression of a ranking considers every pair of items i, j such that i is preferred to j (Pfg / Q (i;j)j(i)<(j) pij ). Example: the Bradley-Terry model (see Bradley & Terry (1952)). These models are now described at length, since through this thesis they will often be used for baselines in the experiments.

Mallows model. This model can be seen as analoguous to a Gaussian distribution for permutations. The Mallows -model is parametrized by a modal or reference ranking and a dispersion parameter 2 (0; 1]. Let be a ranking, then the Mallows model specifies.

#### Other Ranking Aggregation Methods

Many other methods have been proposed in the literature, that we do not list exhaustively here. However, many of them will be used as baselines in the experiments of Chapter 4. MLE and Bayesian approaches. A common approach is to consider that the observed rankings, for examples pairwise comparisons, are generated according to a given model centered around a true underlying order. The outcomes of the pairwise comparisons are then noisy versions of the true relative orders under some model, and one can perform MLE (Maximum Likelihood Estimation) to recover the underlying central ranking. Several models and settings have been considered. This idea actually originally arised in the work of Condorcet (see Condorcet (1785)) under a particular noise model (equivalent to Mallows, see Lu & Boutilier (2011)): a voter ranks correctly two candidates with probability p > 1=2. Numerous contributions thus consider a Mallows model (see Fligner & Verducci (1990); Meila et al. (2007); Braverman & Mossel (2008)), a -noise model (Pfi jg = 1=2 + for all i < j, see Braverman & Mossel (2008)), Bradley-Terry model (see BTL-ML algorithm in Rajkumar & Agarwal (2014)), or more original ones such as the coset-permutation distance based stagewise (CPS) model (see Qin et al. (2010)). Some come with statistical guarantees, for instance BTL-ML outputs a Kemeny consensus with high probability under the assumption that the pairwise probabilities verifies the low-noise (bis) assumption (see Section 2.2.2). An interesting work is also the one of Conitzer & Sandholm (2012) which states for which voting rules (e.g. scoring rules), there exists (or not) a noise model such that the voting rule is an MLE estimator under this model. Another approach, especially in the domain of chess gaming, use Bayesian statistics to estimate a ranking, assuming some priors on the items preferences (see Glickman (1995); Herbrich et al. (2006); Coulom (2008)). The obvious limitations of these approaches is that it depends on the unknown underlying noise model.

**Table of contents :**

List of Publications

List of Figures

List of Tables

List of Symbols

**1 Introduction **

1.1 Background on Ranking Data

1.2 Ranking Aggregation

1.2.1 Definition and Context

1.2.2 A General Method to Bound the Distance to Kemeny Consensus

1.2.3 A Statistical Framework for Ranking Aggregation

1.3 Beyond Ranking Aggregation: Dimensionality Reduction and Ranking Regression

1.3.1 Dimensionality Reduction for Ranking Data: a Mass Transportation Approach

1.3.2 Ranking Median Regression: Learning to Order through Local Consensus

1.3.3 A Structured Prediction Approach for Label Ranking

1.4 Conclusion

1.5 Outline of the Thesis

**2 Background on Ranking Data **

2.1 Introduction to Ranking Data

2.1.1 Definitions and Notations

2.1.2 Ranking Problems

2.1.3 Applications

2.2 Analysis of Full Rankings

2.2.1 Parametric Approaches

2.2.2 Non-parametric Approaches

2.2.3 Distances on Rankings

2.3 Other Frameworks

**I Ranking Aggregation **

**3 The Ranking Aggregation Problem **

3.1 Ranking Aggregation

3.1.1 Definition

3.1.2 Voting Rules Axioms

3.2 Methods

3.2.1 Kemeny’s Consensus

3.2.2 Scoring Methods

3.2.3 Spectral Methods

3.2.4 Other Ranking Aggregation Methods

**4 A General Method to Bound the Distance to Kemeny Consensus **

4.1 Introduction

4.2 Controlling the Distance to a Kemeny Consensus

4.3 Geometric Analysis of Kemeny Aggregation

4.4 Main Result

4.5 Geometric Interpretation and Proof of Theorem 10.1

4.5.1 Extended Cost Function

4.5.2 Interpretation of the Condition in Theorem 10.1

4.5.3 Embedding of a Ball

4.5.4 Proof of Theorem 10.1

4.6 Numerical Experiments

4.6.1 Tightness of the Bound

4.6.2 Applicability of the Method

4.7 Conclusion and Discussion

**5 A Statistical Framework for Ranking Aggregation **

5.1 Introduction

5.2 Background

5.2.1 Consensus Ranking

5.2.2 Statistical Framework

5.2.3 Connection to Voting Rules

5.3 Optimality

5.4 Empirical Consensus

5.4.1 Universal Rates

5.4.2 Fast Rates in Low Noise

5.4.3 Computational Issues

5.5 Conclusion

5.6 Proofs

**II Beyond Ranking Aggregation: Dimensionality Reduction and Ranking Regression**

**6 Dimensionality Reduction and (Bucket) Ranking: A Mass Transportation Approach **

6.1 Introduction

6.2 Preliminaries

6.2.1 Background on Bucket Orders

6.2.2 A Mass Transportation Approach to Dimensionality Reduction on Sn .

6.2.3 Optimal Couplings and Minimal Distortion

6.2.4 Related Work

6.3 Empirical Distortion Minimization – Rate Bounds and Model Selection

6.4 Numerical Experiments on Real-world Datasets

6.5 Conclusion

6.6 Appendix

6.7 Proofs

**7 Ranking Median Regression: Learning to Order through Local Consensus **

7.1 Introduction

7.2 Preliminaries

7.2.1 Best Strictly Stochastically Transitive Approximation

7.2.2 Predictive Ranking and Statistical Conditional Models

7.3 Ranking Median Regression

7.4 Local Consensus Methods for Ranking Median Regression

7.4.1 Piecewise Constant Predictive Ranking Rules and Local Consensus

7.4.2 Nearest-Neighbor Rules for Ranking Median Regression

7.4.3 Recursive Partitioning – The CRIT algorithm

7.5 Numerical Experiments

7.6 Conclusion and Perspectives

7.7 Appendix – On Aggregation in Ranking Median Regression

7.8 Proofs

**8 A Structured Prediction Approach for Label Ranking **

8.1 Introduction

8.2 Preliminaries

8.2.1 Mathematical Background and Notations

8.2.2 Related Work

8.3 Structured Prediction for Label Ranking

8.3.1 Learning Problem

8.3.2 Losses for Ranking

8.4 Output Embeddings for Rankings

8.4.1 The Kemeny Embedding

8.4.2 The Hamming Embedding

8.4.3 Lehmer Code

8.4.4 Extension to Partial and Incomplete Rankings

8.5 Computational and Theoretical Analysis

8.5.1 Theoretical Guarantees

8.5.2 Algorithmic Complexity

8.6 Numerical Experiments

8.7 Conclusion

8.8 Proofs and Additional Experiments

8.8.1 Proof of Theorem 1

8.8.2 Lehmer Embedding for Partial Rankings

8.8.3 Additional Experimental Results

**9 Conclusion, Limitations & Perspectives **

**10 Résumé en français **

10.1 Préliminaires sur les Données de Classements

10.2 L’agrégation de Classements

10.2.1 Définition et Contexte

10.2.2 Une Méthode Générale pour Borner la Distance au Consensus de Kemeny161

10.2.3 Un Cadre Statistique pour l’Agrégation de Classements

10.3 Au-delà de l’Agrégation de Classements : la Réduction de Dimension et la Régression de Classements

10.3.1 Réduction de Dimension pour les Données de Classements : une Approche de Transport de Masse

10.3.2 Régression Médiane de Classements: Apprendre à Classer à travers des Consensus Locaux

10.3.3 Une Approche de Prédiction Structurée pour la Régression de Classements

10.4 Conclusion

10.5 Plan de la Thèse

**Bibliography**