## Previous works and issues

From a statistical point of view, various models have been proposed for the extreme value distribution $G$ through the modeling of the function $x \mapsto \mu([0, x]^c)$, i.e. $G(x) = \exp[-\mu([0, x]^c)]$ (see e.g. Coles and Tawn (1991)). Two main strategies arise to infer such models: methods based on the componentwise maximum are used to estimate $G$, while threshold exceedance methods are used to estimate $\mu$. Componentwise maximum methods face a major issue when computing the likelihood as the dimension increases ($d > 10$). Indeed, one has to compute: et al. (2016)). Some simplifications have been proposed (e.g. Wadsworth (2015), Stephenson and Tawn (2005)), but the computation remains infeasible when $d > 10$.

Whether they rely on componentwise maxima or on threshold exceedances, most of these approaches only consider the case of total dependence between the variables, in other words the case where the support of $\mu$ is infinite on any hyperplane of $[0,1]^d \setminus \{0\}$. Moreover, in high dimension, non-parametric estimates of these measures are not spared by the curse of dimensionality, which is made worse by the small proportion of points (the extreme points) that can be used for inference. The modeling of the angular measure, which is in theory equivalent to the modeling of the exponent measure, lends itself well to dimension-reduction methods (see Section 2.3).

### Sparse angular measure

In high dimension ($d \geq 100$), it is reasonable to assume that an extreme phenomenon described by a random variable $V = (V_1, \dots, V_d)$ is not due to all its features simultaneously, i.e. to $V_j > t$ for all $j \in \{1, \dots, d\}$, where $t > 0$ is a large threshold. More precisely, we make the assumption that the extreme nature of the phenomenon is only due to some particular subgroups of coordinates. Let us assume that $\|V\|$ is large; then there exists a subgroup $\alpha \subset \{1, \dots, d\}$ such that $\|V\| \approx \|V_\alpha\|$, where $|\alpha| \ll d$ and $V_\alpha = (V_j)_{j \in \alpha}$. The aim is to bring to the fore the multimodal nature of the extremal behavior of $V$, and thus the plurality of subgroups $\alpha$ that identify the asymptotically dependent coordinates. It is also reasonable to assume that the number of such subgroups is small compared to the total number of nonempty subsets of $\{1, \dots, d\}$, that is $2^d - 1$.

More formally, let $\Phi$ be the angular measure associated with $V$; then the dependence structure of $V$ is characterized through the distribution of the mass of $\Phi$ over the positive orthant of the unit sphere. In the case of a sparse dependence structure, the mass of $\Phi$ is only distributed over some particular subspaces of the boundary of $S_d$, which correspond to asymptotically dependent subgroups of features. Indeed, let us assume that at least two coordinates $V_i$ and $V_j$ are asymptotically independent, which, in the usual formulation for standardized margins, is written $\lim_{t \to \infty} P[V_i > t \mid V_j > t] = 0$.
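The sparsity assumption can be illustrated with a minimal sketch on toy data. It is a simple thresholding heuristic under stated assumptions, not the CLEF algorithm and not the estimator studied in the thesis. The function names (`to_pareto_margins`, `extreme_subgroups`, `simulate`), the tolerance `eps`, the radius and the simulated model are all hypothetical choices made for the illustration. Each coordinate is rank-transformed to approximately standard Pareto margins, the points with a large sup-norm are kept, and each extreme point is assigned to the subgroup of coordinates that are large relative to its norm.

```python
import random
from collections import Counter


def to_pareto_margins(rows):
    """Rank-transform each column to approximately standard Pareto margins."""
    n, d = len(rows), len(rows[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: rows[i][j])
        for rank, i in enumerate(order, start=1):
            out[i][j] = (n + 1) / (n + 1 - rank)  # 1 / (1 - empirical cdf)
    return out


def extreme_subgroups(rows, radius, eps=0.1):
    """Count the subgroups of coordinates that are jointly large among extremes.

    Assumption of this sketch: a coordinate belongs to the subgroup of an
    extreme point when it exceeds eps times the sup-norm of that point.
    """
    counts = Counter()
    for v in to_pareto_margins(rows):
        norm = max(v)  # sup-norm of the standardized point
        if norm > radius:
            alpha = tuple(j for j, vj in enumerate(v) if vj > eps * norm)
            counts[alpha] += 1
    return counts


def simulate(n, seed=0):
    """Toy data (hypothetical): coordinates {0, 1} or {2, 3} share one heavy tail."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1.0 / (1.0 - rng.random())  # standard Pareto variable, always >= 1
        noise = [rng.random() for _ in range(4)]  # bounded noise, always < 1
        if rng.random() < 0.5:
            rows.append([z, z, noise[2], noise[3]])
        else:
            rows.append([noise[0], noise[1], z, z])
    return rows


counts = extreme_subgroups(simulate(2000), radius=50.0)
print(counts)
```

On this toy model, only the two planted subgroups receive extreme points, out of the $2^4 - 1 = 15$ nonempty subsets of coordinates. On real data the choice of `radius` and `eps` matters, and this sketch does not address it.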

#### Coefficient of tail dependence

The purpose of this section is to build a series of test statistics based on the hypothesis $\mu(\Gamma_\alpha) > 0$, where $\Gamma_\alpha$ denotes the region in which all coordinates of the subgroup $\alpha$ are simultaneously large. Using the criterion $\kappa_\alpha > 0$ raises an issue. Indeed, as soon as $\mu(\Gamma_\alpha) = 0$, the limiting distribution of the statistic $\sqrt{k}(\hat{\kappa}_\alpha - \kappa_\alpha)$ is degenerate. Thus, we have no control over the asymptotic levels of the tests.

Let us consider the tail dependence coefficient $\eta \in (0, 1]$, introduced in the bivariate case in Ledford and Tawn (1996) and extended to general dimensions $d \geq 3$ in De Haan and Zhou (2011) and Eastoe and Tawn (2012). The fundamental assumption stipulates the existence of $\eta \in (0, 1]$ and a slowly varying function $L$ such that condition (2.36) holds. Under the assumption that both the limit $\lim_{t \to \infty} t\,P[V \in t\,\Gamma_\alpha] = \mu(\Gamma_\alpha)$ exists and (2.36) is verified, $\mu(\Gamma_\alpha) > 0$ implies $\eta = 1$. Conversely, if (2.36) holds and $\liminf_{t \to \infty} L(t) > 0$, then $\eta = 1$ implies $\mu(\Gamma_\alpha) > 0$. In other words, the null hypothesis $\mu(\Gamma_\alpha) > 0$ corresponds to the hypothesis $\eta = 1$ under mild conditions on $L$. Hence, if $\eta = 1$, the limit of $\sqrt{k}(\hat{\eta} - \eta)$ is non-degenerate and it is possible to control the asymptotic levels of the associated test.

A new criterion $C(\alpha)$ based on the estimator of the coefficient of tail dependence is used in the CLEF algorithm. This series of test statistics is developed in Chapter 4 for different non-parametric estimators of $\eta$, namely a multivariate extension of the Peng estimator (Peng (1999)) along with the Hill estimator (Draisma et al. (2001), Draisma et al. (2004)). A non-degenerate version of the test $H_0 : \kappa_\alpha = 0$ is also developed by adding a threshold $\kappa_{\min} > 0$, to get the new hypothesis $H_0 : \kappa_\alpha \geq \kappa_{\min}$.
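To make the role of the coefficient of tail dependence (written $\eta$ here) more concrete, the following sketch computes a Hill-type estimate for a subgroup of coordinates. It assumes the usual Ledford and Tawn form, in which the probability that all coordinates of the subgroup exceed $t$ decays like $t^{-1/\eta} L(t)$ for standardized margins. This form is an assumption of the sketch, since the displayed condition is not reproduced in this excerpt. The names `pareto_ranks` and `hill_eta`, the rank-based standardization and the choice of `k` are hypothetical, and the code is not the exact estimator or test statistic of Chapter 4.

```python
import math
import random


def pareto_ranks(column):
    """Rank-transform one coordinate to approximately standard Pareto margins."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    v = [0.0] * n
    for rank, i in enumerate(order, start=1):
        v[i] = (n + 1) / (n + 1 - rank)  # 1 / (1 - empirical cdf)
    return v


def hill_eta(rows, alpha, k):
    """Hill-type estimate of eta for the subgroup alpha (requires k < len(rows)).

    The estimator is applied to T_i = min over j in alpha of the standardized
    coordinates, using the k largest values of T.
    """
    cols = [pareto_ranks([row[j] for row in rows]) for j in alpha]
    t = sorted(min(vals) for vals in zip(*cols))
    threshold = t[-k - 1]  # (k+1)-th largest value of T
    return sum(math.log(x / threshold) for x in t[-k:]) / k


rng = random.Random(1)

# Fully dependent toy data: both coordinates are equal, so eta should be near 1.
dependent = [[u, u] for u in (rng.random() for _ in range(5000))]
eta_dependent = hill_eta(dependent, alpha=(0, 1), k=100)

# Independent toy data: the joint tail decays like t^(-2), so eta should be near 1/2.
independent = [[rng.random(), rng.random()] for _ in range(5000)]
eta_independent = hill_eta(independent, alpha=(0, 1), k=200)

print(eta_dependent, eta_independent)
```

An estimate close to 1 is compatible with the subgroup being asymptotically dependent, while a value clearly below 1 points towards asymptotic independence. Turning this into a test with a controlled level requires the asymptotic distribution of the estimator, which this sketch does not cover.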

**Table of contents :**

**1 Summary**

1.1 Introduction

1.2 Extreme value theory

1.2.1 Univariate extreme value theory

1.2.2 Multivariate extreme value theory

1.2.3 Previous works and issues

1.3 Sparse angular measure

1.3.1 Issue on the estimation of M

1.4 Estimation of the support of the angular measure

1.4.1 Inference

1.4.2 CLustering Extreme Feature Algorithm

1.4.3 Coefficient of tail dependence

1.5 Parametric modeling of the angular measure

1.6 Contributions

1.7 Conclusion

**2 Introduction**

2.1 Introduction

2.2 Extreme value theory

2.2.1 Univariate extreme value theory

2.2.2 Multivariate extreme value theory

2.2.3 Previous works and issues

2.3 Sparse angular measure

2.3.1 Issue on the estimation of M

2.4 Estimation of the support of the angular measure

2.4.1 Inference

2.4.2 CLustering Extreme Feature Algorithm

2.4.3 Coefficient of tail dependence

2.5 Parametric modeling of the angular measure

2.6 Contributions

2.7 Open problems

**3 Clustering Extreme Features**

3.1 Introduction

3.2 Problem statement and multivariate EVT viewpoint

3.2.1 Formal statement of the problem

3.2.2 Connections with multivariate EVT

3.3 Dimension reduction for multivariate extremes

3.3.1 Existing work

3.3.2 Gathering together ‘close-by’ cones, incremental strategy

3.4 Empirical criterion and implementation

3.4.1 Conditional criterion for extremal dependence

3.4.2 Algorithm

3.5 Results

3.5.1 Stream-flow data

3.5.2 Simulation experiments

3.5.3 Influence of the threshold choice

3.6 Conclusion

3.7 Appendix: Proof of Lemma 3.1

**4 Asymptotic Tests on the Coefficient of Tail Dependence**

4.1 Introduction

4.2 Regular variation and tail dependence coefficients

4.3 Empirical tail dependence functions and processes

4.4 Estimating the conditional tail dependence coefficient

4.5 Coefficient of tail dependence: Peng’s estimator

4.6 Coefficient of tail dependence: Hill estimator

4.7 Simulation study

4.8 Conclusion

4.9 Proofs

4.10 CLEF algorithm and variants

**5 Clustering of Extreme Points and Visualization**

5.1 Introduction

5.2 Background and preliminaries

5.2.1 Multivariate extreme value theory

5.2.2 Support estimation

5.3 A mixture model for multivariate extreme values

5.3.1 Angular measure

5.3.2 A mixture model

5.3.3 An EM algorithm for model inference

5.4 Graph-based visualization tools

5.5 Illustrative experiments

5.5.1 Experiments on simulated data

5.5.2 Flights clustering and visualization

5.6 Conclusion