Robust-to-outliers simultaneous inference and noise level estimation using a MOM approach

Copulas and conditional dependence modeling

Distributions with given margins

In the previous section, we studied different ways of adaptively estimating the conditional mean of a variable $Y$ given other variables $X_1, \dots, X_p$. Nevertheless, more general models are needed if one wants to estimate a multivariate law without any specific separation between explained and explanatory variables. More precisely, in our framework, the statistician is given $n$ i.i.d. replications $X_1, \dots, X_n$ of a random vector $X$ in $\mathbb{R}^d$, and our goal is to estimate the law of $X$. It can be convenient to use a parametric model for the law of $X$, so that the values of the estimated parameters can be easily interpreted in applications. Often, however, it is challenging to find a good parametric model for the data, especially in a multivariate framework.
We will distinguish two kinds of parameters: on the one hand, marginal parameters, i.e. parameters that only influence the univariate margins $X_1, \dots, X_d$; and on the other hand, "pure" dependence parameters. For example, assume that $d = 2$ and that the law of $X$ is bivariate Gaussian with means $\mu_1, \mu_2$, standard deviations $\sigma_1, \sigma_2$ and correlation $\rho$. Then $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ is a vector of marginal parameters and $\rho$ is a pure dependence parameter, meaning that the distributions of $X_1$ and $X_2$ do not change with $\rho$. On the contrary, the covariance $\mathrm{Cov}_{1,2} := \rho \sigma_1 \sigma_2$ is a mixed parameter that we would like to avoid, since it contains information about the margins and about the dependence at the same time.
A fruitful idea for inference is to generalize this separation between marginal and dependence parameters. This is the fundamental concept of copula modeling, which yields general and flexible estimation techniques such as inference from margins (see Algorithm 1), where the marginals are estimated first and the dependence is modeled in a second step; a sketch is given below.
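As an illustration (not Algorithm 1 itself), here is a minimal two-step sketch in Python; the choice of Gaussian margins and a Gaussian copula, as well as the function names, are illustrative assumptions of ours.

```python
import numpy as np
from scipy import stats

def inference_from_margins(x):
    """Two-step inference sketch for bivariate data x of shape (n, 2).

    Step 1 fits each margin separately; step 2 maps the data to
    pseudo-uniforms U_j = F_j(X_j) and fits the dependence on them.
    Gaussian margins and a Gaussian copula are illustrative assumptions.
    """
    # Step 1: marginal parameters (loc, scale), one pair per coordinate.
    margins = [stats.norm.fit(x[:, j]) for j in range(2)]
    # Step 2: pseudo-uniforms, then the Gaussian-copula correlation,
    # estimated as the correlation of the normal scores.
    u = np.column_stack([stats.norm.cdf(x[:, j], *margins[j]) for j in range(2)])
    z = stats.norm.ppf(u)
    return margins, np.corrcoef(z, rowvar=False)[0, 1]

# Usage on simulated data with true correlation 0.5:
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=2000)
print(inference_from_margins(x))
```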
The concept of copula itself goes back to Sklar [125] in 1959 (see Sklar [126] and Schweizer [121] for historical references). Probabilists were interested in the properties of several classes of distributions, in particular distributions with given margins. For example, given $d$ continuous distributions $F_1, \dots, F_d$ on $\mathbb{R}$, how can one construct a $d$-dimensional distribution $F_{1:d}$ whose margins are $F_1, \dots, F_d$?
Theorem 1.1 (Sklar, 1959). Let $d \ge 2$ be an integer.
Let $F_{1:d}$ be a distribution function on $\mathbb{R}^d$ with continuous margins $F_1, \dots, F_d$, and let $X \sim F_{1:d}$. Then there exists a distribution $C$ on $[0,1]^d$ with uniform margins, named the copula of $X_1, \dots, X_d$, such that the following equation holds:
$$\forall x = (x_1, \dots, x_d) \in \mathbb{R}^d, \quad F_{1:d}(x) = C\big( F_1(x_1), \dots, F_d(x_d) \big),$$
and $C$ is given by
$$\forall u = (u_1, \dots, u_d) \in [0,1]^d, \quad C(u_1, \dots, u_d) = F_{1:d}\big( F_1^{-}(u_1), \dots, F_d^{-}(u_d) \big),$$
where $F_i^{-}$ is the (generalized) inverse function of $F_i$, for $i = 1, \dots, d$.
Conversely, if $F_1, \dots, F_d$ are continuous distributions on $\mathbb{R}$, and $C$ is a copula (i.e. a continuous distribution on $[0,1]^d$ with uniform margins), then $F_{1:d}$ defined by
$$\forall x = (x_1, \dots, x_d) \in \mathbb{R}^d, \quad F_{1:d}(x) := C\big( F_1(x_1), \dots, F_d(x_d) \big)$$
is a joint distribution on $\mathbb{R}^d$ whose margins are respectively distributed as $F_1, \dots, F_d$ and whose copula is $C$.
Moreover, $C$ is the joint distribution of $U = (U_1, \dots, U_d)$, where $U_i := F_i(X_i)$ for $i = 1, \dots, d$ and $X = (X_1, \dots, X_d) \sim F_{1:d}$.
Therefore, we have a bijection between the joint cdf $F_{1:d}$ and the decomposition $(F_1, \dots, F_d, C)$. This allows us to separate, on the one hand, the marginal distributions $F_1, \dots, F_d$, which can be estimated separately, possibly with different models, and on the other hand the copula $C$, which summarizes the whole dependence between the components of $X$. The copula $C$ can be understood as a standardization of the law of $X$ where all the information about the margins has been removed. Indeed, for every $j \in \{1, \dots, d\}$, $U_j := F_j(X_j)$ follows a uniform distribution on $[0,1]$, and this holds as long as the marginal distributions $F_j$ are continuous.
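The converse part of the theorem also gives a simple recipe for simulation: draw $U$ from the copula $C$, then apply the marginal quantile functions. A minimal sketch, assuming a Gaussian copula and, purely for illustration, Cauchy and exponential margins:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, rho = 10_000, 0.7

# Draw U from a bivariate Gaussian copula with correlation rho:
# correlated standard normals pushed through their own cdf are uniform.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
u = stats.norm.cdf(z)

# Apply the quantile functions F_i^{-} to impose the desired margins.
x1 = stats.cauchy.ppf(u[:, 0])  # X_1 ~ Cauchy: heavy tails, no moments
x2 = stats.expon.ppf(u[:, 1])   # X_2 ~ Exponential(1)

# Sanity check of the direct part: F_2(X_2) should be uniform on [0, 1].
print(stats.kstest(stats.expon.cdf(x2), "uniform").pvalue)
```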

Conditional copulas and the simplifying assumption

We now study a related framework, where the statistician observes i.i.d. replications of a vector $X = (X_I, X_J)$, where $X_I \in \mathbb{R}^p$ is a vector of conditioned variables and $X_J \in \mathbb{R}^{d-p}$ is a vector of conditioning variables, in the sense that we want to model the law of $X_I$ given $X_J$. In the previous sections, we have separated marginal and dependence parameters of a given distribution $F$. Similarly, we would like to separate:
– "conditional marginal parameters", i.e. parameters linked to the conditional marginal cdfs $F_{j|J} := F_{X_j | X_J}$ of $X_j$ given $X_J$, for $j = 1, \dots, p$;
– "conditional dependence parameters", i.e. parameters linked to the conditional copula $C_{I|J}$ of $X_I$ given $X_J$.
This conditional copula exists by the conditional version of Sklar's theorem (Theorem 1.1), by which we can decompose the conditional multivariate cdf $F_{I|J}$ as follows:
$$\forall x_I \in \mathbb{R}^p,\ \forall x_J \in \mathbb{R}^{d-p}, \quad F_{I|J}(x_I | X_J = x_J) = C_{I|J}\big( F_{1|J}(x_1 | X_J = x_J), \dots, F_{p|J}(x_p | X_J = x_J) \,\big|\, X_J = x_J \big). \qquad (1.6)$$
These conditional copulas were introduced by Patton [111, 112] and, in a more general context, by Fermanian and Wegkamp [52]. For example, in a time series context, we may have a sequence of random vectors $(X_t)_t$ indexed by the time $t \in \mathbb{Z}$. To predict one observation using the previous one in a Markov-chain-like model, we would need to estimate the conditional law of $X_{t+1}$ given $X_t$. This fits the previous framework with the formal choice $X_I := X_{t+1}$ and $X_J := X_t$. In this case, the conditional copula of $X_{t+1}$ given $X_t$ can be understood as the prediction of the dependence between the different components of $X_{t+1}$ given $X_t$.
Conditional copulas also naturally appear in the so-called vine framework, see [74, 77, 10]. Let us detail this idea. Using Bayes' theorem, one can show that any (unconditional) copula of dimension $d$ can be decomposed using $d(d-1)/2$ bivariate conditional copulas. By this term, we mean conditional copulas where the conditioned vector $X_I$ is of dimension 2, while the conditioning vector $X_J$ has a dimension between 0 and $d-2$. This decomposition, also called the pair-copula construction [1], allows a very flexible way of constructing any multivariate copula; the case $d = 3$ is written out below.
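As an illustration (a standard computation, not spelled out in the text), for $d = 3$ the joint density $f_{1:3}$ with margins $f_1, f_2, f_3$ can be written using $d(d-1)/2 = 3$ pair-copula densities, two unconditional and one conditional:
$$f_{1:3}(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)\; c_{1,2}\big(F_1(x_1), F_2(x_2)\big)\; c_{2,3}\big(F_2(x_2), F_3(x_3)\big)\; c_{1,3|2}\big(F_{1|2}(x_1|x_2), F_{3|2}(x_3|x_2) \,\big|\, x_2\big).$$
Under the simplifying assumption introduced below, the last factor $c_{1,3|2}(\cdot, \cdot \,|\, x_2)$ does not depend on $x_2$.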
Getting back to the classical framework, the conditional copula of an explained random vector $X_I$ given an explanatory vector $X_J$ can be used to describe how the dependence among the components of $X_I$ changes with the values of the conditioning variable. Indeed, in the general case, the conditional copula of $X_I$ given $X_J = x_J$ does depend on the conditioning value $x_J$. Sometimes, to make the inference easier, one assumes that the conditional copula is constant with respect to the conditioning variable $x_J$. This is called the "simplifying assumption" for a given conditional copula model, and it may or may not be satisfied in practice, i.e. for a given data-generating process. A visual representation of the simplifying assumption when $d = 2$ is given in Figure 1.3. The general case, where the conditional copula does depend on the conditioning variable $Z$, is illustrated in Figure 1.4.
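To make this concrete, here is a minimal sketch of an illustrative data-generating process of ours (not one from the text) for which the simplifying assumption fails: given $Z = z$, the pair $(X_1, X_2)$ has a Gaussian conditional copula with correlation $\rho(z) = \tanh(z)$.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_given_z(z, n):
    """Draw n replications of (X1, X2) given Z = z; the conditional copula
    is Gaussian with correlation tanh(z), hence varies with z."""
    rho = np.tanh(z)
    return rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

# The conditional dependence changes with the conditioning value, so the
# simplifying assumption does not hold for this process.
for z in (-2.0, 0.0, 2.0):
    x = sample_given_z(z, 50_000)
    print(f"z = {z:+.1f}: empirical correlation = {np.corrcoef(x.T)[0, 1]:+.3f}")
```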

Kendall’s tau: a measure of dependence, and its conditional version

Since the copula is a cdf, it lives in an infinite-dimensional space and can be hard to represent, store (in the memory of a computer) or interpret in applications. Therefore, it may be useful to summarize the dependence by a number rather than by a function. One could invoke the usual (Pearson) correlation coefficient, but it is not always defined. More precisely, the correlation coefficient does not exist when one of the marginal distributions does not belong to $L^2$, for example when it is a Cauchy distribution. Moreover, the correlation coefficient is not invariant with respect to increasing transformations of the margins, such as a logarithmic transformation.
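A quick numerical check of this last point (our own toy example): applying an increasing transformation to one margin changes the Pearson correlation, while the rank-based Kendall's tau introduced next is exactly unchanged.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=5_000)
x1, x2 = x[:, 0], x[:, 1]

# Pearson's correlation is not invariant under increasing transformations.
print(stats.pearsonr(x1, x2)[0])          # about 0.8
print(stats.pearsonr(np.exp(x1), x2)[0])  # noticeably different

# Kendall's tau only depends on the ranks, hence is exactly invariant.
print(stats.kendalltau(x1, x2)[0])
print(stats.kendalltau(np.exp(x1), x2)[0])  # identical value
```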
Several margin-free measures of dependence have been proposed; one of the best-known among them is Kendall's tau [80]. For a bivariate random vector $X = (X_1, X_2)$, it is defined as
$$\tau_{1,2} := \mathbb{P}\big( (X_{2,1} - X_{1,1})(X_{2,2} - X_{1,2}) > 0 \big) - \mathbb{P}\big( (X_{2,1} - X_{1,1})(X_{2,2} - X_{1,2}) < 0 \big), \qquad (1.8)$$
where $X_1 := (X_{1,1}, X_{1,2})$ and $X_2 := (X_{2,1}, X_{2,2})$ are two i.i.d. replications of $X$. This can be interpreted as the probability of observing a concordant pair ("the two variables move in the same direction") minus the probability of observing a discordant pair ("the two variables move in opposite directions"). Note that Kendall's tau is defined for any distribution on $\mathbb{R}^2$ without any moment assumption, and it lies in the interval $[-1, 1]$. It is invariant under increasing transformations of the marginal distributions and therefore only depends on the copula of $X$. The link between the Kendall's tau of a given distribution and its copula is in fact explicit, given by $\tau_{1,2} = 4 \int_{[0,1]^2} C(u,v)\, dC(u,v) - 1$. Further properties of Kendall's tau and related dependence measures are detailed in [106].
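The probabilities above can be estimated by their empirical counterparts, giving the estimator $\hat\tau$ discussed next. A minimal $O(n^2)$ sketch of ours (scipy.stats.kendalltau computes the same quantity with a faster algorithm):

```python
import numpy as np
from scipy import stats

def kendall_tau_hat(x1, x2):
    """Proportion of concordant pairs minus proportion of discordant
    pairs, over all n(n-1)/2 distinct pairs of observations."""
    x1, x2 = np.asarray(x1), np.asarray(x2)
    n = len(x1)
    # sign[(X_{i,1} - X_{j,1}) * (X_{i,2} - X_{j,2})] for every pair (i, j).
    s = np.sign(np.subtract.outer(x1, x1)) * np.sign(np.subtract.outer(x2, x2))
    return np.triu(s, k=1).sum() / (n * (n - 1) / 2)

rng = np.random.default_rng(3)
x = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=500)
print(kendall_tau_hat(x[:, 0], x[:, 1]))
print(stats.kendalltau(x[:, 0], x[:, 1])[0])  # same value (no ties here)
```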
Kendall's tau can be estimated easily by the empirical proportion of concordant pairs minus the empirical proportion of discordant pairs, giving an estimator $\hat\tau$. For most bivariate families of copulas $\mathcal{C} = \{C_\theta,\ \theta \in \Theta \subset \mathbb{R}\}$, there exists a bijection $\psi$ between the parameter $\theta$ and the Kendall's tau, such that $\tau = \psi(\theta)$. Then a natural estimator of $\theta$ is given by the technique called "inversion of Kendall's tau": $\hat\theta := \psi^{(-1)}(\hat\tau)$, where $\psi^{(-1)}$ denotes the inverse of $\psi$.
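For several classical families the bijection $\psi$ is explicit; we recall three standard formulas for illustration (they are not taken from the text):
$$\tau = \frac{2}{\pi}\arcsin(\rho) \ \text{(Gaussian)}, \qquad \tau = \frac{\theta}{\theta+2} \ \text{(Clayton, } \theta > 0\text{)}, \qquad \tau = 1 - \frac{1}{\theta} \ \text{(Gumbel, } \theta \ge 1\text{)},$$
so that, for the Gaussian family, inversion of Kendall's tau simply reads $\hat\rho = \sin(\pi \hat\tau / 2)$.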
In a bivariate framework, the inference procedure for the law of $X$ can therefore be divided into four independent steps (an end-to-end sketch is given after the list):
– Estimation of the first marginal distribution $F_1$;
– Estimation of the second marginal distribution $F_2$;
– Estimation of Kendall's tau by the empirical estimator $\hat\tau$;
– Estimation of the copula parameter by inversion of Kendall's tau, $\hat\theta := \psi^{(-1)}(\hat\tau)$.
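A minimal end-to-end sketch of these four steps, assuming, for illustration, rescaled empirical cdfs for the margins and a Gaussian copula (inverted via $\hat\rho = \sin(\pi\hat\tau/2)$) for the dependence:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x1 = rng.standard_normal(2_000)
x2 = 0.6 * x1 + 0.8 * rng.standard_normal(2_000)  # true correlation 0.6

# Steps 1 and 2: estimate the marginal cdfs (here: rescaled empirical cdfs).
def ecdf(sample):
    s = np.sort(sample)
    return lambda t: np.searchsorted(s, t, side="right") / (len(s) + 1)

F1_hat, F2_hat = ecdf(x1), ecdf(x2)

# Step 3: estimate Kendall's tau.
tau_hat = stats.kendalltau(x1, x2)[0]

# Step 4: invert the tau <-> parameter bijection of the chosen family
# (Gaussian copula: tau = (2 / pi) * arcsin(rho)).
rho_hat = np.sin(np.pi * tau_hat / 2)
print(rho_hat)      # approximately 0.6
print(F1_hat(0.0))  # approximately 0.5
```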

Table of contents:

1 Introduction 
1.1 Estimation of the conditional mean: linear regression and related methods
1.1.1 Least-squares estimators and penalization
1.1.2 Adaptivity to $\sigma$ using two square-root estimators
1.1.3 Robustness to outliers using the Median-of-Means approach
1.2 Copulas and conditional dependence modeling
1.2.1 Distributions with given margins
1.2.2 Inference of copula models
1.2.3 Conditional copulas and the simplifying assumption
1.2.4 Kendall’s tau: a measure of dependence, and its conditional version
1.2.5 Estimation of the conditional Kendall’s tau
1.3 Other topics in inference
1.3.1 Estimation of a regular conditional functional by conditional U-statistic regression
1.3.2 About confidence intervals for ratios of means
I Linear regression 
2 Improved bounds for Square-root Lasso and Square-root Slope
2.1 Introduction
2.2 The framework
2.3 Optimal rates for the Square-Root Lasso
2.4 Adaptation to sparsity by a Lepski-type procedure
2.5 Algorithms for computing the Square-root Slope
2.6 Optimal rates for the Square-Root Slope
2.7 Proofs
2.7.1 Preliminary lemmas
2.7.2 Proof of Theorem 2.1
2.7.3 Proofs of the adaptive procedure
2.7.3.1 Proof of Theorem 2.3
2.7.3.2 Proof of Lemma 2.4
2.7.3.3 Proof of Lemma 2.5
2.7.4 Proof of Theorem 2.8
3 Robust-to-outliers simultaneous inference and noise level estimation using a MOM approach
3.1 Introduction
3.2 Results in the high-dimensional linear regression framework
3.3 A general framework
3.4 Technical lemmas
3.5 Control of the supremum of TK;(g; ; f; ) on each F()
3.5.1 Preliminaries
3.5.2 Proof of the first assertion of Lemma 3.12
3.5.3 Proof of the second assertion of Lemma 3.12
3.6 Proof of Lemma 3.11
3.6.1 Bound on F()
3.6.2 Bound on F()
3.6.3 Bound on F()
3.6.3.1 Case $\|f - f^*\|_{L^2} \le r(K)$
3.6.3.2 Case $\|f - f^*\|_{L^2} > r(K)$
3.6.4 Bound on F()
3.6.5 Bound on F()
3.6.6 Bound on F()
3.6.6.1 Case $\|f - f^*\|_{L^2} \le r(K)$
3.6.6.2 Case $\|f - f^*\|_{L^2(P)} > r(K)$
3.6.7 Bound on F()
3.6.8 Bound on F()
3.6.9 Bound on F()
3.6.9.1 Case $\|f - f^*\|_{L^2(P)} \le r(K)$
3.6.9.2 Case $\|f - f^*\|_{L^2(P)} > r(K)$
3.7 Proofs of main results
3.7.1 Proof of Theorem 3.4
3.7.2 Proof of Theorem 3.1
II Conditional copula estimation 
4 About tests of the “simplifying” assumption for conditional copulas
4.1 Introduction
4.2 Tests of the simplifying assumption
4.2.1 “Brute-force” tests of the simplifying assumption
4.2.2 Tests based on the independence property
4.2.3 Parametric tests of the simplifying assumption
4.2.4 Bootstrap techniques for tests of H0
4.2.4.1 Some resampling schemes
4.2.4.2 Bootstrapped test statistics
4.3 Tests with “boxes”
4.3.1 The link with the simplifying assumption
4.3.2 Non-parametric tests with “boxes”
4.3.3 Parametric test statistics with “boxes”
4.3.4 Bootstrap techniques for tests with boxes
4.4 Numerical applications
4.5 Conclusion
4.6 Notation
4.7 Proof of Theorem 4.14
4.7.1 Preliminaries
4.7.2 Proof of Theorem 4.14
4.7.3 Proof of Proposition 4.16
5 About kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior
5.1 Introduction
5.2 Definition of several kernel-based estimators of $\tau_{1,2|z}$
5.3 Theoretical results
5.3.1 Finite distance bounds
5.3.2 Asymptotic behavior
5.4 Simulation study
5.5 Proofs
5.5.1 Proof of Proposition 5.1
5.5.2 Proof of Proposition 5.2
5.5.3 Proof of Proposition 5.3
5.5.4 Proof of Proposition 5.4
5.5.5 Proof of Proposition 5.6
5.5.6 Proof of Proposition 5.7
5.5.7 Proof of Proposition 5.8
5.5.8 Proof of Proposition 5.9
5.5.9 Proof of Lemma 5.17
6 About Kendall’s regression
6.1 Introduction
6.2 Finite-distance bounds on $\hat\beta$
6.3 Asymptotic behavior of $\hat\beta$
6.3.1 Asymptotic properties of $\hat\beta$ when $n \to \infty$ and for fixed $n'$
6.3.2 Oracle property and a related adaptive procedure
6.3.3 Asymptotic properties of $\hat\beta$ when $n$ and $n'$ jointly tend to $+\infty$
6.4 Simulations
6.4.1 Numerical complexity
6.4.2 Choice of tuning parameters and estimation of the components of $\beta$
6.4.3 Comparison between parametric and nonparametric estimators of the conditional Kendall’s tau
6.4.4 Comparison with the tests of the simplifying assumption
6.4.5 Dimension 2 and choice of
6.5 Real data application
6.6 Proofs of finite-distance results for $\hat\beta$
6.6.1 Technical lemmas
6.6.2 Proof of Theorem 6.5
6.7 Proofs of asymptotic results for $\hat\beta_{n,n'}$
6.7.1 Proof of Lemma 6.7
6.7.2 Proof of Theorem 6.10
6.7.3 Proof of Proposition 6.11
6.7.4 Proof of Theorem 6.12
6.7.5 Proof of Theorem 6.13
6.8 Proof of Theorem 6.14
6.8.1 Proof of Lemma 6.18 : convergence of T1
6.8.2 Proof of the asymptotic normality of T4
6.8.3 Convergence of T6 to 0
6.8.4 Convergence of T7 to 0
6.8.5 Convergence of T3 to 0
6.9 Technical results concerning the first-step estimator
6.10 Estimation results for a particular sample
7 A classification point-of-view on conditional Kendall’s tau
7.1 Introduction
7.2 Regression-type approach
7.3 Classification algorithms and conditional Kendall’s tau
7.3.1 The case of probit and logit classifiers
7.3.2 Decision trees and random forests
7.3.3 Nearest neighbors
7.3.4 Neural networks
7.3.5 Lack of independence and its influence on the proposed algorithms
7.4 Simulation study
7.4.1 Choice of the functions $\{\psi_i\}$, $i = 1, \dots, p'$
7.4.2 Comparing different copulas families
7.4.3 Comparing different conditional margins
7.4.4 Comparing different forms for the conditional Kendall’s tau
7.4.5 Higher dimensional settings
7.4.6 Choice of the number of neurons in the one-dimensional reference setting
7.4.7 Influence of the sample size n
7.4.8 Influence of the lack of independence
7.5 Applications to financial data
7.5.1 Conditional dependence with respect to the Eurostoxx’s volatility proxy
7.5.2 Conditional dependence with respect to the variations $\Delta I$ of the Eurostoxx’s implied volatility index
7.6 Conclusion
7.7 Some basic definitions about copulas
7.8 Proof of Theorem 7.3
7.9 Proof of Theorem 7.4
III Other topics in inference 
8 Estimation of a regular conditional functional by conditional U-statistic regression
8.1 Introduction
8.2 Theoretical properties of the nonparametric estimator $\hat\theta(\cdot)$
8.2.1 Non-asymptotic bounds for $N_k$
8.2.2 Non-asymptotic bounds in probability for $\hat\theta$
8.2.3 Asymptotic results for $\hat\theta$
8.3 Theoretical properties of the estimator $\hat\beta$
8.3.1 Non-asymptotic bounds on $\hat\beta$
8.3.2 Asymptotic properties of $\hat\beta$ when $n \to \infty$ and for fixed $n'$
8.3.3 Asymptotic properties of $\hat\beta$ jointly in $(n, n')$
8.4 Applications and examples
8.5 Notations
8.6 Finite-distance proofs for $\hat\theta$ and $\hat\beta$
8.6.1 Proof of Lemma 8.3
8.6.2 Proof of Proposition 8.5
8.6.3 Proof of Theorem 8.8
8.7 Proof of Theorem 8.14
8.7.1 Proof of Lemma 8.20
8.7.2 Proof of the asymptotic normality of T4
8.7.3 Convergence of T6 to 0
8.7.4 Convergence of T7 to 0
8.7.5 Convergence of T3 to 0
9 Confidence intervals for ratios of means: limitations of the delta method and honest confidence intervals
9.1 Introduction
9.2 Our framework
9.3 Limitations of the delta method
9.3.1 Asymptotic approximation takes time to hold
9.3.2 Asymptotic results may not hold for sequences of models
9.3.3 Extension of the delta method for ratios of expectations in the sequence-of-models framework
9.4 Construction of nonasymptotic confidence intervals
9.4.1 An easy case: the support of Y is well-separated from 0
9.4.2 Nonasymptotic confidence intervals with no assumption on the support of PY
9.5 Nonasymptotic CIs: impossibility results and practical guidelines
9.5.1 An upper bound on testable confidence levels
9.5.2 Practical methods and plug-in estimators
9.5.3 A lower bound on the length of nonasymptotic confidence intervals
9.6 Numerical applications
9.6.1 Simulations
9.6.2 Application to real data
9.7 Conclusion
9.8 Proofs of the results in Sections 9.3, 9.4 and 9.5
9.8.1 Proof of Theorem 9.1
9.8.2 Proof of Theorem 9.2
9.8.3 Proof of Theorem 9.3
9.8.4 Proof of Theorem 9.6
9.8.5 Proof of Theorem 9.5
9.9 Adapted results for Hoeffding framework
9.9.1 Concentration inequality in the easy case
9.9.2 Concentration inequality in the general case
9.9.3 An upper bound on testable confidence levels
9.9.4 Proof of Theorems 9.11 and 9.12
9.9.5 Proof of Theorem 9.13
9.10 Additional simulations
9.10.1 Gaussian distributions
9.10.2 Rule of thumb using n
9.10.3 Student distributions
9.10.4 Exponential distributions
9.10.5 Pareto distributions
9.10.6 Bernoulli distributions
9.10.7 Poisson distributions
Acknowledgements
Bibliography 
