Get Complete Project Material File(s) Now! »

## CONCENTRATION INEQUALITIES FOR -EXPONENTIAL RANDOM FOURIER FEATURES (RFFS) ON DERIVATIVES

Note. This chapter corresponds to an article written with E. Gobet and Z. Szabo. It has been published in Journal of Machine Learning Research (JMLR), 21:1{37, 2020, under the title \Orlicz Random Fourier Features ».

Abstract. Kernel techniques are among the most widely-applied and in uential tools in ma-chine learning with applications at virtually all areas of the eld. To combine this expressive power with computational e ciency numerous randomized schemes have been proposed in the literature, among which probably random Fourier features (RFF) are the simplest and most popular. While RFFs were originally designed for the approximation of kernel values, recently they have been adapted to kernel derivatives, and hence to the solution of large-scale tasks involving function derivatives. Unfortunately, the understanding of the RFF scheme for the approximation of higher-order kernel derivatives is quite limited due to the challenging polyno-mial growing nature of the underlying function class in the empirical process. To tackle this di culty, we establish a nite-sample deviation bound for a general class of polynomial-growth functions under -exponential Orlicz condition on the distribution of the sample. Instantiating this result for RFFs, our nite-sample uniform guarantee implies a.s. convergence with tight rate for arbitrary kernel with -exponential Orlicz spectrum and any order of derivative.

### Introduction

Kernel machines [TC04, SC08, PR16] form one of the most fundamental tools in machine learning and statistics with a wide range of successful applications. The impressive modelling power and exibility of kernel techniques in capturing complex nonlinear relations originates from the richness of the underlying Hk function class called reproducing kernel Hilbert space (RKHS, [Aro50]) associated to a k : X X ! R kernel. Kernels extend the classical notion of inner product on X = Rd by assuming the existence of a : X ! H feature map to a Hilbert space H such that k(x; x0) = h (x); (x0)iH for all x; x0 2 X. This simple equality (also called the kernel trick) forms the basis of kernel techniques and enables one to compute inner products implicitly without direct access to the feature of the points.

As a practical example, one can consider for instance the previously mentioned in nite-dimensional exponential (IE) family tting problem where the estimation boils to solving a linear equation system with coe cients made of third-order kernel derivatives (jp + qj = 3). The IE family is de ned by a kernel for which a common choice is the Gaussian; this implies a Gaussian spectrum . Unfortunately, in this case the bounded support condition is not met. Similarly, the Bernstein condition (1.1.5) only holds for jp + qj 2 with the Gaussian kernel, as it can be veri ed [SS19, Remark ’Higher-order derivatives’ in Section 3] by making use of the analytical expressions for the absolute moments of the Gaussian spectrum. These limitations (summarized in Table 1.1) of the popular random Fourier features technique motivate our work and the study of widely-applied kernels with unbounded spectral support for the RFF approximation of high-order kernel derivatives. A consequence of our new estimates in Theorem 1.1 is that the a:s: rates previously obtained under stringent conditions (on p; q or ) are now available for any p; q and any spectral measure with -exponential moments (as de ned in (1.1.6), > 0). Because Bernstein condition implies exponential moments, our result includes the one given by [SS19].

Kernels with -exponential Orlicz spectrum include the popular Gaussian, the inverse multi-quadric, or the sech kernel; for further examples see Table 1.2 and Remark 2(ii). We establish the consistency and prove nite-sample uniform guarantees of the resulting Orlicz RFF scheme for the approximation of kernel derivatives at any order, as it is brie y illustrated in the last line of Table 1.1.

To allow this level of generality, we prove a new nite-sample deviation bound for the em-pirical process related to a general class of functions f with polynomial growth of the sample Xm. The distribution of the latter is assumed to have nite -exponential Orlicz norm and consequently, the random variables f(Xm) belong to a -exponential Orlicz space with index smaller than 1. For deriving such deviation bounds, we have been inspired by the work of [Ada08] which elegantly combines the [KR05] inequality for truncated variables, the Ho man-Jorgensen inequality to deal with sum of residual of truncated variables, and a [Tal89] inequality in -exponential Orlicz norms for sum of centered random variables. However, our work signif-icantly di ers from that of [Ada08]. First, our aims are di erent: [Ada08] focuses on getting large deviation bounds while we are looking for all-scale deviation bounds, which leads to a di erent analysis (in the application of Klein-Rio inequalities for instance). Second, we are concerned by getting upper bounds with quite explicit control. In particular, this requires a careful treatment of Orlicz-type estimates since the function (x) = ex 1 de ning the Orlicz space is not convex for < 1 (see Figure 1.1), as opposed to the usual case; see the results in Section 1.4. We also derive sharp estimates from the Dudley entropy integral bound (Theorem 1.4), which enables us to get a tight dependency w.r.t. the diameter of the parameter space. Furthermore, we clarify the use of the Talagrand inequality (Theorem 1.2); in [Ada08, Theorem 5] it is seemingly invoked for supremum over functions while it is related to sum over centered random variables. With this novel nite-sample deviation bound, the analysis of Orlicz RFFs readily follows, using optimized inequalities.

The paper is structured as follows. Our problem is formulated in Section 1.2. The main result on the approximation quality of kernel derivatives with random Fourier features is presented in Section 1.3. Properties of the Orlicz norm are summarized in Section 1.4. Proofs are provided in Section 1.5. The appendix contains additional technical details (Section 1.A), the de nition of special functions (Section 1.B) and external statements used in the proofs (Section 1.C).

#### Main result

In this section we present our main result on the supremum of empirical processes of polynomial growth, and specialize it to the approximation quality of RFFs for kernel derivatives. The proofs are given in Section 1.5.

We investigate the concentration of the supf2F ing assumptions:

Compact parameterization: F = fft : t 2 T g where ft : Rd ! R is parameterized by a compact set T Rd0.

**Remark 2.**

Spectral measure ( ) examples: Our result assumes the -exponential Orlicz property of the spectral measure associated to k. In Table 1.2 we provide various examples for (with the relevant case of unbounded support) satisfying this requirement; their relation is summarized in Figure 1.2. While for the RFF approximation it is not necessary, in many of these examples the corresponding kernel value can also be computed, see Table 1.3.

**Table of contents :**

**Introduction **

0.1 Context of the PhD

0.2 Concentration inequalities for heavy-tailed random variables

0.3 Covariance matrix estimation for minimum variance portfolio

0.4 Uncertainty quantication for portfolio

0.5 Kernel and optimal transport portfolio optimization with target distribution

0.6 Contexte de la these

0.7 Inegalites de concentration pour variables a queues epaisses

0.8 Estimation de matrice de covariance pour portefeuille a variance minimum

0.9 Traitement des incertitudes pour le portefeuille

0.10 Optimisation de portefeuille avec distribution cible par divergence kernel et transport optimal

**I Concentration inequalities for heavy-tailed random variables **

1 -exponential random Fourier features (RFFs)

1.1 Introduction

1.2 Problem formulation

1.3 Main result

1.4 Properties of the Orlicz norm

1.5 Proofs

1.5.1 Proof of Remark 1(v)

1.5.2 Proof that polynomial growth preserves the exponential Orlicz property

1.5.3 Proof of Theorem 1.1

1.5.4 Proof of F FP(n) and nite maximal moments

1.5.5 Control if c cHJ

1.5.6 Bounding the driving terms of Theorem 1.1 for RFF

1.5.7 Proofs of Corollary 1.3.1 and 1.3.2

1.A Additional proofs

1.A.1 (l) , the convexication of

1.A.2 Proof of Remark 2(ii)

1.A.3 Proof of the properties in Section 1.4 about the Orlicz norm

1.B Special functions

1.C External statements

**2 -heavy tailed concentration **

2.1 Introduction

2.2 Motivating examples and main results

2.2.1 Orlicz norm properties

2.2.2 Motivating examples of heavy-tailed distributions and adapted Orlicz norm 71

2.2.3 HT -Orlicz norm: properties and inequalities

2.3 Proofs

2.3.1 Proof of Theorem 2.2.2

2.3.2 Proof of Theorem 2.2.3

2.3.3 Proof of Theorem 2.2.5

2.3.4 Proof of Proposition 2.2.1

2.4 Conclusion

**II Covariance estimation for minimum variance portfolio **

**3 Covariance estimation for minimum variance portfolio **

3.1 Introduction

3.1.1 Literature background

3.1.2 Contribution and outline of this chapter

3.2 Formulation of the problem

3.2.1 Problem setup

3.2.2 Model assumptions

3.2.3 Main results

3.3 Proofs and auxiliary results

3.3.1 Proof of Theorem 3.2.2

3.3.2 Proof of Proposition 3.3.1

3.3.3 Proof of Propositions 3.2.1 and 3.3.2

3.3.4 Proof of Lemma 3.3.3

3.3.5 Auxiliary result

3.4 Model specication

3.4.1 Motivation: GARCH-CCC model

3.4.2 Stationarity, ergodicity and application of results

3.4.3 Additional models

3.5 Numerical experiments

3.5.1 Realistic GARCH values, ting procedure

3.5.2 Simulation procedure

3.5.3 Empirical probability: impact of pmax and d

3.5.4 Tail exponent Hill estimation

3.6 Conclusion, perspectives

3.A GARCH properties

3.A.1 Ergodicity and stationarity

3.A.2 GARCH-CCC ergodicity

3.A.3 Irreducibility and auxiliary results on the GARCH-CCC

3.A.4 GARCH-CCC density

3.B Hill estimator

3.C Additional properties

**III Uncertain quantication for portfolio **

**4 UQSA for portfolio Sharpe ratio and allocation**

4.1 Introduction

4.1.1 UQ techniques

4.1.2 UQSA setting

4.1.3 Explored nancial applications

4.2 UQSA algorithm: principle and main assumptions

4.2.1 Principle and notations

4.2.2 UQSA algorithm

4.2.3 Main assumptions and convergence results

4.2.4 Applications: convergence metrics

4.3 UQSA for the Sharpe ratio

4.3.1 Denition and motivation

4.3.2 SA scheme for the Sharpe ratio

4.3.3 UQSA assumptions verication

4.3.4 Applications: log-normal returns and specic weights

4.3.5 Experiments

4.3.6 Conclusion to UQ for the Sharpe ratio section

4.4 UQSA for the portfolio allocation

4.4.1 Portfolio optimization as SA problem

4.4.2 Allocation model

4.4.3 Application

4.4.4 Conclusion to UQ for portfolio analysis section

4.A Proof of Lemma 4.4.3

4.B UQSA algorithm: a projection approach

**IV Portfolio optimization with kernel and optimal transport (K.O.T.) divergence 171**

**5 K.O.T. portfolio optimization with target distribution **

5.1 Introduction

5.2 Problem formulation and optimization

5.3 Results

5.3.1 Analytical formulas for mean embedding

5.3.2 Concentration of semi-explicit MMD

5.4 Experiments on simulated data

5.4.1 Advantage of semi-explicit MMD

5.4.2 Misspecied setting

5.5 Experiments on nancial time series

5.5.1 Datasets, rebalancing, performance measures, target distributions

5.5.2 Numerical studies

5.6 Conclusion

5.A Proofs

5.A.1 Proofs of auxiliary statements

5.A.2 Proof of concentration results (Theorem 5.2, Theorem 5.3, Theorem 5.4)

5.B Implementation tools

5.B.1 Truncated evaluation of E

5.B.2 Score functions and kernel derivatives

5.B.3 Moments of various distributions

5.C External statements