
## Preliminaries

The allocation problem mentioned in the previous section is formalized as a K-armed bandit problem where each arm (stratum) k = 1, …, K is characterized by a distribution ν_k with mean value μ_k and variance σ_k². At each round t ≥ 1, an allocation strategy (or algorithm) A selects an arm k_t and receives a sample drawn from ν_{k_t} independently of the past samples. Note that a strategy may be adaptive, i.e., the arm selected at round t may depend on past observed samples. Let {w_k}_{k=1,…,K} denote a known set of positive weights which sum to 1; in the setting of stratified sampling for Monte-Carlo, w_k is the probability mass of stratum k. The goal is to define a strategy that estimates as precisely as possible μ = ∑_{k=1}^{K} w_k μ_k using a total budget of n samples.

Let T_{k,t} = ∑_{s=1}^{t} I{k_s = k} denote the number of times arm k has been pulled up to time t, and μ̂_{k,t} = (1/T_{k,t}) ∑_{s=1}^{T_{k,t}} X_{k,s} the empirical estimate of the mean μ_k at time t, where X_{k,s} denotes the sample received when pulling arm k for the s-th time.

After n rounds, the algorithm A returns the empirical estimates μ̂_{k,n} of all the arms. Note that in the case of a deterministic strategy, the expected quadratic estimation error of the weighted mean μ, as estimated by the weighted average μ̂_n = ∑_{k=1}^{K} w_k μ̂_{k,n}, is E[(μ̂_n − μ)²] = ∑_{k=1}^{K} w_k² σ_k² / T_{k,n}.
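As a concrete illustration of this setting, here is a minimal simulation sketch. The weights, means, standard deviations, and the static allocation below are illustrative assumptions, not values from the text; Gaussian strata are assumed only to make the simulation self-contained.

```python
import random

random.seed(0)

w     = [0.5, 0.3, 0.2]   # known stratum weights, summing to 1
mu    = [1.0, -0.5, 2.0]  # unknown stratum means (used only to simulate)
sigma = [1.0, 2.0, 0.5]   # unknown stratum standard deviations
T     = [40, 40, 20]      # a deterministic allocation with budget n = 100

# Pull arm k exactly T[k] times and form the empirical means.
mu_hat = []
for k in range(3):
    samples = [random.gauss(mu[k], sigma[k]) for _ in range(T[k])]
    mu_hat.append(sum(samples) / T[k])

# Weighted-mean estimate and its realized quadratic error.
mu_true  = sum(wk * mk for wk, mk in zip(w, mu))
mu_n     = sum(wk * mk for wk, mk in zip(w, mu_hat))
sq_error = (mu_n - mu_true) ** 2

# For a deterministic allocation, the expected quadratic error is
# sum_k w_k^2 * sigma_k^2 / T_k.
expected_error = sum(w[k] ** 2 * sigma[k] ** 2 / T[k] for k in range(3))
print(mu_n, sq_error, expected_error)
```

The realized error fluctuates across runs, while `expected_error` is the quantity an allocation strategy tries to minimize over the choice of T.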

### MC-UCB and the lower bound

We provide in this Chapter a minimax (problem independent) lower bound for the pseudo-regret, which is in expectation of order Ω(K^{1/3} n^{−4/3}) (see Theorem 8). An important achievement is that the problem independent upper bound on the pseudo-regret of MC-UCB is in expectation of the same order, up to a logarithmic factor (see Theorem 10). It is thus impossible to improve this strategy uniformly on every problem by more than a log factor.

Although we do not have a problem dependent lower bound on the pseudo-regret yet, we believe that the rate Õ(n^{−3/2}) cannot be improved for general distributions. As explained in the proof in Appendix 5.C, this rate is a direct consequence of the high probability bounds on the estimates of the standard deviations of the arms, which are in O(1/√n), and those bounds are tight. Because of the minimax lower bound of order n^{−4/3}, it is however clear that no algorithm can achieve a regret of order Õ(n^{−3/2}) without some dependence on λ_min^{−1} (or another related problem dependent quantity).

#### The algorithm

In this section, we introduce our adaptive algorithm for the allocation problem, called Monte Carlo Upper Confidence Bound (MC-UCB). The algorithm computes a high-probability upper bound on the standard deviation of each arm and samples the arms proportionally to these bounds multiplied by the corresponding weights. The MC-UCB algorithm, A_{MC-UCB}, is described in Figure 5.1. It requires three parameters as inputs: c1 and c2, which are related to the shape of the distributions (see Assumption 5.4.2), and δ, which defines the confidence level of the bound. The amount of exploration of the algorithm can be adapted by properly tuning these parameters.
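Since the Figure 5.1 pseudo-code is not reproduced in this excerpt, the following is only a sketch of the sampling rule described above: at each round, pull the arm maximizing a high-probability upper bound on its standard deviation, scaled by w_k/T_k. The Hoeffding-style deviation term and the constant `c` below are illustrative stand-ins for the exact bound parameterized by c1 and c2.

```python
import math
import random

def mc_ucb(sample, w, n, c=2.0, delta=0.01):
    """Sketch of the MC-UCB allocation rule.

    sample(k) draws one observation from arm k; w are the stratum weights.
    Returns the weighted-mean estimate after a budget of n pulls.
    """
    K = len(w)
    obs = [[] for _ in range(K)]
    # Initialization: pull each arm twice so empirical std deviations exist.
    for k in range(K):
        obs[k].extend(sample(k) for _ in range(2))
    for t in range(2 * K, n):
        B = []
        for k in range(K):
            T_k = len(obs[k])
            m = sum(obs[k]) / T_k
            var = sum((x - m) ** 2 for x in obs[k]) / (T_k - 1)
            # Upper confidence bound on sigma_k, scaled by w_k / T_k.
            ucb_sigma = math.sqrt(var) + c * math.sqrt(math.log(1 / delta) / T_k)
            B.append(w[k] / T_k * ucb_sigma)
        k_t = max(range(K), key=lambda k: B[k])  # pull the arm with largest bound
        obs[k_t].append(sample(k_t))
    return sum(w[k] * sum(obs[k]) / len(obs[k]) for k in range(K))

# Illustrative use: two Gaussian arms with very different variances; the
# high-variance arm should receive most of the budget.
random.seed(1)
sigmas = [2.0, 0.1]
est = mc_ucb(lambda k: random.gauss(0.0, sigmas[k]), w=[0.5, 0.5], n=200)
```

The `w[k] / T_k` scaling makes the index shrink as an arm is pulled, so the allocation tracks the (estimated) optimal proportions rather than concentrating on a single arm.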

**Making MC-UCB anytime**

An interesting question is whether and how it is possible to make algorithm MC-UCB anytime.

Although we will not provide formal proofs of this result in this Chapter, we believe that setting a δ that evolves with the current time, as δ_t = t^{−7/2}, is sufficient to make all the regret bounds of this Chapter hold with slightly modified constants. Some ideas on how to prove this result can be found in [Grover, 2009], and in [Auer et al., 2002] for techniques more specific to UCB algorithms.
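The suggested anytime modification can be sketched in a few lines. The exponent −7/2 is the one proposed in the text; the surrounding loop and the constant `c` are schematic assumptions.

```python
# Anytime variant: replace the fixed confidence parameter delta by a
# schedule that shrinks with the current round t.
def delta_t(t):
    return t ** (-7 / 2.0)

# Inside the sampling loop, the confidence bound at round t would then use
# delta_t(t) in place of a fixed delta, e.g.:
#   ucb_sigma = sigma_hat + c * math.sqrt(math.log(1 / delta_t(t)) / T_k)
```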

**Numerical experiment: Pricing of an Asian option**

We consider the pricing problem of an Asian option introduced in [Glasserman et al., 1999] and later considered in [Etoré and Jourdain, 2010; Kawai, 2010]. This uses a Black-Scholes model with strike C and maturity T. Let (W(t))_{0≤t≤1} be a Brownian motion discretized at d equidistant times {i/d}_{1≤i≤d}, which defines the vector W ∈ R^d with components W_i = W(i/d). The discounted payoff of the Asian option is defined as a function F of W, given in Eq. (5.9), where S0, r, and s0 are constants, and the price is defined by the expectation p = E_W F(W).
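The payoff formula (Eq. (5.9)) is not reproduced in this excerpt. The sketch below therefore uses the standard discounted arithmetic-average Asian call payoff from the Black-Scholes model of Glasserman et al. [1999]; the parameter values, maturity T = 1, and the exact form of the asset path are assumptions that may differ in detail from the thesis' expression.

```python
import math

def payoff(W, S0=100.0, r=0.05, s0=0.3, C=90.0, T=1.0):
    """Hypothetical reconstruction of the discounted Asian-call payoff F(W).

    Assumes the Black-Scholes path S(i/d) = S0 * exp((r - s0^2/2) * i*T/d
    + s0 * W_i), averaged arithmetically against the strike C.
    """
    d = len(W)
    avg = sum(
        S0 * math.exp((r - 0.5 * s0 ** 2) * (i + 1) * T / d + s0 * W[i])
        for i in range(d)
    ) / d
    return math.exp(-r * T) * max(avg - C, 0.0)
```

With these illustrative parameters, a flat path (all W_i = 0) is in the money, while a strongly negative terminal value drives the payoff to zero.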

We want to estimate the price p by Monte-Carlo simulations (by sampling on W = (W_i)_{1≤i≤d}). In order to reduce the variance of the estimated price, we can stratify the space of W. Glasserman et al. [1999] suggest stratifying according to a one dimensional projection of W, i.e., choosing a projection vector u ∈ R^d and defining the strata as the sets of W such that u · W lies in intervals of R. They further argue that the best direction for stratification is u = (0, …, 0, 1), i.e., stratifying according to the last component W_d of W. Thus we sample W_d and then conditionally sample W_1, …, W_{d−1} according to a Brownian Bridge, as explained in [Kawai, 2010]. Note that this choice of stratification is also intuitive, since W_d has the biggest exponent in the payoff (5.9), and thus the highest volatility. Kawai [2010] and Etoré and Jourdain [2010] also use the same direction of stratification.
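The sampling scheme just described can be sketched as follows: stratify the terminal value W_d = W(1) ~ N(0, 1) into K equiprobable strata, sample W_d within its stratum by inverse-CDF, then fill in W_1, …, W_{d−1} with a sequential Brownian bridge between W(0) = 0 and W(1) = W_d. The choice of equiprobable quantile strata and the values of K and d are illustrative assumptions.

```python
import random
from statistics import NormalDist

def sample_path_in_stratum(k, K, d, rng):
    """Draw one discretized path (W_1, ..., W_d) with W_d in stratum k of K."""
    # Stratum k of W(1): the N(0,1) quantile interval [k/K, (k+1)/K).
    u = (k + rng.random()) / K
    W = [0.0] * (d + 1)               # W[i] = W(i/d), with W[0] = W(0) = 0
    W[d] = NormalDist().inv_cdf(u)    # stratified terminal value
    # Sequential Brownian bridge: given W(t_a) = x and W(1) = z, the value
    # at t in (t_a, 1) is Gaussian with mean x + (t - t_a)/(1 - t_a)*(z - x)
    # and variance (t - t_a)(1 - t)/(1 - t_a).
    for i in range(1, d):
        t_a, t = (i - 1) / d, i / d
        mean = W[i - 1] + (t - t_a) / (1 - t_a) * (W[d] - W[i - 1])
        var = (t - t_a) * (1 - t) / (1 - t_a)
        W[i] = rng.gauss(mean, var ** 0.5)
    return W[1:]

rng = random.Random(0)
path = sample_path_in_stratum(k=3, K=4, d=16, rng=rng)
```

Averaging the payoff over paths drawn this way, weighted by the stratum probabilities w_k = 1/K, gives the stratified price estimate; MC-UCB then decides how many paths each stratum receives.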

**Table of contents:**

**1 Summary in French of this thesis**

1.1 Bandit theory

1.1.1 Bandits: an efficient tool in low dimension

1.1.2 Upper Confidence Bounds Algorithms for Active Learning in Multi-Armed Bandits

1.1.3 Finite time analysis of stratified sampling for Monte Carlo

1.1.4 Minimax Number of Strata for Online Stratified Sampling given Noisy Samples

1.1.5 Online Stratified Sampling for Monte-Carlo integration of Differentiable functions

1.1.6 Toward optimal stratification for stratified Monte-Carlo integration

1.2 Compressed Sensing

1.2.1 Compressed Sensing: optimal sampling in high dimension

1.2.2 Sparse Recovery with Brownian Sensing

1.2.3 Bandit Theory meets Compressed Sensing for high dimensional linear bandit

**2 Introduction **

I Bandit Theory

3 The Bandit Setting

3.1 The historical Bandit Setting

3.1.1 The classical bandit setting: cumulative regret

3.1.2 Lower and upper bounds

3.1.3 Direct extensions of the classical bandit problem with cumulative regret

3.2 Adaptive allocation with partial feedback

3.2.1 Adaptive allocation with partial feedback

3.2.2 Active learning

3.2.3 Monte-Carlo integration

**4 Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits **

4.1 Introduction

4.2 Preliminaries

4.3 Allocation Strategy Based on Chernoff-Hoeffding UCB

4.3.1 The CH-AS Algorithm

4.3.2 Regret Bound and Discussion

4.4 Allocation Strategy Based on Bernstein UCB

4.4.1 The B-AS Algorithm

4.4.2 Regret Bound and Discussion

4.4.3 Regret for Gaussian Distributions

4.5 Experimental Results

4.5.1 CH-AS, B-AS, and GAFS-MAX with Gaussian Arms

4.5.2 B-AS with Non-Gaussian Arms

4.6 Conclusions and Open Questions

4.A Regret Bound for the CH-AS Algorithm

4.A.1 Basic Tools

4.A.2 Allocation Performance

4.A.3 Regret Bound

4.A.4 Lower bound for the regret of algorithm CH-AS

4.B Regret Bounds for the Bernstein Algorithm

4.B.1 Basic Tools

4.B.1.1 A High Probability Bound on the Standard Deviation for sub- Gaussian Random Variable

4.B.1.2 Bound on the regret outside of ξ

4.B.1.3 Other Technical Inequalities

4.B.2 Allocation Performance

4.B.3 Regret Bounds

4.C Regret Bound for Gaussian Distributions

**5 Minimax strategy for Stratified Sampling for Monte Carlo **

5.1 Introduction

5.2 Preliminaries

5.3 Minimax lower-bound on the pseudo-regret

5.4 Allocation based on Monte Carlo Upper Confidence Bound

5.4.1 The algorithm

5.4.2 Pseudo-Regret analysis of MC-UCB

5.5 Links between the pseudo-loss and the mean-squared error

5.5.1 A quantity that is almost equal to the pseudo-loss

5.5.2 Bounds on the cross-products

5.5.3 Bounds on the true regret and asymptotic optimality

5.6 Discussion on the results

5.6.1 Problem dependent and independent bounds for the expectation of the pseudo-loss

5.6.2 Finite-time bounds for the true regret, and asymptotic optimality

5.6.3 MC-UCB and the lower bound

5.6.4 The parameters of the algorithm

5.6.5 Making MC-UCB anytime

5.7 Numerical experiment: Pricing of an Asian option

5.8 Conclusions

5.A Proof of Theorem 8

5.B Main technical tools for the regret and pseudo-regret bounds

5.B.1 The main tool: a high probability bound on the standard deviations

5.B.2 Other important properties

5.B.3 Technical inequalities

5.C Proof of Theorem 9 and Proposition 4

5.C.1 Problem dependent bound on the number of pulls

5.C.2 Proof of Theorem 9

5.C.3 Proof of Proposition 4

5.D Proof of Theorems 10 and Proposition 5

5.D.1 Problem independent Bound on the number of pulls of each arm

5.D.2 Proof of Theorem 10

5.D.3 Proof of Proposition 5

5.E Comments on problem independent bound for GAFS-WL

5.F Proof of Propositions 6, 7 and 8

5.F.1 Proof of Proposition 6

5.F.2 Proof of Propositions 7 and 8

**6 Minimax Number of Strata for Online Stratified Sampling given Noisy Samples**

6.1 Setting

6.2 The quality of a partition: Analysis of the term Qn,N.

6.2.1 General comments

6.3 Algorithm MC-UCB and a matching lower bound

6.3.1 Algorithm MC-UCB

6.3.2 Upper bound on the pseudo-regret of algorithm MC-UCB.

6.3.3 Lower Bound

6.4 Minimax-optimal trade-off between Qn,NK and Rn,NK(AMC−UCB)

6.4.1 Minimax-optimal trade-off

6.4.2 Discussion

6.5 Numerical experiment: influence of the number of strata in the Pricing of an Asian option

6.A Proof of Theorem 16

6.A.1 The main tool: a high probability bound on the standard deviations

6.A.2 Main Demonstration

6.B Proof of Proposition 10

6.C Proof of Proposition 11

6.D Large deviation inequalities for independent sub-Gaussian random variables

**7 Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions **

7.1 Introduction

7.2 Setting

7.3 Discussion on the optimal asymptotic mean squared error

7.3.1 Asymptotic lower bound on the mean squared error, and comparison with the Uniform stratified Monte-Carlo

7.3.2 An intuition of a good allocation: Piecewise linear functions

7.4 The LMC-UCB Algorithm

7.4.1 Algorithm LMC-UCB

7.4.2 High probability lower bound on the number of sub-strata of stratum

7.4.3 Remarks

7.5 Main results

7.5.1 Asymptotic convergence of algorithm LMC-UCB

7.5.2 Under a slightly stronger Assumption

7.5.3 Discussion

7.A Numerical Experiments

7.B Proof of Lemma

7.C Proof of Lemmas

7.D Proof of Theorem

7.E Proof of Theorems

**8 Toward Optimal Stratification for Stratified Monte-Carlo Integration **

8.1 Introduction

8.2 Preliminaries

8.2.1 The function

8.2.2 Notations for a hierarchical partitioning

8.2.3 Pseudo-performance of an algorithm and optimal static strategies

8.2.4 Main result for algorithm MC-UCB and point of comparison

8.3 A first algorithm that selects the depth

8.3.1 The Uniform Sampling Scheme

8.3.2 The Deep-MC-UCB algorithm

8.3.3 Main result

8.4 A more efficient strategy: algorithm MC-ULCB

8.4.1 The MC-ULCB algorithm

8.4.2 Main result

8.4.3 Discussion and remarks

8.A Proof of Lemma

8.B Proof of Theorem

8.B.1 An interesting large probability event

8.B.2 Rate for the algorithm

8.B.3 Nodes that are in the final partition

8.B.4 Comparison at every scale

8.C Proof of Theorem

8.C.1 Some preliminary bounds

8.C.2 Study of the Exploration Phase

8.C.3 Characterization of the Nn

8.C.4 Study of the Exploitation phase

8.C.5 Regret of the algorithm

8.D Large deviation inequalities for independent sub-Gaussian random variables

II Compressed Sensing

**9 Compressed Sensing**

9.1 Introduction

9.2 Compressed Sensing in a nutshell

9.2.1 Setting

9.2.2 What is a good sampling scheme?

9.2.3 Transformation of the problem in a convex problem

9.2.4 The RIP property: a solution to the noisy setting and efficient ways to sample

9.2.5 Matrices that verify the RIP property

9.3 Conclusion

**10 Sparse Recovery with Brownian Sensing**

10.1 Introduction

10.2 Relation to existing results

10.3 The “Brownian sensing” approach

10.3.1 Properties of the transformed objects

10.3.2 Main result.

10.4 Discussion.

10.4.1 Comparison with known results

10.4.2 The choice of the curve

10.4.3 Examples of curves

10.5 Recovery with orthonormal basis and i.i.d. noise when the function f is Lipschitz

10.5.0.1 I.i.d. centered Gaussian observation noise

10.5.1 Discussion

10.6 Numerical Experiments

10.6.1 Illustration of the performance of Brownian Sensing

10.6.2 The initial experiment of compressed sensing revisited

10.A Proofs

**11 Bandit Theory meets Compressed Sensing for high dimensional linear bandit**

11.1 Setting and a useful existing result

11.1.1 Description of the problem

11.1.2 A useful algorithm for Linear Bandits

11.2 The SL-UCB algorithm

11.2.1 Presentation of the algorithm

11.2.2 Main Result

11.3 The gradient ascent as a bandit problem

11.3.1 Formalization

11.4 An alternative algorithm when the noise is sparse

11.4.1 Presentation of the algorithm

11.4.2 Main Result

11.4.3 Numerical experiment

11.A Proofs

**References**