
## The Complexity of the Linear Best-Arm Identification Problem

As reviewed in Sect. 2, in the MAB case the complexity of the best-arm identification task is characterized by the reward gaps between the optimal and suboptimal arms. In this section, we propose an extension of the notion of complexity to the case of linear best-arm identification. We begin with the derivation of a lower bound on the sample complexity needed for best-arm identification, in Sect. 3.1. Then, to illustrate the stopping conditions and the properties of efficient sampling strategies, in Sect. 3.2 we analyze the sampling design obtained in the case of an oracle with access to the parameter θ∗.

### Optimality criteria

Perhaps the most popular OED performance measure is given by the determinant criterion, or the D-Optimal Design, where the goal is to find the optimal proportions of allocations n1, . . . , nK which maximize the determinant of the design matrix A, or equivalently, which minimize the determinant of the error covariance matrix, since (det A)−1 = det(A−1). Writing λ = (λ1, . . . , λK) for the allocation proportions and A(λ) = Σk λk xk xk⊤, the relaxed D-Optimal Design can be formulated as the optimization problem: minimize −log det A(λ), subject to λ lying in the simplex of allocation proportions.
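As an illustration, the relaxed problem above can be solved numerically with the classical multiplicative-update scheme (in the spirit of Titterington-style algorithms). The following is a minimal sketch with NumPy; the arm set `X` and all names are illustrative, not taken from the thesis:

```python
import numpy as np

def d_optimal_design(X, n_iter=500):
    """Relaxed D-optimal design via the classical multiplicative update
    lambda_k <- lambda_k * (x_k^T A(lambda)^{-1} x_k) / d."""
    K, d = X.shape
    lam = np.full(K, 1.0 / K)              # start from the uniform design
    for _ in range(n_iter):
        A = X.T @ (lam[:, None] * X)       # A(lambda) = sum_k lambda_k x_k x_k^T
        lev = np.einsum('kd,de,ke->k', X, np.linalg.inv(A), X)  # x_k^T A^{-1} x_k
        lam *= lev / d                     # this update preserves sum(lam) = 1
    return lam

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))            # K = 8 illustrative arms in R^3
lam = d_optimal_design(X)
A = X.T @ (lam[:, None] * X)
print(lam.round(3))
```

A convenient convergence check: by the Kiefer–Wolfowitz equivalence theorem, at the optimum the maximal leverage max_k xk⊤A(λ)−1xk equals d.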

Basically, this corresponds to minimizing the confidence region for the value of the parameter θ∗. In particular, as pointed out in the early works [Silvey, 1972, Sibson, 1972], there is a duality relationship between the D-Optimal Design and the problem of finding minimum-volume covering ellipsoids, the latter having applications in a wide range of areas (e.g., optimization, data analysis, computational geometry). To see this connection, notice that for a centre c ∈ Rd and a shape given by some positive definite matrix H, the ellipsoid E ⊂ Rd is defined as follows: E(c,H) = {x ∈ Rd : (x − c)⊤H(x − c) ≤ d}. Now, we can write H−1 = LL⊤ and introduce z = L−1(x − c). With this change of variables, the ellipsoid can be rewritten as E(c,H) = {x = c + Lz : ||z|| ≤ √d}.

Finally, recall that vol(E(c,H)) = |det L| · (√d)d · vol(Bd), where Bd is the unit ball in Rd. Since H−1 = LL⊤ implies |det L| = (det H)−1/2, we obtain vol(E(c,H)) = const(d)/√(det H), so minimizing the volume of the ellipsoid E is equivalent to maximizing det H (and, respectively, to minimizing −log det H). More details on the convex duality can be found, for instance, in [Todd, 2016].
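Both identities in this derivation are easy to verify numerically. A small sanity check (the matrix H and the point z below are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
c = rng.standard_normal(d)
M = rng.standard_normal((d, d))
H = M @ M.T + d * np.eye(d)                 # an arbitrary positive definite shape

L = np.linalg.cholesky(np.linalg.inv(H))    # factorization H^{-1} = L L^T
z = rng.standard_normal(d)
x = c + L @ z                               # the change of variables x = c + L z

# (x - c)^T H (x - c) reduces to ||z||^2, so E(c, H) = {c + Lz : ||z|| <= sqrt(d)}
print((x - c) @ H @ (x - c), z @ z)

# |det L| = 1 / sqrt(det H), hence vol(E(c, H)) = const(d) / sqrt(det H)
print(abs(np.linalg.det(L)), 1 / np.sqrt(np.linalg.det(H)))
```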

In addition, insights from geometry can be used to identify and characterize the support points. More precisely, only points on the boundary of the convex hull of X are good candidates for support points. Also, if the support points are known, then, as pointed out in [Titterington, 1975], their corresponding allocation proportions in λ can be determined by using the following property.
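The claim about support points can be observed numerically: running a simple multiplicative-update routine for the relaxed D-optimal design (a sketch, not the thesis algorithm) on a set containing a point strictly inside the convex hull of the others assigns that point essentially zero mass:

```python
import numpy as np

# Four vertices of a square plus one point strictly inside their convex hull.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.3, 0.3]])
K, d = X.shape
lam = np.full(K, 1.0 / K)
for _ in range(500):                        # multiplicative update for -log det A(lam)
    A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
    lam *= np.einsum('kd,de,ke->k', X, A_inv, X) / d
print(lam.round(4))   # the interior point receives (numerically) zero mass
```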

#### Adaptive OED with Parameterized Heteroscedastic Noise

For the study of optimal design strategies in linear stochastic bandits, we start by focusing on the G-optimal design, which is a worst-case strategy for the BAI problem in linear bandits. Nonetheless, the G-optimal strategy considered previously in the BAI chapter was static: the strategy acted blindly, improving the accuracy of the estimation uniformly over all dimensions. We now focus on designing adaptive G-allocation strategies, where the global optimality criterion must be reached in an environment where the noise in each arm has a different, unknown variance. The behavior of an efficient sampling strategy becomes harder to design in this case, since the choice of the support points, as well as the proportion of pulls allocated to each of them, no longer depends solely on the geometry of the space. In fact, we now have to first estimate the noise variance specific to each arm, and then recompute the G-optimal allocation based on the current estimates. The regret of the learner is then defined as the difference between the loss it incurs and that of an optimal (oracle) allocation, which knows the arm variances beforehand.
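To make the role of the variances concrete, the oracle allocation that knows σ1, . . . , σK can be computed by folding each σk into the features, so that the design matrix becomes A(λ) = Σk λk xk xk⊤/σk²; the Kiefer–Wolfowitz equivalence then lets a D-optimal multiplicative update serve as a G-optimal routine. A hedged sketch, with an arm set and variances chosen purely for illustration (not taken from the thesis):

```python
import numpy as np

def relaxed_design(X, n_iter=400):
    """D-optimal multiplicative update; by Kiefer-Wolfowitz equivalence the
    resulting allocation is also G-optimal for the feature set X."""
    K, d = X.shape
    lam = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
        lam *= np.einsum('kd,de,ke->k', X, A_inv, X) / d
    return lam

# Three correlated arms in R^2; the third arm is much noisier than the others.
X = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
sigma = np.array([1.0, 1.0, 3.0])

lam_homo = relaxed_design(X)                     # allocation ignoring the noise
lam_hetero = relaxed_design(X / sigma[:, None])  # oracle: A(lam) = sum_k lam_k x_k x_k^T / sigma_k^2
print(lam_homo.round(3), lam_hetero.round(3))
```

In this particular instance the oracle shifts almost all mass away from the noisy arm onto a correlated arm covering the same direction more cheaply, illustrating that the optimal support itself, and not just the proportions, depends on the variances.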

Uniform estimation with a limited budget under heteroscedastic noise has recently been studied in the bandit literature, but only for the multi-armed bandit (MAB), where the input space is the set of orthogonal arms of the standard stochastic bandit. With the objective of estimating the mean values of several distributions uniformly well, the authors of [Antos et al., 2010] and [Carpentier et al., 2011] estimate the variance of each arm and exploit heteroscedasticity to allocate more samples to the parts of the input space where the variance is larger. Our new formulation generalizes the MAB model by allowing the arms to be correlated and by taking into account the complete dimensionality of the problem at once. We study this problem in a specific reward model with heteroscedastic noise, as detailed in the following. We then propose some intuitions on how to design adaptive strategies for obtaining the corresponding G-optimal allocations.
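For the MAB special case just described, the adaptive idea can be sketched as a toy loop in the spirit of the cited allocation strategies (not their exact algorithms): estimate each arm's variance online and always pull the arm whose estimated mean-squared error σ̂k²/nk is currently largest, so that the final counts grow roughly in proportion to σk². All function names and parameters below are illustrative:

```python
import numpy as np

def uniform_estimation(mu, sigma, budget, rng, n_init=10):
    """Toy adaptive allocation for estimating all arm means uniformly well:
    always pull the arm whose estimated error sigma_k^2 / n_k is largest."""
    K = len(mu)
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    sq = np.zeros(K)

    def pull(k):
        y = rng.normal(mu[k], sigma[k])
        counts[k] += 1
        sums[k] += y
        sq[k] += y * y

    for k in range(K):                  # initial pulls to seed the variance estimates
        for _ in range(n_init):
            pull(k)
    for _ in range(budget - n_init * K):
        var_hat = np.maximum(sq / counts - (sums / counts) ** 2, 1e-12)
        pull(int(np.argmax(var_hat / counts)))
    return counts

rng = np.random.default_rng(3)
counts = uniform_estimation(np.zeros(3), np.array([0.5, 1.0, 2.0]), budget=600, rng=rng)
print(counts)   # counts grow roughly like sigma_k^2, i.e. about 1 : 4 : 16
```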

**Table of contents :**

**Chapter 1 Introduction**

**Chapter 2 Preliminaries**

1 Stochastic Multi-armed Bandits

1.1 Formalization

1.2 Algorithms and Results

2 Stochastic Linear Bandits

2.1 Concentration Inequalities

2.2 Algorithms and Results

3 Performance Measures

3.1 Simple regret

3.2 OED inspired Optimality Criteria

3.3 Cumulative Regret in Multi-task Linear Bandits

4 Conclusion

**Chapter 3 Best-arm Identification in Linear Bandits**

1 Introduction

2 Preliminaries

2.1 The problem

2.2 The multi-armed bandit case

2.3 An Illustrative Example

3 The Complexity of the Linear BAI Problem

3.1 Lower Bound on the Sample Complexity

3.2 Geometrical Interpretation

4 Static Allocation Strategies

4.1 G-Allocation Strategy

4.2 XY-Allocation Strategy

4.3 Comparison between G-allocation and XY-allocation

5 Adaptive Allocation

5.1 XY-Adaptive Strategy

5.2 Length of the phases

5.3 Discarding uninteresting directions: M∗

5.4 Sample Complexity

6 Numerical Simulations

7 Conclusions

8 Appendix

8.1 Tools

8.2 Proof of Theorem 3.4

8.3 Discussion on M∗

**Chapter 4 Optimal Experimental Design in Linear Bandits**

1 Optimal Experimental Design

1.1 Assumptions and definitions

1.2 Optimality criteria

1.3 Applications

2 Adaptive OED with Parameterized Heteroscedastic Noise

2.1 The model

2.2 The Optimal Static Allocation Algorithm

2.3 GOM Algorithm

2.4 Numerical Simulations

2.5 Discussion

3 Adaptive OED with Heteroscedastic Noise

3.1 Sequential V-Optimal Design

3.2 Arm Indices for Sequential V-Optimal Design

3.3 V-Adaptive Algorithm

3.4 Future work on V-opt

4 Conclusion

5 Appendix

5.1 Optimal Allocation for Heteroscedastic G-loss

5.2 Additional Experiments for V-Adaptive

**Chapter 5 Sequential Transfer of Samples in Linear Bandits**

1 Introduction

2 Transfer of Samples in Linear Bandits

2.1 The Stochastic Linear Bandit Problem

2.2 Multi-task Linear Bandits

3 The Oracle Multi-Task Linear UCB

3.1 The MT-LinUCB Algorithm

3.2 Regret analysis

4 Multi-task Strategies

4.1 MT-UB Strategy

4.2 MT-TS Strategy

5 Experiments

6 Discussion

7 Appendix

7.1 Proofs of Section 2

7.2 Proofs of Section 3

7.3 Proofs of Section 4

**Chapter 6 Conclusion**

**Bibliography**