Comparing the Replacement Probability to the Prediction Probability


The R-Boot Algorithm

A direct implementation of the replacement Bootstrap requires an estimate of the probability $P(a_1, \dots, a_R \mid X^T \setminus \{X_{t_1}, X_{t_2}, \dots, X_{t_R}\})$, which corresponds to the probability of a word $v = (a_1, \dots, a_R)$ in $R$ random locations $t_1, \dots, t_R$, conditioned on the remainder of the sequence, i.e., the portion of the sequence that is not replaced. Unfortunately, this requires estimating probabilities for an alphabet of size $|A|^R$ conditioned on different subsets of the sequence, which rapidly becomes infeasible as $R$ increases. We therefore propose a sequential process based on the estimation of the one-symbol replacement probability $P(\cdot \mid X_{<t}, X_{>t})$, which results in a feasible and much more efficient procedure. Although a variety of methods can be used to estimate this conditional distribution, the algorithm presented here, R-Boot, adapts a universal predictor (see, e.g., Ryabko [2010]) to estimate the one-symbol conditional replacement distribution. Relying on a universal predictor to compute a consistent estimate of $P(\cdot \mid X_{<t}, X_{>t})$ allows us to consider the wide class of stationary–ergodic processes and to avoid restrictive model and/or parametric assumptions on the process $P$.

Definition 1. A measure $\rho$ is a universal predictor if, for any stationary and ergodic process $P$, $\lim_{T \to \infty} \frac{1}{T} \mathbb{E} \left[ \log \frac{P(X_{1:T})}{\rho(X_{1:T})} \right] = 0$, where the expectation is w.r.t. $P$.

Several predictors with this property are known. Although a predictor could be directly used to replace a symbol $X_t$ by drawing a new symbol from the prediction distribution $\rho(\cdot \mid X_{<t})$, we show that a universal predictor can be adapted to estimate the conditional distribution $P(\cdot \mid X_{<t}, X_{>t})$. Thus, more accurate replacements can be achieved by taking into consideration not only the past $X_{<t}$, but also the future $X_{>t}$. R-Boot builds on the universal measure $R$ [Ryabko, 1988, 2008], which, unlike other universal predictors (e.g., the Ornstein predictor [Ornstein, 1978]) or compression-based predictors (e.g., Ryabko and Monarev [2005], Ryabko [2009]), is both resource and data efficient. It is defined in Definition 2 as a combination of Krichevsky predictors [Krichevsky, 1968] of varying order, which only consider the past $X_{<t}$ to estimate the prediction probability $P(\cdot \mid X_{<t})$.
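To make the sequential process concrete, the following is a minimal sketch of the iterative replacement loop. It assumes a `replacement_distribution(seq, t)` helper that returns an estimate of $P(\cdot \mid X_{<t}, X_{>t})$ (e.g., built from the measure $R$, as described above); the function name and overall structure are illustrative, not the authors' implementation.

```python
import random

def r_boot(sequence, R, alphabet, replacement_distribution):
    """Generate one Bootstrap replica via R sequential one-symbol replacements.

    `replacement_distribution(seq, t)` is assumed to return a dict mapping
    each symbol a in `alphabet` to an estimate of P(a | seq[:t], seq[t+1:]).
    """
    z = list(sequence)  # work on a copy of the original sequence
    T = len(z)
    for _ in range(R):
        # Draw a replacement point uniformly at random (repeats allowed).
        t = random.randrange(T)
        # Estimate the one-symbol conditional replacement distribution
        # from both the past z[:t] and the future z[t+1:].
        dist = replacement_distribution(z, t)
        # Resample the symbol at position t from that distribution.
        z[t] = random.choices(alphabet, weights=[dist[a] for a in alphabet])[0]
    return z
```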

Impact of Replacement Parameter R

The replacement parameter $R$ has the same expected impact on iterative replacements as it has on simultaneous replacements. The number of replacements $R$ determines how much of the original sequence is preserved: small $R$ biases the replicas toward the original sequence, while large $R$ favors greater variance. In particular, the temporal structure of the original sequence is preserved for smaller values of $R$, while larger values of $R$ increase variability and noise. In general, the incremental nature of R-Boot requires a large $R$ (generally greater than $T$) to compensate for the iterative (rather than simultaneous) nature of the replacements. Though unlikely, the random selection of replacement points may result in repeated selection of the same position in the series. This could be problematic, since the aim is to cover the series rather than to localize replacements in a single neighborhood. In practice, however, we found that some repeated selection actually improved performance by introducing variability through the iterative process. For example, at step $r$, R-Boot uses the current sequence $z^{r-1}_T$ to define the conditional replacement distribution $R(\cdot \mid z^{r-1}_{<t_r}, z^{r-1}_{>t_r})$ for $X_{t_r}$. As a result, changes to the symbols $X_{t_1}, \dots, X_{t_{r-1}}$ of the original sequence may later trigger changes in other locations, and this source of variability was a positive contributor to good performance. In testing, we found that removing duplicate replacement points caused a noticeable reduction in performance. It is therefore important to retain fully random selection, while remaining alert to potential issues with small values of $R$.
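As an illustration of this trade-off, the two sampling schemes below differ only in whether a position may be drawn twice; the code and names are ours, not the authors':

```python
import random

T, R = 100, 150  # illustrative values; note that R > T is allowed here

# Sampling with repetition (as in R-Boot): the same position can be
# selected more than once, so an earlier replacement can be revisited
# after the surrounding context has changed.
points_with_repeats = [random.randrange(T) for _ in range(R)]

# Sampling without repetition would cap R at T and remove the extra
# variability that, as noted above, was found to improve performance.
points_unique = random.sample(range(T), min(R, T))
```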

Implementation details

This section simplifies Equation 2.4 to show how the conditional replacement distribution can be efficiently computed. The combination of an infinite number of Krichevsky predictors in the measure $R$ can be computed in polynomial time by setting the maximum Krichevsky model order to $K_T = O(\log T)$. An estimate of order $m$ only uses the $m$ previous samples, which considerably reduces the number of computations needed to evaluate the measure $R$, so that a single Bootstrap sequence can be generated with $R$ replacements at polynomial cost. First, note that replacing infinity with any $K_T$ that increases to infinity with $T$ does not affect the asymptotic convergence properties of the measure $R$. Second, frequency estimates of order $m$ beyond $O(\log T)$ are not consistent, so they only add noise to the estimate. Setting $K_T = O(\log T)$ therefore retains the meaningful structure of the process while significantly reducing the computational complexity. We now elaborate Equation 2.4. Replacement points are drawn at random from a uniform distribution, i.e., $t \sim U([1, T])$. The probability that a (new) sequence has symbol $a \in A$ in position $t$, given that the rest of the sequence is equal to the original sequence $X^T$, is

$$P(X_t = a \mid X_{<t}, X_{>t}) = \frac{R(X_{<t}\, a\, X_{>t})}{\sum_{a' \in A} R(X_{<t}\, a'\, X_{>t})}.$$
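A minimal sketch of this normalization step, using an arbitrary sequence measure `rho` as a stand-in for the measure $R$ (the helper name and interface are our assumptions):

```python
def replacement_distribution(seq, t, alphabet, rho):
    """Estimate P(X_t = a | rest of the sequence) by Bayes' rule: score each
    candidate symbol by the measure of the full sequence with that symbol in
    position t, then normalize the scores over the alphabet."""
    scores = {}
    for a in alphabet:
        candidate = list(seq[:t]) + [a] + list(seq[t + 1:])
        scores[a] = rho(candidate)  # rho: probability assigned to a sequence
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}
```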


A Comparison with Markov Bootstrap

While the replacement Bootstrap is very different from the block Bootstrap and does not assume the Markov property, it does share similarities with the Markov Bootstrap. For example, the R-Boot algorithm uses Krichevsky predictors to estimate $k$-order Markov approximations of the process for $k = 1, \dots, O(\log T)$. The measure $R$ is then used to create a weighted combination of the $K_T$ predictors, implicitly adapted to the prediction performance of each predictor. The $k$-order Krichevsky predictor only considers the past $k$ values in estimating the prediction probability $P(\cdot \mid X_{<t})$. More specifically, recall that the $k$-order Krichevsky predictor is computed as

$$K_k(X_t = a \mid X_{<t}) = \frac{\nu(X_{t-k:t-1}\, a) + 1/2}{\nu(X_{t-k:t-1}) + |A|/2},$$

where $\nu(w)$ denotes the number of occurrences of the word $w$ in $X_{<t}$.
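The following is a minimal sketch of the $k$-order Krichevsky predictor with the standard additive-$1/2$ (Krichevsky–Trofimov) smoothing; the counting is done naively for clarity rather than efficiency.

```python
def krichevsky_predict(past, k, a, alphabet):
    """Estimate P(X_t = a | past) with the k-order Krichevsky predictor:
    (occurrences of context followed by a, plus 1/2) divided by
    (occurrences of the context, plus |A|/2), counted over `past`."""
    context = tuple(past[-k:]) if k > 0 else tuple()
    ctx_count, sym_count = 0, 0
    # Count occurrences of the length-k context, and of the context
    # immediately followed by the candidate symbol a.
    for i in range(len(past) - k):
        if tuple(past[i:i + k]) == context:
            ctx_count += 1
            if past[i + k] == a:
                sym_count += 1
    return (sym_count + 0.5) / (ctx_count + len(alphabet) / 2)
```

Note that with an empty past (or an unseen context) the estimate falls back to the uniform distribution $1/|A|$, which is the usual behavior of additive smoothing.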

Theoretical Guarantees

A desirable property for an effective and consistent Bootstrap method is to produce an accurate estimate of the distribution $P(X_{1:T})$ from a finite sample sequence. Unfortunately, this is not possible in the case of stationary–ergodic processes. This is in stark contrast to the classes of processes considered in existing Bootstrap algorithms, where estimating $P(X_{1:m})$ for a critical $m \ll T$ (e.g., $m = k + 1$ in the case of $k$-order Markov processes) results in a sufficiently accurate estimate of $P(X_{1:T})$. While considering the general case of stationary–ergodic distributions significantly increases the applicability of the Bootstrap, it also prevents theoretical guarantees on the Bootstrap distribution estimate $P(X_{1:T})$. Moreover, it is provably impossible to establish any nontrivial rates of convergence for stationary–ergodic processes. Consequently, the following analysis focuses on asymptotic consistency guarantees for the individual replacement steps of the R-Boot algorithm. At a high level, if the sequence on at least one side of the replacement is sufficiently long, then, on average, the estimated distribution for the inserted symbol converges to the true unknown distribution of the symbol in that position, given the observations on both sides of the replacement, referred to here as the past and the future. Moreover, as the lengths of both the past and the future with respect to the replacement go to infinity, the entropy of the distribution over symbols approaches the double-sided entropy rate. We now introduce additional notation. A stationary distribution over one-way infinite sequences $X_1, X_2, \dots$ can be uniquely extended to a distribution over two-way infinite sequences $\dots, X_{-1}, X_0, X_1, \dots$; we assume this extension whenever necessary. Recall that for stationary processes, the $k$-order entropy rate can be written as $h_k(P) = \mathbb{E}_{X_{-k:-1}}\left[ h(X_0 \mid X_{-k:-1}) \right]$.
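For concreteness, here is a plausible formalization of the double-sided quantity referenced above, written by analogy with the one-sided $h_k(P)$; the notation $h_k^{\pm}$ is our assumption rather than the source's.

```latex
% One-sided k-order entropy rate, as given in the text:
%   h_k(P) = E_{X_{-k:-1}} [ h(X_0 | X_{-k:-1}) ]
% A double-sided analogue conditions on k symbols of past AND future:
\[
  h_k^{\pm}(P) \;=\; \mathbb{E}_{X_{-k:-1},\,X_{1:k}}
    \bigl[\, h\bigl(X_0 \mid X_{-k:-1},\, X_{1:k}\bigr) \bigr],
  \qquad
  h^{\pm}(P) \;=\; \lim_{k \to \infty} h_k^{\pm}(P).
\]
```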

Table of contents:

1 Introduction
2 The Replacement Bootstrap
    1 Introduction
    2 Preliminaries
    3 The Bootstrap
        3.1 Block Bootstrap
        3.2 Markov Bootstrap
    4 The Replacement Bootstrap
        4.1 Comparing the Replacement Probability to the Prediction Probability
        4.2 The R-Boot Algorithm
        4.3 Impact of Replacement Parameter R
        4.4 Implementation details
        4.5 A Comparison with Markov Bootstrap
        4.6 Theoretical Guarantees
    5 Empirical Evaluation
        5.1 Currency Datasets
    6 Conclusions
    7 Future Work
3 Risk Averse Multi-Arm Bandits
    1 Introduction
    2 The Multi-Arm Bandit Problem
        2.1 Notation, Setting and Definitions
        2.2 Optimism in the Face of Uncertainty Principle
    3 Mean-Variance Multi-Arm Bandit
        3.1 Additional Notation, Setting and Definitions
    4 Mean-Variance Lower Confidence Bound Algorithm
        4.1 Theoretical Analysis
        4.2 Worst-Case Analysis
    5 Exploration-Exploitation Algorithm
        5.1 Theoretical Analysis
    6 Numerical Simulations
    7 Sensitivity Analysis
    8 Discussion
    9 Conclusions
    10 Subsequent Work
4 Online Learning with a Benchmark
    1 Introduction
    2 Preliminaries
        2.1 Online Learning with Full Information
        2.2 Prediction with Expert Advice
        2.3 Weighted Majority Algorithms
    3 Risk in Online Learning
        3.1 Risk Sensitive Online Learning
        3.2 Online Variance-Loss Minimization
        3.3 Risk to the Best versus Risk to the Average
    4 Online Learning with a Flexible Benchmark
        4.1 (A, B)-Prod
        4.2 Discussion
    5 Applications
        5.1 Prediction with expert advice
        5.2 Tracking the best expert
        5.3 Online convex optimization
        5.4 Learning with two-points-bandit feedback
    6 Empirical Results
        6.1 Settings
    7 Conclusions
5 Conclusion
Bibliography
