Improving approximate Bayesian computation via quasi-Monte Carlo
Joint work with Nicolas Chopin, to appear in Journal of Computational and Graphical Statistics
Abstract: ABC (approximate Bayesian computation) is a general approach for dealing with models with an intractable likelihood. In this work, we derive ABC algorithms based on QMC (quasi-Monte Carlo) sequences. We show that the resulting ABC estimates have a lower variance than their Monte Carlo counterparts. We also develop QMC variants of sequential ABC algorithms, which progressively adapt the proposal distribution and the acceptance threshold. We illustrate our QMC approach through several examples taken from the ABC literature.
Keywords: Approximate Bayesian computation, Likelihood-free inference, Quasi-Monte Carlo, Randomized Quasi-Monte Carlo, Adaptive importance sampling
Since its introduction by Tavaré et al. (1997), approximate Bayesian computation (ABC) has received growing attention and has become a major tool for Bayesian inference in settings where the likelihood of a statistical model is intractable but simulations from the model for a given parameter value can be generated. The ABC approach is as intuitive as it is convincing: we first sample a value from the prior distribution; conditional on this prior draw, an observation from the model is generated. If the simulated observation is sufficiently close to the observation recorded in nature, we retain the prior draw and add it to the set of posterior samples. Otherwise the draw is discarded. We repeat this procedure until enough samples have been obtained.
Since then, several computational extensions of ABC have been proposed. For instance, the use of MCMC by Marjoram et al. (2003) improved the simulation of ABC posterior samples over the simple accept-reject algorithm. The sequential approaches of Beaumont et al. (2009), Sisson et al. (2009), Del Moral et al. (2012) and Sedki et al. (2012) made it possible to exploit the information from previous iterations and to choose the schedule of thresholds $\epsilon$ adaptively. Besides the question of an efficient simulation of high posterior probability regions, the choice of summary statistics, which condense the information contained in the observed and simulated data, has been investigated (Fearnhead and Prangle, 2012). See Marin et al. (2012) and Lintusaari et al. (2017) for two recent reviews. Moreover, approaches driven by machine learning have been proposed, such as random forests (Marin et al., 2016), Gaussian processes (Wilkinson, 2014), Bayesian optimization (Gutmann and Corander, 2016), expectation propagation (Barthelmé and Chopin, 2014) and neural networks (Papamakarios and Murray, 2016). A post-processing approach was studied in Blum (2010).
In this paper we take a different perspective and address the problem of reducing the variance of ABC estimators. We achieve this by introducing so-called low-discrepancy sequences in the simulation from the proposal distribution. We show that this significantly reduces the variance of posterior estimates.
The rest of the paper is organized as follows. Section 3.2 reviews the basic ideas of approximate Bayesian computation and sets the notation. Section 3.3 introduces the concept of low-discrepancy sequences. Section 3.4 brings the introduced concepts together and provides the theory that underpins the proposed idea. Section 3.5 presents a first set of numerical examples. Section 3.6 explains how to use our ideas in a sequential procedure which progressively adapts the proposal distribution and the value of $\epsilon$. Section 3.7 illustrates the resulting sequential ABC procedure. A final section concludes.
Approximate Bayesian computation
Approximate Bayesian computation is motivated by models for which (a) the likelihood function is difficult or expensive to compute; and (b) simulating from the model (for a given parameter $\theta$) is feasible.
The most basic ABC algorithm is called reject-ABC. It consists of simulating pairs $(\theta, y)$ from the prior $p(\theta)$ and the likelihood $p(y|\theta)$, and keeping those pairs such that $d(y, y^\star) \leq \epsilon$, where $y^\star$ is the actual data and $d : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}^+$ is some distance (e.g. Euclidean). The sampling is continued until $N$ samples have been proposed. The target density of this rejection algorithm is:
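A minimal sketch of reject-ABC on a toy Gaussian model (the model, the tolerance and all names below are our illustrative choices, not an example from the paper):

```python
import numpy as np

def reject_abc(y_star, prior_sample, simulate, dist, eps, n_proposals, rng):
    """Basic reject-ABC: keep prior draws whose simulated data fall
    within distance eps of the observed data y_star."""
    accepted = []
    for _ in range(n_proposals):
        theta = prior_sample(rng)      # draw theta from the prior p(theta)
        y = simulate(theta, rng)       # simulate data given theta
        if dist(y, y_star) <= eps:     # accept if close enough to y_star
            accepted.append(theta)
    return np.array(accepted)

# Toy model: theta ~ N(0, 10), y | theta ~ N(theta, 1), observed y* = 2.0
rng = np.random.default_rng(0)
samples = reject_abc(
    y_star=2.0,
    prior_sample=lambda r: r.normal(0.0, 10.0),
    simulate=lambda th, r: r.normal(th, 1.0),
    dist=lambda a, b: abs(a - b),
    eps=0.5,
    n_proposals=20000,
    rng=rng,
)
```

With a nearly flat prior, the accepted draws concentrate around the observed value, as expected from the target density above.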
$$\pi_\epsilon(\theta, y) = \frac{1}{Z_\epsilon}\, p(\theta)\, p(y|\theta)\, \mathbb{1}\left\{ d(y, y^\star) \leq \epsilon \right\}, \tag{3.1}$$
where $\mathbb{P}_\theta$ denotes a probability with respect to $y \sim p(y|\theta)$, and $Z_\epsilon = \int_\Theta p(\theta)\, \mathbb{P}_\theta\left( d(y, y^\star) \leq \epsilon \right) d\theta$ is the normalising constant.
As $\epsilon \to 0$, (3.1) converges to the true posterior density. Actually, $d$ is often not a distance but a pseudo-distance of the form $d(y, y^\star) = \| s(y) - s(y^\star) \|_2$, where $\|\cdot\|_2$ is the Euclidean norm and $s(y)$ is a low-dimensional, imperfect summary of $y$. In that case, $\pi_\epsilon(\theta) \to p(\theta | s(y^\star))$. This introduces an extra level of approximation, which is hard to assess theoretically and practically. However, in this paper we focus on how to approximate well (3.1) for a given $d$ (and $\epsilon$), and we refer to e.g. Fearnhead and Prangle (2012) for more discussion on the choice of $d$ or $s$.
Randomized quasi-Monte Carlo
A drawback of QMC is that it does not come with an easy way to assess the approxi-mation error.
FIGURE 3.1: Uniform random (left) and Halton (right) point set of length 256 in $[0, 1]^2$. The Halton sequence covers the target space more evenly than the random uniform sequence.
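Point sets like those in Figure 3.1 can be generated with scipy's QMC module (a sketch; `scipy.stats.qmc` is our choice of library, not one used in the paper):

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(1)
uniform_pts = rng.random((256, 2))                        # i.i.d. uniform points
halton_pts = qmc.Halton(d=2, scramble=False).random(256)  # Halton point set

# qmc.discrepancy quantifies how evenly each point set covers [0, 1]^2
# (lower is more even); the Halton set scores markedly better.
disc_uniform = qmc.discrepancy(uniform_pts)
disc_halton = qmc.discrepancy(halton_pts)
```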
RQMC (randomized quasi-Monte Carlo) amounts to introducing randomness into a QMC sequence in such a way that $u_n \sim \mathcal{U}[0,1]^d$ marginally. The quantity $\hat{I}_N = N^{-1} \sum_{n=1}^{N} \psi(u_n)$ then becomes an unbiased estimate of the integral of interest. One may assess the approximation error by computing the empirical variance over repeated simulations.
The simplest way to obtain an RQMC sequence is to randomly shift a QMC sequence: let $v \sim \mathcal{U}[0,1]^d$, and let $u_{1:N}$ be a QMC sequence; then
$$\hat{u}_n := u_n + v \mod 1 \quad \text{(component-wise)}$$
is an RQMC sequence.
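The random shift (also known as the Cranley-Patterson rotation) is straightforward to implement; a sketch with numpy, using a Halton point set from scipy as the QMC sequence (our choice for illustration):

```python
import numpy as np
from scipy.stats import qmc

def random_shift(u, rng):
    """Shift a QMC point set by a single uniform draw v, component-wise
    modulo 1. Each shifted point is marginally uniform on [0, 1]^d."""
    v = rng.random(u.shape[1])
    return (u + v) % 1.0

rng = np.random.default_rng(42)
u = qmc.Halton(d=2, scramble=False).random(128)  # deterministic QMC points
u_shifted = random_shift(u, rng)

# Repeating the shift with independent draws of v yields independent
# unbiased estimates of the same integral, from which the approximation
# error can be assessed empirically.
```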
A more sophisticated approach, called scrambled nets, was introduced by Owen (1997) and later refined in Owen (2008). The main advantage of this approach is that, under smoothness assumptions on the derivatives of the function, the speed of convergence can be improved even further, as stated in the following theorem.
Theorem 1 (Owen, 2008). Let $f : [0,1]^d \to \mathbb{R}$ be a function whose cross partial derivatives up to order $d$ exist and are continuous, and let $(u_n)_{n \in 1:N}$ be a relaxed scrambled $(\lambda, t, m, d)$-net in base $b$ with dimension $d$ and uniformly bounded gain coefficients. Then
$$\operatorname{Var}\left( \frac{1}{N} \sum_{n=1}^{N} f(u_n) \right) = O\left( N^{-3} \log(N)^{d-1} \right),$$
where $N = \lambda b^m$.
In words, for all $\tau > 0$ the RQMC error rate is $O(N^{-3/2+\tau})$ when a scrambled $(\lambda, t, m, d)$-net is used. The only inconvenience of this result is that the rate of convergence holds only for certain values of $N$. However, a more general result has recently been shown by Gerber (2015, Corollary 1): if $f \in L^2$ and $(u_n)_{n \in 1:N}$ is a scrambled $(t, d)$-sequence, then
$$\operatorname{Var}\left( \frac{1}{N} \sum_{n=1}^{N} f(u_n) \right) = o\left( N^{-1} \right).$$
The construction of scrambled nets and sequences is quite involved. As the focus of our paper is the application and not the construction of these sequences, we refer the reader to L’Ecuyer (2016) or Dick et al. (2013) for more details. In the following, when speaking of an RQMC sequence, we will assume that it is a scrambled $(t, d)$-sequence.
Mixed sequences and a central limit theorem
One drawback of low-discrepancy sequences is that the speed of convergence deteriorates with the dimension. In some situations, a small number of components contributes significantly to the variance of the target. One might then choose to use a low-discrepancy sequence for those components and an ordinary Monte Carlo approach for the rest. This idea of using a mixed sequence is closely linked to the concept of effective dimension; see Owen (1998). Based on the randomness induced by the Monte Carlo part, a central limit theorem (CLT) can be established for estimators based on such mixed sequences.
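A mixed sequence can be assembled directly (a sketch; scipy's scrambled Sobol' generator is our choice of low-discrepancy sequence, and the split into 2 QMC and 3 MC components is arbitrary):

```python
import numpy as np
from scipy.stats import qmc

def mixed_sequence(n, d_qmc, d_mc, rng):
    """Build a mixed sequence: low-discrepancy (scrambled Sobol') points
    for the d_qmc important components, i.i.d. uniforms for the rest."""
    u_qmc = qmc.Sobol(d=d_qmc, scramble=True, seed=rng).random(n)
    u_mc = rng.random((n, d_mc))
    return np.hstack([u_qmc, u_mc])

rng = np.random.default_rng(7)
u = mixed_sequence(256, d_qmc=2, d_mc=3, rng=rng)  # 256 points in [0,1]^5
```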
Improved ABC via (R)QMC
Recall that we described our ABC importance sampler as an algorithm that samples pairs $(\theta_n, x_n)$ from $q(\theta) q_\theta(x)$, where $x_n$ consists of datapoints generated from the model. In most ABC problems, using (R)QMC to generate the $\theta_n$ should be easy, but this is generally not the case for the $x_n$'s. Indeed, the simulator used to generate datapoints from the model may be a complex black box, which may require a very large, or random, number of uniform variates. Thus, we contemplate from now on generating the $\theta_n$'s using (R)QMC. That is, $\theta_n = G(u_n)$, where $u_{1:N}$ is a QMC or RQMC sequence and $G$ is a function such that $G(U)$, $U \sim \mathcal{U}[0,1]^d$, is distributed according to the proposal $q(\theta)$; and $x_n | \theta_n \sim q_{\theta_n}$ is a random variate. In other words, $(\theta_n, x_n)$ is a mixed sequence.
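Concretely, $G$ is typically the inverse CDF of the proposal. A minimal sketch (the standard Gaussian proposal, the one-observation Gaussian simulator and scipy's scrambled Sobol' points are our illustrative choices, not the paper's):

```python
import numpy as np
from scipy.stats import norm, qmc

# RQMC points for the parameter part of the mixed sequence.
rng = np.random.default_rng(3)
u = qmc.Sobol(d=1, scramble=True, seed=rng).random(512)

# theta_n = G(u_n): G is the inverse CDF of the proposal q(theta),
# here a N(0, 1) proposal, so G is the standard normal quantile function.
theta = norm.ppf(u[:, 0])

# The simulator stays a black box: x_n | theta_n is plain Monte Carlo
# (here one Gaussian observation per theta_n, a stand-in simulator).
x = rng.normal(loc=theta, scale=1.0)
```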
We already know from the previous section that an estimate based on a mixed sequence converges at the Monte Carlo rate, $O_P(N^{-1/2})$, but has a smaller asymptotic variance than the same estimate based on Monte Carlo. In fact, a similar result may be established directly for the actual variance. Let $\hat{I}_N := \sum_{n=1}^{N} \varphi(\theta_n, x_n)/N$ be an empirical average for some measurable function $\varphi$. For simplicity, we assume here that the $\theta_n$'s are either random variates or RQMC variates; that is, in both cases, $\theta_n \sim q$ marginally. Then
$$\operatorname{Var}[\hat{I}_N] = \operatorname{Var}\left[ \mathbb{E}\{\hat{I}_N | \theta_{1:N}\} \right] + \mathbb{E}\left[ \operatorname{Var}\{\hat{I}_N | \theta_{1:N}\} \right]. \tag{3.4}$$
The first term is $O(N^{-1})$ when the $\theta_n$'s are generated using Monte Carlo, and should be $o(N^{-1})$ under appropriate conditions when the $\theta_n$'s form an RQMC sequence. On the other hand, the second term is $O(N^{-1})$ in both cases. As a corollary, the variance of $\hat{I}_N$ is smaller when using a mixed sequence, for $N$ large enough.
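The variance decomposition above can be checked empirically on a toy average (a sketch; the Gaussian model, the Sobol' generator and all names here are our illustrative choices):

```python
import numpy as np
from scipy.stats import norm, qmc

def estimate(n, use_rqmc, rng):
    """One estimate of E[x] with theta ~ N(0, 1), x | theta ~ N(theta, 1).
    The theta's come from RQMC (scrambled Sobol') or plain Monte Carlo;
    the simulator part x | theta is always Monte Carlo."""
    if use_rqmc:
        u = qmc.Sobol(d=1, scramble=True, seed=rng).random(n)[:, 0]
    else:
        u = rng.random(n)
    theta = norm.ppf(u)            # theta_n = G(u_n), inverse-CDF transform
    x = rng.normal(theta, 1.0)     # black-box simulator (plain MC)
    return x.mean()

rng = np.random.default_rng(0)
reps = 300
var_mc = np.var([estimate(256, False, rng) for _ in range(reps)])
var_mixed = np.var([estimate(256, True, rng) for _ in range(reps)])
# Both estimators converge at the Monte Carlo rate, but the mixed
# sequence removes most of the variance coming from the theta part.
```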
The point of the following sections is to generalize this basic result to various ABC estimates of interest.
Table of contents:
1 Introduction (in French)
1.1 Bayesian inference
1.2 Monte Carlo sampling
1.2.1 Independent sampling
1.2.2 Dependent sampling
1.3 Quasi-Monte Carlo
1.3.1 Halton sequences
1.3.2 Convergence of QMC sampling
1.3.3 Randomized quasi-Monte Carlo
1.3.4 Using low-discrepancy sequences in statistics
1.3.5 Central limit theorem for QMC
1.4 Stochastic approximation
1.5 Variational inference
1.5.1 Mean-field variational inference
1.5.2 Monte Carlo variational inference
1.6 Extended summary
1.6.1 Extended summary of our work on quasi-Monte Carlo and approximate Bayesian computation
1.6.2 Extended summary of our work on quasi-Monte Carlo and variational inference
1.6.3 Extended summary of our work on adaptive tuning of Hamiltonian Monte Carlo within sequential Monte Carlo
2.1 Bayesian inference
2.2 Monte Carlo Sampling
2.2.1 Independent Sampling
2.2.2 Dependent Sampling
2.3 Quasi Monte Carlo
2.3.1 Halton sequences
2.3.2 Convergence of QMC sampling
2.3.3 Randomized quasi-Monte Carlo
2.3.4 Using low discrepancy sequences in statistics
2.3.5 Central limit theorems for QMC
2.4 Stochastic approximation
2.5 Variational Inference
2.5.1 Mean Field Variational Inference
2.5.2 Monte Carlo Variational Inference
2.6.1 Summary of our work on quasi-Monte Carlo and approximate Bayesian computation
2.6.2 Summary of our work on quasi-Monte Carlo and variational inference
2.6.3 Summary of our work on adaptive tuning of Hamiltonian Monte Carlo within sequential Monte Carlo
3 Improving ABC via QMC
3.2 Approximate Bayesian computation
3.2.2 Pseudo-marginal importance sampling
3.3 Quasi-Monte Carlo
3.3.1 Randomized quasi-Monte Carlo
3.3.2 Mixed sequences and a central limit theorem
3.4 Improved ABC via (R)QMC
3.4.1 Improved estimation of the normalization constant
3.4.2 Improved estimation of general importance sampling estimators
3.5 Numerical examples
3.5.1 Toy model
3.5.3 Tuberculosis mutation
3.5.4 Concluding remarks
3.6 Sequential ABC
3.6.1 Adaptive importance sampling
3.6.2 Adapting the proposal $q_t$
3.6.3 Adapting simultaneously $\epsilon_t$ and the number of simulations per parameter
3.7 Numerical illustration of the sequential procedure
3.7.1 Toy model
3.7.2 Bimodal Gaussian distribution
3.7.3 Tuberculosis mutation
3.9.1 Proofs of main results
4 Quasi-Monte Carlo Variational Inference
4.3 Quasi-Monte Carlo Variational Inference
4.3.1 Background: Monte Carlo Variational Inference
4.3.2 Quasi-Monte Carlo Variational Inference
4.3.3 Theoretical Properties of QMCVI
4.4.1 Hierarchical Linear Regression
4.4.2 Multi-level Poisson GLM
4.4.3 Bayesian Neural Network
4.4.4 Increasing the Sample Size Over Iterations
4.6 Additional Information on QMC
4.7.1 Proof of Theorem 5
4.7.2 Proof of Theorem 6
4.7.3 Proof of Theorem 7
4.8 Details for the Models Considered in the Experiments
4.8.1 Hierarchical Linear Regression
4.8.2 Multi-level Poisson GLM
4.8.3 Bayesian Neural Network
4.9 Practical Advice for Implementing QMCVI in Your Code
5 Tuning of HMC within SMC
5.2.1 Sequential Monte Carlo samplers
5.2.2 Hamiltonian Monte Carlo
5.3 Tuning of Hamiltonian Monte Carlo within Sequential Monte Carlo
5.3.1 Tuning of the mass matrix of the kernels
5.3.2 Adapting the tuning procedure of Fearnhead and Taylor (2013)
5.3.3 Pretuning of the kernel at every time step
5.3.4 Discussion of the tuning procedures
5.4.1 Tempering from an isotropic Gaussian to a shifted correlated Gaussian
5.4.2 Tempering from a Gaussian to a mixture of two correlated Gaussians
5.4.3 Tempering from an isotropic Student distribution to a shifted correlated Student distribution
5.4.4 Binary regression posterior
5.4.5 Log Gaussian Cox model