Quasi-Monte Carlo Variational Inference 


Tuning Of Hamiltonian Monte Carlo Within Sequential Monte Carlo

We now discuss the tuning of the Markov kernel in line 6 of Algorithm 12. The tuning of Markov kernels within SMC samplers is closely related to the tuning of MCMC kernels in general. One advantage of tuning the kernels within the SMC framework is that information on the intermediate distributions is available in the form of the particle approximations. Moreover, different kernel parameters can be assigned to different particles, so a large number of parameters can be tested in parallel. This idea has been exploited by Fearnhead and Taylor (2013). We build upon their methodology and adapt their approach to the tuning of HMC kernels.
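To make the per-particle idea concrete, the sketch below assigns each particle its own HMC step size and, after a move step, resamples the step sizes with probabilities proportional to a performance score (here the squared jumping distance, a common choice in the spirit of Fearnhead and Taylor (2013)). This is a minimal illustration, not the exact procedure of this chapter; the function name, the jitter scheme, and the scoring rule are assumptions.

    import numpy as np

    def resample_step_sizes(step_sizes, sq_jump_dist, jitter=0.05, rng=None):
        # Favor step sizes that moved their particle far (squared jumping
        # distance as the performance score), then perturb them slightly
        # so that new parameter values keep being explored.
        rng = np.random.default_rng() if rng is None else rng
        scores = sq_jump_dist + 1e-12               # avoid zero probabilities
        probs = scores / scores.sum()
        idx = rng.choice(len(step_sizes), size=len(step_sizes), p=probs)
        noise = rng.standard_normal(len(step_sizes))
        return np.asarray(step_sizes)[idx] * np.exp(jitter * noise)

After each move step one would recompute the scores and call this function before the next iteration, so that well-performing step sizes propagate through the particle system.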
We first describe the tuning of the mass matrix. Second, we present our adaptation of the approach of Fearnhead and Taylor (2013) to the tuning of HMC kernels, abbreviated by FT. Then we present an alternative approach based on a pre-tuning phase, abbreviated by PR for preliminary run. Finally, we discuss the comparative advantages and drawbacks of the two approaches.

Tempering from an isotropic Gaussian to a shifted correlated Gaussian

As a first toy example we consider a tempering sequence that starts at an isotropic Gaussian p0 = N(0_d, I_d) and ends at a shifted, correlated Gaussian pT = N(m, Σ), where m = 2 · 1_d, for different values of d. For the covariance we set the off-diagonal correlations to 0.7 and the variances to the elements of the equally spaced sequence σ̃ = [0.1, . . . , 10]. The covariance is then Σ = diag(σ̃)^{1/2} C diag(σ̃)^{1/2}, where C denotes the correlation matrix. This toy example is rather challenging due to the different length scales of the variances, the correlation, and the shifted mean of the target. In this example the true mean, variance, and normalizing constants are available. Therefore we report the mean squared error (MSE) of the estimators. We use normalized importance weights and thus Z_T/Z_0 = 1.
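For concreteness, the construction of the target described above takes only a few lines; this is a minimal sketch, with d = 10 chosen purely for illustration.

    import numpy as np

    d = 10
    sigma_tilde = np.linspace(0.1, 10.0, d)      # equally spaced variances in [0.1, 10]
    C = np.full((d, d), 0.7) + 0.3 * np.eye(d)   # correlation 0.7 off the diagonal
    Sigma = np.diag(np.sqrt(sigma_tilde)) @ C @ np.diag(np.sqrt(sigma_tilde))
    m = 2.0 * np.ones(d)                         # shifted target mean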
We compare the following SMC samplers: MALA, HMCAFT and HMCAPR (following the naming introduced in the previous section). We add to the comparison HMCNFT, an SMC sampler using adaptive (FT-based) HMC steps, but where the sequence of temperatures is fixed a priori to a long equi-spaced sequence (the size of which is set according to the number of temperatures chosen adaptively during one run of HMCAFT).
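The adaptive choice of the temperature sequence is commonly driven by the effective sample size (ESS) of the incremental weights: the next temperature is chosen so that the ESS drops to a fixed fraction of N. The sketch below finds that temperature by bisection under a geometric tempering path; it illustrates this standard rule and is not necessarily the exact criterion used here.

    import numpy as np

    def next_temperature(log_ratio, lam_curr, target_frac=0.5, tol=1e-8):
        # log_ratio[i] = log pT(x_i) - log p0(x_i) for particle i, under the
        # geometric path p_lam ∝ p0^(1 - lam) * pT^lam. Incremental weights
        # for moving from lam_curr to lam are exp((lam - lam_curr) * log_ratio).
        def ess_frac(lam):
            lw = (lam - lam_curr) * log_ratio
            w = np.exp(lw - lw.max())
            return w.sum() ** 2 / (len(w) * (w ** 2).sum())
        if ess_frac(1.0) >= target_frac:         # can jump straight to pT
            return 1.0
        lo, hi = lam_curr, 1.0                   # ESS shrinks as lam grows
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if ess_frac(mid) >= target_frac else (lo, mid)
        return lo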

Tempering from a Gaussian to a mixture of two correlated Gaussians

The aim of our second example is to assess the robustness of SMC samplers with respect to multimodality. We temper from the prior p0 = N(m0, 5 · I_d) towards a mixture of shifted, correlated Gaussians pT = 0.3 · N(m, Σ1) + 0.7 · N(−m, Σ2), where m = 4 · 1_d, and we set the off-diagonal correlation to 0.7 for Σ1 and to 0.1 for Σ2. The variances are set to the elements of the equally spaced sequence σ̃_j = [1, . . . , 2] for j = 1, 2. The covariances Σ_j are constructed by the same formula as in the first example. To make the example more challenging we set m0 = 1_d, so the sampler starts slightly biased towards one of the two modes. We evaluate the performance of the samplers through the signs of the particles and therefore use the per-particle statistic T_i := (1/d) ∑_{j=1}^{d} 1{sign(X_{j,i}) = +1}. If the modes are correctly recovered, we expect a proportion of 30% of the signs to be positive, i.e. (1/N) ∑_{i=1}^{N} T_i ≈ 0.3.
Our measure of error is the squared deviation from this value. We consider the following SMC samplers: MALA, RW, HMCAFT, and HMCAPR. All samplers adaptively choose the number of move steps and the temperatures.
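The mode-recovery criterion above is straightforward to compute from the final particle system; a minimal sketch, assuming the particles are stored as an N × d array:

    import numpy as np

    def mode_recovery_error(particles):
        # particles: (N, d) array from the final SMC approximation.
        # T_i is the fraction of positive coordinates of particle i; when the
        # mixture weights are recovered, mean(T_i) should be close to 0.3.
        T = (np.sign(particles) == 1.0).mean(axis=1)
        return (T.mean() - 0.3) ** 2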
As shown in Figure 5.4b, the two HMC-based samplers achieve a lower computation-adjusted error in recovering the modes in moderate dimensions. In terms of mode recovery, all samplers behave comparably, as Figure 5.4a illustrates. Nevertheless, once the dimension of the problem exceeds 20, all samplers tend to concentrate on a single mode. This problem may be mitigated by initializing the sampler with a wider distribution; however, that approach relies on knowledge of the location of the modes.


Table of contents:

Acknowledgements
1 Introduction (in French) 
1.1 Bayesian inference
1.2 Monte Carlo sampling
1.2.1 Independent sampling
1.2.2 Dependent sampling
1.3 Quasi-Monte Carlo
1.3.1 Halton sequences
1.3.2 Convergence of QMC sampling
1.3.3 Randomized quasi-Monte Carlo
1.3.4 Using low-discrepancy sequences in statistics
1.3.5 Central limit theorem for QMC
1.4 Stochastic approximation
1.5 Variational inference
1.5.1 Mean-field variational inference
1.5.2 Monte Carlo variational inference
1.6 Extended summary
1.6.1 Extended summary of our work on quasi-Monte Carlo and approximate Bayesian computation
1.6.2 Extended summary of our work on quasi-Monte Carlo and variational inference
1.6.3 Extended summary of our work on adaptive tuning of Hamiltonian Monte Carlo within sequential Monte Carlo
2 Introduction 
2.1 Bayesian inference
2.2 Monte Carlo Sampling
2.2.1 Independent Sampling
2.2.2 Dependent Sampling
2.3 Quasi Monte Carlo
2.3.1 Halton sequences
2.3.2 Convergence of QMC sampling
2.3.3 Randomized quasi-Monte Carlo
2.3.4 Using low discrepancy sequences in statistics
2.3.5 Central limit theorems for QMC
2.4 Stochastic approximation
2.5 Variational Inference
2.5.1 Mean Field Variational Inference
2.5.2 Monte Carlo Variational Inference
2.6 Summary
2.6.1 Summary of our work on quasi-Monte Carlo and approximate Bayesian computation
2.6.2 Summary of our work on quasi-Monte Carlo and variational inference
2.6.3 Summary of our work on adaptive tuning of Hamiltonian Monte Carlo within sequential Monte Carlo
3 Improving ABC via QMC 
3.1 Introduction
3.2 Approximate Bayesian computation
3.2.1 Reject-ABC
3.2.2 Pseudo-marginal importance sampling
3.3 Quasi-Monte Carlo
3.3.1 Randomized quasi-Monte Carlo
3.3.2 Mixed sequences and a central limit theorem
3.4 Improved ABC via (R)QMC
3.4.1 Improved estimation of the normalization constant
3.4.2 Improved estimation of general importance sampling estimators
3.5 Numerical examples
3.5.1 Toy model
3.5.2 Lotka-Volterra model
3.5.3 Tuberculosis mutation
3.5.4 Concluding remarks
3.6 Sequential ABC
3.6.1 Adaptive importance sampling
3.6.2 Adapting the proposal qt
3.6.3 Adapting simultaneously ε and the number of simulations per parameter
3.7 Numerical illustration of the sequential procedure
3.7.1 Toy model
3.7.2 Bimodal Gaussian distribution
3.7.3 Tuberculosis mutation
3.8 Conclusion
3.9 Appendix
3.9.1 Proofs of main results
4 Quasi-Monte Carlo Variational Inference 
4.1 Introduction
4.2 Related Work
4.3 Quasi-Monte Carlo Variational Inference
4.3.1 Background: Monte Carlo Variational Inference
4.3.2 Quasi-Monte Carlo Variational Inference
4.3.3 Theoretical Properties of QMCVI
4.4 Experiments
4.4.1 Hierarchical Linear Regression
4.4.2 Multi-level Poisson GLM
4.4.3 Bayesian Neural Network
4.4.4 Increasing the Sample Size Over Iterations
4.5 Conclusion
4.6 Additional Information on QMC
4.7 Proofs
4.7.1 Proof of Theorem 5
4.7.2 Proof of Theorem 6
4.7.3 Proof of Theorem 7
4.8 Details for the Models Considered in the Experiments
4.8.1 Hierarchical Linear Regression
4.8.2 Multi-level Poisson GLM
4.8.3 Bayesian Neural Network
4.9 Practical Advice for Implementing QMCVI in Your Code
5 Tuning of HMC within SMC 
5.1 Introduction
5.2 Background
5.2.1 Sequential Monte Carlo samplers
5.2.2 Hamiltonian Monte Carlo
5.3 Tuning Of Hamiltonian Monte Carlo Within Sequential Monte Carlo
5.3.1 Tuning of the mass matrix of the kernels
5.3.2 Adapting the tuning procedure of Fearnhead and Taylor (2013)
5.3.3 Pretuning of the kernel at every time step
5.3.4 Discussion of the tuning procedures
5.4 Experiments
5.4.1 Tempering from an isotropic Gaussian to a shifted correlated Gaussian
5.4.2 Tempering from a Gaussian to a mixture of two correlated Gaussians
5.4.3 Tempering from an isotropic Student distribution to a shifted correlated Student distribution
5.4.4 Binary regression posterior
5.4.5 Log Gaussian Cox model
5.5 Discussion

