Empirical Bayes analysis of spike and slab posterior distributions


Contributions using the Empirical Bayes method for the Spike and Slab prior

The following work, treated in more detail in Chapters 2 and 3, was motivated by the work of Johnstone and Silverman (2004) seen in Section 1.2.2. Using the Empirical Bayes approach (1.2.14), they derived convergence at the minimax rate for the posterior median and the posterior mean, as seen in Theorem 1.2.1, for suitable densities γ. Aiming at Uncertainty Quantification (which was later treated by Castillo and Szabo (2018) based on the present work), a natural question was whether the second moment of the posterior law (1.2.15) behaves the same way. Namely, the desired results take the form
$$\sup_{\theta_0 \in \ell_0[s_n]} E_{\theta_0} \int \|\theta - \theta_0\|_2^2 \, d\Pi_{\hat\alpha}(\theta \mid X) \le C r_n. \qquad (1.2.24)$$
Suboptimality of the Laplace Slab. The first investigations were conducted with Γ taken as a standard Laplace distribution, and led to a quite surprising result. The posterior second moment for a Laplace Slab does not converge at the minimax rate uniformly in $\theta_0 \in \ell_0[s_n]$, even though the posterior median and mean do so (as was proved by Johnstone and Silverman (2004) and noted above).
Theorem 6. Let $\Pi_\alpha$ be the Spike and Slab prior distribution (1.2.5) with Slab distribution Γ equal to the Laplace distribution Lap(1). Let $\Pi_{\hat\alpha}[\cdot \mid X]$ be the corresponding plug-in posterior distribution given by (1.2.15), with $\hat\alpha$ chosen by the empirical Bayes procedure (1.2.14). There exist $D > 0$, $N_0 > 0$, and $c_0 > 0$ such that, for any $n \ge N_0$ and any $s_n$ with $1 \le s_n \le c_0 n$, there exists $\theta_0 \in \ell_0[s_n]$ such that
$$E_{\theta_0} \int \|\theta - \theta_0\|_2^2 \, d\Pi_{\hat\alpha}[\theta \mid X] \ge D s_n e^{\sqrt{\log(n/s_n)}}.$$

Contribution using a Hierarchical approach with the Spike and Slab prior

An analog of the Spike and Slab prior. In the following, one defines the cutoff $L_{\max} = \log_2(n)$ and $L$ as the largest integer such that
$$2^L L \le n. \qquad (1.3.14)$$
Note that $L \le L_{\max}$ for every n. Let $X^{(n)} = (X_1, \dots, X_n)$ be i.i.d. from a law P with density f. Let Π be the prior on densities generated as follows. One keeps the Pólya tree random measure construction with respect to the canonical dyadic partition of [0, 1] up to level L, replacing the Beta distributions by
$$\text{for every } \varepsilon \in \mathcal{E}, \quad Y_{\varepsilon 0} \sim (1 - \pi_{\varepsilon 0})\,\delta_{1/2} + \pi_{\varepsilon 0}\,\mathrm{Beta}(\alpha_{\varepsilon 0}, \alpha_{\varepsilon 1}), \qquad (1.3.15)$$
with parameters $\alpha_\varepsilon \in \mathbb{N}$ to be chosen and a real parameter $\pi_\varepsilon$ (later taken to be a negative power of 2 multiplied by $e^{-Cl}$, where we wrote $l = |\varepsilon|$).
There are multiple probability distributions on the Borel sets of [0, 1] that coincide on the dyadic intervals $I_\varepsilon$ with the values $P(I_\varepsilon)$ resulting from the above construction. We consider the specific one that is absolutely continuous with respect to the Lebesgue measure on [0, 1], with a constant density on each $I_\varepsilon$, $|\varepsilon| = L + 1$. So, both prior and posterior are histograms on dyadic intervals at depth L.
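To make the construction concrete, here is a minimal Python sketch that draws one histogram density from a prior of this type. It assumes a common Beta parameter `a` for all nodes and a user-supplied function `pi_level(l)` giving the slab weight at depth l; both are illustrative choices, not the calibration used in the thesis. Every node of depth 0 to L is split, which yields constant pieces on the intervals of depth L + 1.

```python
import random

def truncation_level(n):
    """Largest integer L with 2**L * L <= n, as in (1.3.14)."""
    L = 0
    while 2 ** (L + 1) * (L + 1) <= n:
        L += 1
    return L

def sample_histogram(n, pi_level, a=1, rng=random):
    """Draw one histogram density on the 2**(L+1) dyadic intervals of depth L+1.

    pi_level(l) is a hypothetical slab weight for nodes at depth l;
    a is a common Beta parameter (an assumption made for simplicity).
    """
    L = truncation_level(n)
    masses = [1.0]                      # P(I_eps) at the current depth, left to right
    for l in range(L + 1):              # split every node of depth 0, ..., L
        next_masses = []
        for m in masses:
            if rng.random() < pi_level(l):
                y = rng.betavariate(a, a)   # slab: Beta-distributed split
            else:
                y = 0.5                     # spike: even split (Dirac mass at 1/2)
            next_masses += [m * y, m * (1.0 - y)]
        masses = next_masses
    width = 1.0 / len(masses)
    return [m / width for m in masses]  # constant density value on each interval
```

With `pi_level` identically zero every split is even and the draw is the uniform density; larger slab weights allow the histogram to deviate from uniform at the corresponding depths.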
Definition. The prior distribution with parameters $\alpha_\varepsilon$, $\pi_\varepsilon$ as above is called the Spike and Slab Pólya tree and is denoted $\Pi(\alpha_\varepsilon, \pi_\varepsilon)$.
This prior is based on an idea of Ghosal and van der Vaart, referred to as the Evenly Split Pólya tree in their book, Ghosal and van der Vaart (2017). First note that the Haar coefficients $f_{lk}$ of a density f can be expressed as
$$f_{lk} = \langle f, \psi_{lk} \rangle = 2^{l/2} P(I_\varepsilon)(1 - 2Y_{\varepsilon 0}). \qquad (1.3.16)$$
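The identity (1.3.16) can be checked numerically on a histogram density. The sketch below assumes the Haar sign convention $\psi_{lk} = 2^{l/2}(-1_{I_{\varepsilon 0}} + 1_{I_{\varepsilon 1}})$, which is the convention under which the factor $(1 - 2Y_{\varepsilon 0})$ appears, and indexes intervals by binary strings ε; the helper names are hypothetical.

```python
import random

def random_dyadic_tree(depth, rng):
    """Return (mass, split): mass[eps] = P(I_eps), split[eps] = Y_{eps0}."""
    mass = {"": 1.0}
    split = {}
    for level in range(depth):
        for eps in [e for e in mass if len(e) == level]:
            y = rng.random()
            split[eps] = y
            mass[eps + "0"] = mass[eps] * y          # left child gets share Y
            mass[eps + "1"] = mass[eps] * (1.0 - y)  # right child gets share 1 - Y
    return mass, split

def haar_direct(mass, eps, depth):
    """Integrate f * psi over I_eps, with f constant on intervals of the given depth."""
    l = len(eps)  # requires l < depth
    total = 0.0
    for leaf, p in mass.items():
        if len(leaf) == depth and leaf.startswith(eps):
            density = p * 2 ** depth                                # height of f on the leaf
            psi = (-1.0 if leaf[l] == "0" else 1.0) * 2 ** (l / 2)  # Haar value on the leaf
            total += density * psi * 2 ** (-depth)                  # times the leaf width
    return total

def haar_formula(mass, split, eps):
    """Right-hand side of (1.3.16)."""
    return 2 ** (len(eps) / 2) * mass[eps] * (1.0 - 2.0 * split[eps])
```

In particular an even split, $Y_{\varepsilon 0} = 1/2$, sets the corresponding Haar coefficient to zero, which is what makes the Dirac mass at 1/2 play the role of a spike.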

Empirical Bayes estimation with spike and slab prior

In the setting of model (2.1.1), the spike and slab prior on θ with fixed parameter α ∈ [0, 1] is
$$\Pi_\alpha \sim \bigotimes_{i=1}^{n} \big[(1 - \alpha)\delta_0 + \alpha G(\cdot)\big], \qquad (2.2.1)$$
where G is a given probability measure on $\mathbb{R}$. We consider the following choices:
$$G = \mathrm{Lap}(1) \quad \text{or} \quad G = \mathrm{Cauchy}(1),$$
where Lap(λ) denotes the Laplace (double exponential) distribution with parameter λ and Cauchy(1) the standard Cauchy distribution. Different choices of parameters and prior distributions are possible (a brief discussion is included below), but for clarity of exposition we stick to these common distributions. In the sequel γ denotes the density of G with respect to Lebesgue measure. By Bayes' formula, the posterior distribution under (2.1.1) and (2.2.1) with fixed α ∈ [0, 1] is
$$\Pi_\alpha[\cdot \mid X] \sim \bigotimes_{i=1}^{n} \big[(1 - a(X_i))\delta_0 + a(X_i) G_{X_i}(\cdot)\big]. \qquad (2.2.2)$$
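As an illustration of (2.2.2), the following Python sketch computes the posterior weight of the slab for the Laplace choice G = Lap(1). It assumes that model (2.1.1) is the Gaussian sequence model $X_i = \theta_i + \varepsilon_i$ with standard normal noise, and that $a(x) = \alpha g(x) / ((1-\alpha)\varphi(x) + \alpha g(x))$, where φ is the standard normal density and g is the convolution of φ with γ; neither formula is displayed in this extract, so both are stated here as assumptions. Function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def g_laplace(x):
    """Marginal density of X = theta + N(0, 1) noise when theta ~ Lap(1).

    Closed form of the convolution of phi with 0.5 * exp(-|u|).
    Intended for moderate |x|; exp(x) overflows for very large |x|.
    """
    x = abs(x)  # g is symmetric
    return 0.5 * math.exp(0.5) * (
        math.exp(-x) * Phi(x - 1.0) + math.exp(x) * Phi(-x - 1.0)
    )

def posterior_weight(x, alpha):
    """Assumed form of a(x): posterior probability that theta_i is drawn from the slab."""
    num = alpha * g_laplace(x)
    return num / ((1.0 - alpha) * phi(x) + num)
```

For example, with α = 0.1 the weight `posterior_weight(0.0, 0.1)` stays below α, while `posterior_weight(3.0, 0.1)` is close to one half: observations near zero are attributed to the spike, larger ones increasingly to the slab.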

Suboptimality of the Laplace slab for the complete EB posterior distribution

Theorem 13. Let $\Pi_\alpha$ be the spike and slab prior distribution (2.2.1) with slab distribution G equal to the Laplace distribution Lap(1). Let $\Pi_{\hat\alpha}[\cdot \mid X]$ be the corresponding plug-in posterior distribution given by (2.2.2), with $\hat\alpha$ chosen by the empirical Bayes procedure (2.2.6). There exist $D > 0$, $N_0 > 0$, and $c_0 > 0$ such that, for any $n \ge N_0$ and any $s_n$ with $1 \le s_n \le c_0 n$, there exists $\theta_0 \in \ell_0[s_n]$ such that
$$E_{\theta_0} \int \|\theta - \theta_0\|^2 \, d\Pi_{\hat\alpha}[\theta \mid X] \ge D s_n e^{\sqrt{\log(n/s_n)}}.$$
Theorem 13 implies that taking a Laplace slab leads to a suboptimal convergence rate in terms of the posterior squared L2–moment. This result is surprising at first, as we know by (2.2.8) that the posterior median converges at optimal rate rn. The posterior mean also converges at rate rn uniformly over ℓ0[sn], by Theorem 1 of Johnstone and Silverman (2004). So at first sight it would be quite natural to expect that so does the posterior second moment.
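For readers who want to experiment with the plug-in step, here is a minimal Python sketch of an empirical Bayes choice of α by marginal maximum likelihood with a Laplace slab. The exact procedure (2.2.6) is not reproduced in this extract, so the criterion (maximizing $\sum_i \log((1-\alpha)\varphi(X_i) + \alpha g(X_i))$), the search range [1/n, 1], and the grid search are all assumptions made for illustration; the function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def g_laplace(x):
    """Convolution of phi with the Lap(1) density (moderate |x| only)."""
    x = abs(x)
    return 0.5 * math.exp(0.5) * (
        math.exp(-x) * Phi(x - 1.0) + math.exp(x) * Phi(-x - 1.0)
    )

def log_marginal(xs, alpha):
    """Log marginal likelihood of the data under the prior with weight alpha."""
    return sum(
        math.log((1.0 - alpha) * phi(x) + alpha * g_laplace(x)) for x in xs
    )

def eb_alpha(xs, grid_size=200):
    """Grid maximizer of the marginal likelihood over alpha in [1/n, 1] (assumed range)."""
    n = len(xs)
    # Geometric grid from 1/n (k = 0) to 1 (k = grid_size - 1).
    grid = [(1.0 / n) ** (1.0 - k / (grid_size - 1)) for k in range(grid_size)]
    return max(grid, key=lambda a: log_marginal(xs, a))
```

On data concentrated near zero the maximizer sits at the lower end of the range, and on data made only of large observations it sits at 1; the selected value would then be plugged into the fixed-α posterior (2.2.2).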
One can naturally ask whether the suboptimality result from Theorem 13 could come from considering an integrated L2–moment, instead of simply asking for a posterior convergence result in probability, as is standard in the posterior rates literature following Ghosal et al. (2000). We now derive a stronger result than Theorem 13 under the mild condition $s_n \gtrsim \log^2 n$. The fact that the result is stronger follows from bounding from below the integral in the display of Theorem 13 by the integral restricted to the set where $\|\theta - \theta_0\|^2$ is larger than the target lower bound rate.
Theorem 14. Under the same notation as in Theorem 13, if $\Pi_\alpha$ is a spike and slab distribution whose slab G is the Laplace distribution, there exists $m > 0$ such that for any $s_n$ with $s_n/n \to 0$ and $\log^2 n = O(s_n)$ as $n \to \infty$, there exists $\theta_0 \in \ell_0[s_n]$ such that, as $n \to \infty$,
$$E_{\theta_0} \Pi_{\hat\alpha}\big[\|\theta - \theta_0\|^2 \le m s_n e^{\sqrt{2\log(n/s_n)}} \mid X\big] = o(1).$$
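The passage from Theorem 14 to a lower bound on the integrated second moment is a short truncation argument. The sketch below writes $R_n$ for the threshold that bounds $\|\theta - \theta_0\|^2$ inside the posterior probability of Theorem 14; the symbol $R_n$ is introduced here only for illustration.

```latex
\begin{align*}
\int \|\theta-\theta_0\|^2 \, d\Pi_{\hat\alpha}(\theta \mid X)
  &\ge \int \|\theta-\theta_0\|^2 \,
       \mathbf{1}\{\|\theta-\theta_0\|^2 > R_n\} \, d\Pi_{\hat\alpha}(\theta \mid X) \\
  &\ge R_n \, \Pi_{\hat\alpha}\big[\|\theta-\theta_0\|^2 > R_n \mid X\big], \\
E_{\theta_0} \int \|\theta-\theta_0\|^2 \, d\Pi_{\hat\alpha}(\theta \mid X)
  &\ge R_n \Big(1 - E_{\theta_0}\Pi_{\hat\alpha}\big[\|\theta-\theta_0\|^2 \le R_n \mid X\big]\Big)
   = R_n \,(1 - o(1)).
\end{align*}
```

The last equality uses the conclusion of Theorem 14, so the integrated moment is bounded below by a quantity of order $R_n$ for n large enough.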


Table of contents:

Detailed summary (Résumé détaillé)
0.0.1 Empirical Bayes analysis of Spike and Slab posterior distributions
0.0.2 Exact constant for the Spike and Slab posterior calibrated by empirical Bayes
0.0.3 Adaptive density estimation with Spike and Slab Pólya tree priors
1 Introduction
1.1 General Frame: the non-parametric, frequentist Bayesian approach
1.1.1 The Bayesian approach
1.1.2 Frequentist Bayesian
1.1.3 High and Infinite Dimension Models
1.1.4 Tuning the parameters
1.2 Gaussian Sequence Model and Thresholding
1.2.1 Definition of the Model
1.2.2 Bayesian approach and the Spike and Slab Prior
1.2.3 Other choices of a priori laws
1.2.4 Exact constant
1.2.5 Contributions using the Empirical Bayes method for the Spike and Slab prior
1.3 Density Estimation and Pólya Trees
1.3.1 Definition of the Model
1.3.2 The Pólya Tree Prior
1.3.3 Contribution using a Hierarchical approach with the Spike and Slab prior
2 Empirical Bayes analysis of spike and slab posterior distributions
2.1 Introduction
2.2 Framework and main results
2.2.1 Empirical Bayes estimation with spike and slab prior
2.2.2 Suboptimality of the Laplace slab for the complete EB posterior distribution
2.2.3 Optimal posterior convergence rate for the EB spike and Cauchy slab
2.2.4 Posterior convergence for the EB spike and slab LASSO
2.2.5 A brief numerical study
2.2.6 Modified empirical Bayes estimator
2.2.7 Discussion
2.3 Proofs for the spike and slab prior
2.3.1 Notation and tools for the SAS prior
2.3.2 Posterior risk bounds
2.3.3 Moments of the score function
2.3.4 In-probability bounds for α̂
2.3.5 Proof of Theorem 13
2.3.6 Proof of Theorem 15
2.3.7 Proof of Theorem 14
2.4 Technical lemmas for the SAS prior
2.4.1 Proofs of posterior risk bounds: fixed α
2.4.2 Proofs of posterior risk bounds: random α
2.4.3 Proofs on pseudo-thresholds
2.4.4 Proof of the convergence rate for the modified estimator
2.5 Proof of Theorem 16: the SSL prior
2.6 Technical lemmas for the SSL prior
2.6.1 Fixed α bounds
2.6.2 Random α bounds
2.6.3 Properties of the functions g0 and β for the SSL prior
2.6.4 Bounds on moments of the score function
2.6.5 In-probability bounds
3 Sharp asymptotic minimaxity of spike and slab empirical Bayes procedures
3.1 Introduction
3.1.1 Model
3.1.2 Posterior convergence at sharp minimax rate
3.1.3 Spike and Slab prior
3.1.4 Useful Thresholds
3.1.5 Empirical Bayes choice of α
3.2 Main result
3.2.1 Why it works
3.3 Proofs
3.3.1 Thresholds and Useful Bounds
3.3.2 Properties of g and moments of the score function
3.3.3 Bounds for posterior moments and fixed α
3.3.4 Risk bound for fixed α: proof of Proposition 5
3.3.5 Random α bounds
3.3.6 Undersmoothing
3.3.7 Oversmoothing
3.3.8 Proof of Theorem
4 Adaptive Pólya trees on densities using a Spike and Slab type prior
4.1 Introduction
4.1.1 Definition of a Pólya tree
4.1.2 Function spaces and wavelets
4.1.3 Spike and Slab prior distributions 'truncated' at a certain level L
4.2 Main results
4.2.1 An adaptive concentration result
4.2.2 A Bernstein–von Mises result
4.3 Proofs
4.3.1 Preliminaries and notation
4.3.2 Proof of Theorem 19
4.3.3 Proof of Theorem 20
4.3.4 Technical Lemmas
References