Comparison of the predictive distributions associated with the estimators (MLE and MAP) and the full posterior distribution


Reference priors for multiparametric models

Previously, we dealt with the case of one-parameter models. In the case of multiparameter models, it is of course always possible to view the list of parameters as one big multidimensional parameter, so as to reduce the problem to the one already tackled. Such a choice often leads to unfortunate results, however, in the sense that the resulting reference prior is intuitively unsatisfactory. More precisely, there exist several multiparameter models in the literature in which the reference prior, as defined above, has undesirable statistical properties. See Berger et al. [2015] for an extensive review of such situations. In fact, the authors state: “We actually know of no multivariable example in which we would recommend the Jeffreys-rule prior. In higher dimensions, the prior always seems to be either ‘too diffuse’ […] or ‘too concentrated’”. And, as mentioned before, the reference prior is the Jeffreys-rule prior in regular cases.
Let $\Theta = \Theta_1 \times \dots \times \Theta_r$ ($r \in \mathbb{Z}_+$) be the parametric space. The “reference prior algorithm”, which was first developed by Bernardo [1979a], requires an ordering of the parameters. Let us consider them ordered in the following way: $\theta_1 \in \Theta_1, \dots, \theta_r \in \Theta_r$.
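Schematically, under this ordering the prior produced by the algorithm factorizes one conditional at a time, starting from the last parameter (this is only a sketch of the induced factorization; the full algorithm specifies how each factor is computed):

```latex
\pi(\theta_1, \dots, \theta_r)
  \;=\; \pi(\theta_r \mid \theta_1, \dots, \theta_{r-1})\,
        \pi(\theta_{r-1} \mid \theta_1, \dots, \theta_{r-2})
        \cdots
        \pi(\theta_2 \mid \theta_1)\,
        \pi(\theta_1).
```

Each conditional factor is obtained as a one-parameter reference prior for $\theta_i$, the parameters appearing later in the ordering having been dealt with first. This is why the ordering matters: different orderings can yield different priors.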

Smoothness of the correlation kernel

Lemma 2 of Berger et al. [2001] requires that the correlation kernel and the design set be such that $\Sigma = \mathbf{1}\mathbf{1}^\top + g_0(\theta) D + R_0(\theta)$, where $\mathbf{1}$ is the vector with $n$ entries all equal to 1, $g_0(\theta)$ is a real-valued function such that $\lim_{\theta \to +\infty} g_0(\theta) = 0$, $D$ is a fixed nonsingular matrix and $R_0$ is a mapping from $(0, +\infty)$ to the set of $n \times n$ real matrices $\mathcal{M}_n$ such that $\lim_{\theta \to +\infty} \| g_0(\theta)^{-1} R_0(\theta) \| = 0$.
What makes this assumption restrictive is the condition that D should be nonsingular, because it holds for rough correlation kernels only. For instance, as was noted by Paulo [2005], it does not hold for the Squared Exponential correlation kernel.
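To see the decomposition at work on a kernel for which it does hold, consider the (rough) exponential kernel $K_\theta(d) = e^{-d/\theta}$; this kernel, the design, and the choices $g_0(\theta) = 1/\theta$, $D = -\left(|x^{(i)} - x^{(j)}|\right)_{i,j}$ are illustrative assumptions suggested by the expansion $e^{-d/\theta} = 1 - d/\theta + O(\theta^{-2})$, not taken from the text. A minimal numerical check that $\|g_0(\theta)^{-1} R_0(\theta)\|$ vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
x = np.sort(rng.uniform(0.0, 1.0, size=n))   # arbitrary one-dimensional design
dist = np.abs(x[:, None] - x[None, :])       # distances |x^(i) - x^(j)|

ones = np.ones((n, n))                       # the matrix 1 1^T
D = -dist                                    # candidate D for this kernel

for theta in [1e2, 1e3, 1e4]:
    sigma = np.exp(-dist / theta)            # exponential correlation matrix
    R0 = sigma - ones - D / theta            # remainder term, with g0(theta) = 1/theta
    # g0(theta)^{-1} R0(theta) = theta * R0 should vanish as theta grows
    print(theta, np.linalg.norm(theta * R0))
```

The printed norms shrink roughly tenfold per decade of $\theta$, since $R_0(\theta) = O(\theta^{-2})$ here. For the Squared Exponential kernel, the analogous expansion would make $D$ proportional to the squared-distance matrix, which is singular, so the assumption fails, as noted above.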
For a given correlation kernel $K$, $D$ is typically proportional to the matrix with entries $|x^{(i)} - x^{(j)}|^q$, where $q$ depends on the smoothness of the correlation kernel but should in any case belong to the interval $(0, 2]$. This is because $K(s) - K(0)$ is equivalent to a constant times $s^q$ when $s \to 0^+$. Schoenberg [1937] gives the following result (Theorem 4 in the original paper):
Theorem 3.3. If $q \in (0, 2)$, the quadratic form $\lambda \in \mathbb{R}^{n+1} \mapsto \sum_{i,j=0}^{n} |x^{(i)} - x^{(j)}|^q \, \lambda_i \lambda_j$ is nonsingular and its canonical representation contains one positive and $n$ negative squares. This means that if the correlation kernel is rough enough to have $q \in (0, 2)$, the assumption that $D$ is nonsingular is reasonable.
Corollary 3.4. The $n \times n$ matrix with entries $|x^{(i)} - x^{(j)}|^q$ with $q \in (0, 2)$ is nonsingular and has one positive eigenvalue and $n - 1$ negative eigenvalues.

The picture is dramatically different when the correlation kernel $K$ is smooth enough to have $q = 2$. This happens as soon as $K$ is twice continuously differentiable. Gower [1985]’s Theorem 6 implies the following results.
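As a quick numerical illustration of Corollary 3.4 and of the $q = 2$ degeneracy (a sketch with an arbitrary one-dimensional design; the points and tolerances are assumptions), one can inspect the spectrum of the matrix with entries $|x^{(i)} - x^{(j)}|^q$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
x = np.sort(rng.uniform(0.0, 1.0, size=n))   # arbitrary one-dimensional design
dist = np.abs(x[:, None] - x[None, :])       # distances |x^(i) - x^(j)|

for q in [0.5, 1.0, 1.5, 2.0]:
    eig = np.linalg.eigvalsh(dist ** q)      # eigenvalues of the symmetric matrix
    pos = int((eig > 1e-8).sum())
    neg = int((eig < -1e-8).sum())
    zero = n - pos - neg
    print(f"q = {q}: {pos} positive, {neg} negative, {zero} (near-)zero eigenvalues")
```

For $q \in (0, 2)$ this reports one positive and $n - 1$ negative eigenvalues, matching Corollary 3.4. For $q = 2$ and a one-dimensional design, $\left(|x^{(i)} - x^{(j)}|^2\right)_{i,j} = a\mathbf{1}^\top + \mathbf{1}a^\top - 2xx^\top$ with $a_i = (x^{(i)})^2$, so its rank is at most 3 and the matrix is singular as soon as $n > 3$.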

Optimal compromise: a general theory

In this section we introduce the concepts necessary to define the optimal compromise between potentially incompatible conditional distributions. First, note that in this context, “conditional distribution” is really an informal way of referring to a Markov kernel.
Definition 4.1. Let $(A, \mathcal{A})$ and $(B, \mathcal{B})$ be measurable spaces. A mapping $\pi : A \times \mathcal{B} \to [0, 1]$ is called a Markov kernel if:
1. for all $x \in A$, $\pi(x, \cdot) : \mathcal{B} \to [0, 1]$ is a probability distribution and
2. for all $S \in \mathcal{B}$, $\pi(\cdot, S) : A \to [0, 1]$ is $\mathcal{A}$-measurable.

We use the following notation: for every $(x, S) \in A \times \mathcal{B}$, $\pi(S \mid x) := \pi(x, S)$.

Let $r$ be a positive integer and let $(\Theta_1, \mathcal{A}_1), \dots, (\Theta_r, \mathcal{A}_r)$ be measurable spaces. Define $\Theta := \prod_{i=1}^r \Theta_i = \Theta_1 \times \dots \times \Theta_r$ and $\mathcal{A} := \bigotimes_{i=1}^r \mathcal{A}_i = \mathcal{A}_1 \otimes \dots \otimes \mathcal{A}_r$. For every $i \in [[1, r]]$, let $\pi_i$ be a Markov kernel $\prod_{j \neq i} \Theta_j \times \mathcal{A}_i \to [0, 1]$. Intuitively (we formalize this below), every $\pi_i$ should be assembled with a distribution $m_i$ on $\bigotimes_{j \neq i} \mathcal{A}_j$ to create a “joint” distribution, that is, a probability distribution on $\mathcal{A}$. We refer to every $m_i$ ($i \in [[1, r]]$) as an $(r-1)$-dimensional distribution. If the $m_i$ can be chosen in such a way as to make all joint distributions equal, then the Markov kernels in the sequence $(\pi_i)_{i \in [[1, r]]}$ are called compatible. If no choice of $(m_i)_{i \in [[1, r]]}$ can make all joint distributions equal, we have to look for a “compromise” between the Markov kernels.

Remark (Producing incompatibility is easy). Take $r = 2$ and $\Theta_1 = \Theta_2 = \mathbb{R}$, and endow $\mathbb{R}$ with the Borel sigma-algebra. Assume that for every $t \in \mathbb{R}$, $\pi_1(\cdot \mid t)$ and $\pi_2(\cdot \mid t)$ are absolutely continuous with respect to the Lebesgue measure and denote by $f_1(\cdot \mid t)$ and $f_2(\cdot \mid t)$ their respective density functions. A necessary condition [Arnold et al., 2001] for the compatibility of $\pi_1$ and $\pi_2$ is the existence of two mappings $u$ and $v$ defined on $\mathbb{R}$ such that for almost every pair of real numbers $x$ and $t$ (in the sense of the Lebesgue measure),
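This factorization criterion is easy to check numerically (a sketch; the Gaussian conditionals below are illustrative assumptions, not taken from the text): if $f_1(x \mid t) / f_2(t \mid x) = u(x)\,v(t)$, then $\log f_1(x \mid t) - \log f_2(t \mid x)$ must have vanishing mixed second difference in $(x, t)$.

```python
import numpy as np

def mixed_second_difference(g, h):
    """Discrete d^2/(dx dt) of a function tabulated on a grid of step h."""
    return (g[1:, 1:] - g[1:, :-1] - g[:-1, 1:] + g[:-1, :-1]) / h ** 2

def gauss_logpdf(z, mean, var):
    """Log-density of N(mean, var) evaluated at z."""
    return -0.5 * np.log(2.0 * np.pi * var) - (z - mean) ** 2 / (2.0 * var)

h = 0.1
grid = np.arange(-2.0, 2.0, h)
X, T = np.meshgrid(grid, grid, indexing="ij")

# Compatible pair: the two conditionals of a standard bivariate normal, rho = 0.5.
rho = 0.5
log_ratio = (gauss_logpdf(X, rho * T, 1 - rho ** 2)      # log f1(x|t)
             - gauss_logpdf(T, rho * X, 1 - rho ** 2))   # log f2(t|x)
print(np.abs(mixed_second_difference(log_ratio, h)).max())  # ~0: ratio factorizes

# Incompatible pair: X|T=t ~ N(t, 1) but T|X=x ~ N(x/2, 1).
log_ratio = gauss_logpdf(X, T, 1.0) - gauss_logpdf(T, X / 2, 1.0)
print(np.abs(mixed_second_difference(log_ratio, h)).max())  # 0.5: no factorization
```

In the first case the log-ratio reduces to $-(x^2 - t^2)/2$, which has no cross term, so the mixed difference vanishes; in the second case the cross term survives, exposing the incompatibility.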


Table of contents :

I Tools 
1 Kriging Overview 
1.1 Introduction
1.2 Gaussian random processes
1.3 Mean square continuity and differentiability of Gaussian processes
1.4 Spectral representation
1.5 Examples of correlation kernels
1.6 Current Kriging-related research
2 Reference Prior Theory 
2.1 Introduction
2.2 Basic idea
2.3 Full definition of the reference prior
2.4 Regular continuous case
2.5 Properties of reference priors
2.6 Examples
2.7 Reference priors for multiparametric models
3 Propriety of the reference posterior distribution in Gaussian Process regression
3.1 Introduction
3.2 Setting
3.3 Smoothness of the correlation kernel
3.4 Propriety of the reference posterior distribution
3.5 Conclusion
3.A Algebraic facts
3.B Maclaurin series
3.C Spectral decomposition
3.D Asymptotic study of the correlation matrix
3.E Details of the proof of Theorem 3.9
II Compromise 
4 Optimal compromise between incompatible conditional probability distributions
4.1 Introduction
4.2 Optimal compromise: a general theory
4.3 Discussion of the notion of compromise
4.4 Conclusion
5 Application of the Optimal Compromise to Simple Kriging models with Matérn correlation kernels 
5.1 Introduction
5.2 Optimal compromise between Objective Posterior conditional distributions in Gaussian Process regression
5.3 Comparisons between the MLE and MAP estimators
5.4 Comparison of the predictive distributions associated with the estimators (MLE and MAP) and the full posterior distribution
5.5 Conclusion and Perspectives
5.A Proofs of Section 5.2
6 A Comprehensive Bayesian Treatment of the Universal Kriging model with Matérn correlation kernels 
6.1 Introduction
6.2 Analytical treatment of the location-scale parameters
6.3 Reference prior on a one-dimensional $\theta$
6.4 The Gibbs reference posterior on a multi-dimensional $\theta$
6.5 Comparison of the predictive performance of the full-Bayesian approach versus MLE and MAP plug-in approaches
6.6 Conclusion
6.A Matérn kernels
6.B Proofs of the existence of the Gibbs reference posterior
7 Trans-Gaussian Kriging in a Bayesian framework: a case study 
7.1 Introduction
7.2 Probability Of Detection (POD)
7.3 An Objective Bayesian outlook to Trans-Gaussian Kriging
7.4 Industrial Application
7.5 Conclusion

