Overview of Gaussian Processes and derived models, applications to single and multi-objective optimization, and multi-fidelity modeling 


An illustrative integration of Bayesian concepts into a model

In this section, in order to introduce some definitions and concepts through an illustrative example, Bayesian concepts are applied to a linear regression model. Consider a regression problem Preg defined by the pair of training inputs/outputs (X, y), the size of the data-set (number of observations) n, and the dimension of the input data (number of features) d.

The maximum likelihood estimate procedure

A linear regression model M with a basis function expansion and Gaussian noise is defined as:

y(x) = w⊺ϕ(x) + ϵ (2.1)
where w is the vector of parameters of size m, ϕ(x) is the vector of basis functions of size m, such as polynomial or multivariate Gaussian basis functions, and ϵ is a white Gaussian noise with variance σ², i.e. ϵ ∼ N(0, σ²). For the sake of simplicity and for illustrative purposes, σ² is assumed known. A likelihood function p(y|X, w, σ², M) is defined as the distribution of the observations conditioned on the parameters of the model. Assuming independent and identically distributed (i.i.d.) training data, the likelihood can be written as:

p(y|X, w, σ²) = ∏_{i=1}^{n} N(y^(i) | w⊺ϕ(x^(i)), σ²) = N(y | Φw, σ²I_n) (2.2)
where Φ = [ϕ(x^(1)), . . . , ϕ(x^(n))]⊺ is the n×m design matrix, I_n is the identity matrix of size n, and the dependence on the model M is dropped for notational simplicity. Maximizing this likelihood function with respect to the parameters of the model w yields their estimates. This procedure is called Maximum Likelihood Estimation (MLE):

ŵ_MLE = argmax_w ∏_{i=1}^{n} N(y^(i) | w⊺ϕ(x^(i)), σ²) (2.3)
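For concreteness, a minimal numerical sketch of this MLE procedure is given below, using NumPy with an illustrative one-dimensional data-set and Gaussian basis functions (the data, the basis centers, and the noise level are assumptions made for the example, not taken from the text). With the Gaussian likelihood of Eq. (2.2) and σ² known, maximizing the likelihood in w is equivalent to solving a least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D data set: n observations, d = 1 feature
n = 50
X = rng.uniform(-3.0, 3.0, size=(n, 1))
sigma2 = 0.01                                   # known noise variance
y = np.sin(X[:, 0]) + rng.normal(0.0, np.sqrt(sigma2), size=n)

def phi(x, centers, width=1.0):
    """Gaussian basis functions plus a bias term (m = len(centers) + 1)."""
    feats = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    return np.hstack([np.ones((x.shape[0], 1)), feats])

centers = np.linspace(-3.0, 3.0, 9)
Phi = phi(X[:, 0], centers)                     # design matrix, shape (n, m)

# With a Gaussian likelihood, the MLE of Eq. (2.3) is the least-squares solution
w_mle, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Predictions of the fitted model at new inputs
x_new = np.linspace(-3.0, 3.0, 200)
y_pred = phi(x_new, centers) @ w_mle
```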

Review on approximate inference techniques

In the illustrative example used previously, the prior was conjugate to the likelihood, which is computationally convenient. However, for more sophisticated priors and likelihoods, the marginal likelihood is not analytically tractable. In that case, approximation approaches are used. In the next paragraphs, the main approximate inference methods are described.

Maximum a Posteriori

Due to the computational burden of the marginal likelihood, the Maximum A Posteriori (MAP) estimate considers only the mode of the posterior distribution. This is practical since the marginal likelihood does not depend on the parameters w. Hence, the MAP computation comes down to a simple optimization problem:

ŵ_MAP = argmax_w p(y|X, w)p(w) / p(y|X) = argmax_w p(y|X, w)p(w) (2.10)
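As an illustration, a minimal sketch of the MAP computation of Eq. (2.10) is given below, assuming an isotropic Gaussian prior w ∼ N(0, τ²I_m) (this prior, like the toy data, is an assumption made for the example, not a choice stated in the text). With a Gaussian likelihood and this Gaussian prior, the optimization problem has a ridge-regression-like closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data and design matrix (stand-in for Phi built from phi(x^(i)))
n, m = 50, 10
Phi = rng.normal(size=(n, m))
sigma2 = 0.01                                   # known noise variance
y = Phi @ rng.normal(size=m) + rng.normal(0.0, np.sqrt(sigma2), size=n)

tau2 = 1.0                                      # assumed prior variance on w
lam = sigma2 / tau2                             # ridge coefficient induced by the prior

# MAP estimate: argmax_w p(y|X,w) p(w)
#             = argmin_w ||y - Phi w||^2 / sigma^2 + ||w||^2 / tau^2
w_map = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)
```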
However, the computational appeal of the MAP is offset by its point-estimate nature. Indeed, like the MLE, the MAP is a point estimate, and consequently does not provide a measure of uncertainty and might result in over-fitting. Another criticism of the MAP is its use of the mode: the mode is not invariant under reparametrization and, unlike the median or the mean, is not a representative statistic ([Murphy, 2012]).
Instead of considering a point estimate corresponding to the mode of the posterior (MAP), the Laplace approximation ([De Bruijn, 1981; Tierney and Kadane, 1986]) provides an intuitive way to approximate the posterior with a distribution centered at its mode. To do so, a second-order Taylor series expansion of the energy function of the parameters, e(w) = −log p(y, w|X), is performed around the mode ŵ_MAP:

e(w) ≈ e(ŵ_MAP) + (w − ŵ_MAP)⊺ ∇e(w)|_{ŵ_MAP} + (1/2) (w − ŵ_MAP)⊺ [∂²e(w)/∂w∂w⊺]|_{ŵ_MAP} (w − ŵ_MAP) (2.11)

The first-order gradient term is equal to zero when evaluated at the mode, therefore the expansion simplifies to:

e(w) ≈ e(ŵ_MAP) + (1/2) (w − ŵ_MAP)⊺ A (w − ŵ_MAP) (2.12)

where A = [∂²e(w)/∂w∂w⊺]|_{ŵ_MAP} is the Hessian of the energy function evaluated at the mode.
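A minimal sketch of the resulting Gaussian approximation N(ŵ_MAP, A⁻¹) is given below for the same illustrative linear-Gaussian setting (the Gaussian prior w ∼ N(0, τ²I_m) is again an assumed choice; in this conjugate case the Laplace approximation coincides with the exact posterior, which makes the example easy to check).

```python
import numpy as np

rng = np.random.default_rng(1)

# Same illustrative linear-Gaussian setting as in the MAP sketch
n, m = 50, 10
Phi = rng.normal(size=(n, m))
sigma2, tau2 = 0.01, 1.0
y = Phi @ rng.normal(size=m) + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Hessian of the energy e(w) = -log p(y, w | X), i.e. the matrix A of Eq. (2.12)
A = Phi.T @ Phi / sigma2 + np.eye(m) / tau2

# Mode of the posterior (MAP estimate), obtained by setting the gradient of e to zero
w_map = np.linalg.solve(A, Phi.T @ y / sigma2)

# Laplace approximation of the posterior: a Gaussian centred at the mode
posterior_mean = w_map
posterior_cov = np.linalg.inv(A)
```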

Sparse Gaussian Processes

The major drawback of GPs concerns the handling of large data-sets. In fact, training and prediction with GPs involve the inversion of the Gram matrix, that is, the covariance matrix of the whole data-set KXX ∈ R^{n×n}. This inversion has a cubic complexity O(n³), which rapidly becomes computationally overwhelming. To overcome this limit of GPs, Sparse Gaussian Processes (SGPs), consisting of low-rank approximations of the covariance matrix KXX, have been developed. SGPs augment the latent space with a set of inputs/outputs called inducing input-output variables. Specifically, a set of m ≪ n inducing input-output pairs Z = {z^(1), . . . , z^(m)} and u = f(Z) = {u^(1), . . . , u^(m)} is introduced in order to reduce the time complexity of GPs from O(n³) to O(nm²). Different approaches that have been developed to determine this sparse approximation are described in the next paragraphs.
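As a rough illustration of the idea, the sketch below implements a simple subset-of-regressors-style sparse approximation with m inducing inputs taken as a random subset of the data (the RBF kernel, the noise level, and this choice of inducing points are assumptions made for the example; practical SGPs optimize Z and the hyperparameters). Only m×m systems are solved and the full n×n matrix KXX is never formed, so the dominant cost is O(nm²).

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of inputs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
n, m, d = 2000, 50, 1
X = rng.uniform(-3.0, 3.0, size=(n, d))
sigma2 = 0.01
y = np.sin(X[:, 0]) + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Inducing inputs Z: here simply a random subset of the training inputs
Z = X[rng.choice(n, size=m, replace=False)]

Kmm = rbf(Z, Z) + 1e-6 * np.eye(m)              # m x m covariance of inducing points
Kmn = rbf(Z, X)                                  # m x n cross-covariance

# Only an m x m system is solved; forming Kmn @ Kmn.T costs O(n m^2)
B = Kmm + Kmn @ Kmn.T / sigma2

Xs = np.linspace(-3.0, 3.0, 100)[:, None]        # test inputs
Ksm = rbf(Xs, Z)                                 # n_test x m

# Predictive mean and variance of the sparse approximation (latent f, noise excluded)
mean = Ksm @ np.linalg.solve(B, Kmn @ y) / sigma2
var = np.einsum("ij,ji->i", Ksm, np.linalg.solve(B, Ksm.T))
```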


Table of contents:

List of figures
List of tables
1 Introduction 
1.1 The challenges in the design of complex systems
1.2 Machine learning for the analysis and optimization of complex systems
1.3 Motivations and outline of the thesis
I Overview of Gaussian Processes and derived models, applications to single and multi-objective optimization, and multi-fidelity modeling 
2 From Linear models to Deep Gaussian Processes 
2.1 Bayesian modeling
2.1.1 An illustrative integration of Bayesian concepts into a model
2.1.2 Review on approximate inference techniques
2.2 Gaussian Processes (GPs)
2.2.1 Definitions
2.2.2 Sparse Gaussian Processes
2.2.3 Gaussian Processes and other models
2.3 Deep Gaussian Processes (DGPs)
2.3.1 Definitions
2.3.2 Advances in Deep Gaussian Processes inference
2.3.3 Applications of Deep Gaussian Processes
3 GPs applications to the analysis and optimization of complex systems
3.1 Non-stationary GPs
3.1.1 Direct formulation of non-stationary kernels
3.1.2 Local stationary covariance functions
3.1.3 Warped GPs
3.2 Bayesian Optimization (BO)
3.2.1 Bayesian Optimization Framework
3.2.2 Infill criteria
3.2.3 Bayesian Optimization with non-stationary GPs
3.3 Multi-objective Bayesian optimization
3.3.1 Definitions
3.3.2 Multi-Objective Bayesian Optimization with independent models
3.3.3 Multi-objective Bayesian Optimization taking into account correlation between objectives
3.4 Multi-Fidelity with Gaussian Processes
3.4.1 Multi-fidelity with identical input spaces
3.4.2 Multi-fidelity with variable input space parameterization
3.5 Conclusion
II Single and Multi-Objective Bayesian Optimization using Deep Gaussian Processes 
4 BO with DGPs for Non-Stationary Problems 
4.1 Bayesian Optimization using Deep Gaussian Processes
4.1.1 Training
4.1.2 Architecture of the DGP
4.1.3 Infill criteria
4.1.4 Synthesis of DGP adaptations proposed in the context of BO
4.2 Experimentations
4.2.1 Analytical test problems
4.2.2 Application to industrial test case: design of aerospace vehicles
4.3 Conclusion
5 Multi-Objective Bayesian Optimization taking into account correlation between objectives 
5.1 Multi-Objective Deep Gaussian Process Model (MO-DGP)
5.1.1 Model specifications
5.1.2 Inference in MO-DGP
5.1.3 MO-DGP prediction
5.2 Computation of the Expected Hyper-Volume Improvement (EHVI)
5.2.1 Approximation of piece-wise functions with Gaussian distributions
5.2.2 Proposed computational approach for correlated EHVI
5.3 Numerical Experiments
5.3.1 Analytical functions
5.3.2 Multi-objective aerospace design problem
5.3.3 Synthesis of the results
5.4 Conclusions
III Multi-fidelity analysis 
6 Multi-fidelity analysis using Deep Gaussian Processes 
6.1 Multi-fidelity with identically defined fidelity input spaces
6.1.1 Improvement of Multi-Fidelity Deep Gaussian Process Model (MF-DGP)
6.1.2 Numerical experiments of the improved MF-DGP on analytical and aerospace multi-fidelity problems
6.2 Multi-fidelity with different input domain definitions
6.2.1 Multi-fidelity Deep Gaussian Process Embedded Mapping (MF-DGP-EM)
6.2.2 The input mapping GPs
6.2.3 The Evidence Lower Bound
6.2.4 Numerical experiments on multi-fidelity problems with different input space domain definitions
6.2.5 Synthesis of the numerical experiments
6.2.6 Computational aspects of MF-DGP-EM
6.3 Conclusion
7 Conclusions and perspectives 
7.1 Conclusions
7.1.1 Contributions on Bayesian optimization for non-stationary problems
7.1.2 Contributions on multi-objective Bayesian optimization with correlated objectives
7.1.3 Contributions on multi-fidelity analysis
7.2 Perspectives
7.2.1 Improvements and extensions of the framework BO & DGPs
7.2.2 Improvements and extensions of the proposed algorithm for MOBO with correlated objectives
7.2.3 Improvements and extensions for multi-fidelity analysis
7.2.4 Extensions of deep Gaussian processes to other problems in the design of complex systems
References 
