multi-lineage evolution in viral populations driven by host immune systems


Numerical simulations of stochastic processes

Sometimes one is able to define a stochastic model to describe a system under study, but the analytical progress that can be made on such a model is very limited. Other times it may even be impossible to write down equations from the set of basic ingredients defining the model. Fortunately, several computational techniques can help to study the model behavior and compare its predictions with the modeled phenomenon even in such cases.
First, from the differential equations we can directly find numerical approximations to their solutions. Even for the Langevin equation (8) there is a generalization of the Euler method to stochastic differential equations, called the Euler-Maruyama algorithm [68], as well as higher order methods. Another approach is to simulate directly the set of rules defining the model through a broad class of computational algorithms that rely on generating (pseudo-)randomness and then sampling from it. These methods are called Monte Carlo. They were introduced and systematically used by Ulam and von Neumann while studying neutron diffusion at the Los Alamos National Laboratory during World War II. The name Monte Carlo was the code name of their work, secret at the time. It was inspired by the eponymous casino in Monaco, and it was proposed by Metropolis because allegedly Ulam’s uncle “would borrow money from relatives because he just had to go to Monte Carlo” [127].
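As an illustration of the Euler-Maruyama scheme mentioned above, here is a minimal sketch. Since equation (8) is not reproduced in this section, the sketch assumes a generic one-dimensional Langevin equation dx = a(x) dt + b(x) dW. The function name `euler_maruyama`, its arguments, and the Ornstein-Uhlenbeck example are illustrative assumptions, not the implementation used in the thesis.

```python
import math
import random


def euler_maruyama(drift, noise, x0, dt, n_steps, rng=None):
    """Integrate dx = drift(x) dt + noise(x) dW with the Euler-Maruyama scheme.

    drift and noise are hypothetical user-supplied functions a(x) and b(x).
    Returns the list of visited states, including the initial one.
    """
    if rng is None:
        rng = random.Random()
    sqrt_dt = math.sqrt(dt)
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Wiener increment: Gaussian with mean 0 and variance dt
        dW = rng.gauss(0.0, 1.0) * sqrt_dt
        x = x + drift(x) * dt + noise(x) * dW
        path.append(x)
    return path


# Assumed example: an Ornstein-Uhlenbeck process dx = -x dt + 0.5 dW
path = euler_maruyama(lambda x: -x, lambda x: 0.5,
                      x0=1.0, dt=0.01, n_steps=1000,
                      rng=random.Random(0))
```

With the noise term set to zero the update reduces to the ordinary (deterministic) Euler method, which is a convenient sanity check for the implementation.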
The idea underlying Monte Carlo is to reproduce the model dynamics by drawing samples from the corresponding probability distribution. In the first part of the thesis we will use Monte Carlo methods to simulate processes that are not necessarily at equilibrium nor at steady state. In the second half we will use this scheme to simulate a system at equilibrium, drawing from the desired Boltzmann distribution using the Metropolis-Hastings algorithm, and then we will use a Markov chain Monte Carlo designed to reproduce the desired steady state distribution of an out-of-equilibrium system. Note that even if here we introduce these algorithms in the context of stochastic processes, their scope is broad enough that they can be used to tackle purely deterministic problems such as solving integrals: by the law of large numbers, the sample average of many i.i.d. random variables converges to the ensemble average.
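To make the idea of sampling from a Boltzmann distribution concrete, the following sketch uses the Metropolis rule, that is, the special case of Metropolis-Hastings with a symmetric proposal. The energy function (a harmonic well), the step size, and the function name `metropolis` are assumptions chosen for illustration only; they are unrelated to the models studied in the thesis.

```python
import math
import random


def metropolis(energy, x0, n_samples, step=1.0, beta=1.0, rng=None):
    """Sample from p(x) proportional to exp(-beta * energy(x)).

    Uses a symmetric uniform proposal, so the Hastings correction is 1.
    """
    if rng is None:
        rng = random.Random()
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_samples):
        x_new = x + rng.uniform(-step, step)  # symmetric proposal
        e_new = energy(x_new)
        # Accept with probability min(1, exp(-beta * (e_new - e)))
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            x, e = x_new, e_new
        samples.append(x)  # a rejected move repeats the current state
    return samples


# Assumed example: harmonic energy E(x) = x^2 / 2 at beta = 1,
# for which the ensemble average of x^2 equals 1.
samples = metropolis(lambda x: 0.5 * x * x, x0=0.0,
                     n_samples=50000, step=1.5, rng=random.Random(1))
mean_x2 = sum(s * s for s in samples) / len(samples)
```

Here `mean_x2` is a sample average that estimates an ensemble average. Successive samples are correlated, so the statistical error decreases more slowly than for independent draws.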
For a detailed introduction to Monte Carlo methods and an overview of many applications in physics and chemistry we refer the reader to [5].
More precisely, in Chapter 3, which reproduces the published work [115], we will study a model coupling viral evolution, epidemiological dynamics and immune memory by means of an agent based Monte Carlo simulation. This is a computational model that explicitly considers a great number of agents, in our case hosts and viral strains. It is based on a set of rules governing the interactions between these agents, for instance infections, immune update, mutations and selection, which define the microscopic ingredients of the model, and in our case carry intrinsically random features. The algorithm advances the time evolution of the system by simulating the simultaneous “actions” and interactions of all of the components according to the few rules governing them. The goal is to study how these microscale dynamic interactions produce complex patterns in the system as a whole, in our case at the population level.
The strength of this computational approach lies in the clarity and intuitiveness of the microscopic ingredients of the model, which the modeler is free to adjust to attain the desired level of detail. Therefore agent based models can be used to build accurate and realistic generative simulations of complex systems without the need to rely on many assumptions. The weakness of this approach lies in its high computational cost due to the huge number of agents that need to be modeled explicitly, which severely limits its practical applications unless sufficient computational resources are available. This drawback is made more severe by the fact that the emergent behavior and the relative importance of stochasticity as a confounding factor depend strongly on the population size [113].
To overcome this limitation in studying the model behavior and scaling, as well as to make analytical progress that may reveal some universal features of the studied phenomenon, in Chapter 4 we study a more coarse-grained model consisting of a system of stochastic reaction-diffusion equations. These are Langevin equations of the form (8) where the random variable is a high-dimensional object describing the state of a whole population. To complement the analytics we study the model numerically with another kind of Monte Carlo simulation that implements the ingredients of the reaction-diffusion system on a discrete lattice, to extract the relevant observable of this model: the population distribution over the lattice sites. This simulation is not agent based in the sense that we don’t explicitly simulate all of the hosts and viral strains anymore, but only their relative fraction on each lattice site. More details are given in Chapter 4.

conceptual tools: theoretical models of evolution and epidemiology

As mentioned in Section 2.1, the first part of the thesis studies theoretical models coupling processes at different scales: immune response, epidemiological spread of pathogens in host populations, and evolution. Our perspective is mainly centered on the latter aspect, therefore this introductory section focuses mainly on modeling evolution.
We will restrict our investigation to pathogens that produce acute infections and elicit a strong immune response producing long-lasting immune memory. Hence in our modeling of evolutionary timescales the immune system’s role at the individual level can be described in a very simple coarse-grained way, with immune memory building up deterministically based on the past history of pathogen infections. When looking at different relative timescales this approximation fails and one has to explicitly consider the stochastic process governing the adaptive immune system evolution in each individual, including the ecological competition of lymphocytes during infections. Since we will not consider these dynamics, this introduction will not cover these topics. For more information on how to build theoretical models of immune responses within individuals see [164] and [6].
In the following we give an example of how statistical mechanics can be used to model the evolution of populations. Then we introduce some concepts that are largely exploited in the literature of theoretical models for evolution, which will be central in the first part of the thesis. We conclude with a very short introduction to mean field epidemiological models.

Diffusion equations for populations evolution

The main forces driving evolution are mutations, genetic drift and selection. Sex and recombination are a further force, extremely important in many situations, but for the most part this thesis will not consider them. Mutations are changes in the genome of an organism that generate new variants called mutants, increasing the diversity of a population. These are intrinsically random events, as proven by the famous Luria-Delbrück experiment [112]. Genetic drift is the stochastic change in the frequency of some mutants in a population, induced by the fact that populations consist of a finite number of individuals. Selection is the process through which mutants that are fitter for the current environment produce more offspring than the others, increasing their relative fraction in the population. This also carries some degree of stochasticity due to demographic noise, which becomes relevant when the number of individuals with a given mutation is small. Due to these various sources of randomness, stochastic processes are a well suited framework to study the evolution of population diversity.
As an example let’s consider the Wright-Fisher model, where at each generation the population size is fixed to N individuals. The population is divided into two types: i individuals are of type A and the remaining N - i are of type B. In this simplified model there are no further mutations, so from one generation to the next an individual always produces individuals of the same type. At each generation t the offspring population is sampled randomly from the population at t - 1, and individuals of type A are sampled with probability $p_i$, which in the neutral (no selection) case reduces to the fraction of A, $f = i/N$. The population composition at time t is the result of N Bernoulli trials with success probability $p_i$, therefore the transition probability from a population state i to a state j is the binomial probability of having j successes out of the N trials, $\binom{N}{j} p_i^j (1 - p_i)^{N-j}$. From this object we can write a Master equation of the form (2), therefore we are able to write the equations governing the time evolution of the stochastic process starting from the microscopic definition of the model. The analytical treatment of the master equation is very hard, but it can be studied numerically through Markov chain Monte Carlo simulations.
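A minimal sketch of how the neutral Wright-Fisher model described above could be simulated is given below, assuming that in the absence of selection type A is sampled with probability i/N. The function name `wright_fisher` and the parameter values in the example are illustrative choices, not taken from the thesis.

```python
import random


def wright_fisher(N, i0, n_generations, rng=None):
    """Simulate the neutral two-type Wright-Fisher model.

    N is the fixed population size and i0 the initial number of
    type-A individuals. Returns the number of type-A individuals
    at each generation, including the initial one.
    """
    if rng is None:
        rng = random.Random()
    i = i0
    trajectory = [i]
    for _ in range(n_generations):
        p = i / N  # neutral case: sampling probability is the fraction of A
        # N independent Bernoulli trials, i.e. one binomial draw
        i = sum(1 for _ in range(N) if rng.random() < p)
        trajectory.append(i)
    return trajectory


# Assumed example: 100 individuals, half of them initially of type A
trajectory = wright_fisher(N=100, i0=50, n_generations=200,
                           rng=random.Random(42))
```

The states i = 0 and i = N are absorbing: once one type is lost it cannot reappear, because the model has no mutations. Averaging over many independent runs gives a numerical estimate of quantities such as the fixation probability.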


From genotypes to phenotypes to fitness: cross-reactivity in recognition space

So far we have introduced mutations, which generate diversity by introducing mutants in the population, and selection, which determines the relative success of different mutants in the population. But we haven’t specified in what space mutations act and what traits are selected.
The information regarding organism features is (partly) encoded in their genome, or genotype. This dictates the expression of proteins in cells via transcription and translation, which in turn build up the phenotype of the organism. In fact the phenotype is not entirely determined by the genotype, since there are many sources of noise and errors when translating DNA into proteins and in protein function. Even knowing the exact genome of an organism it’s very hard to predict its phenotype, a problem known as genotype-phenotype mapping. But in the context of evolution the genotype is regarded as the main entity encoding information on the phenotype, and mutations usually denote changes in the genome, also because only those changes are heritable and propagate through generations.

Table of contents:

1 modeling evolutionary constraints at different scales
1.1 Some philosophy (of science)
1.2 Two examples of constraints in evolution
1.3 Statistical mechanics offers a theoretical framework to study evolution
1.4 Thesis organization
i immune systems constrain the evolutionary paths of viruses
2 pathogens against immune systems, an arms race across timescales
2.1 Background and motivation
2.2 Technical tools: stochastic processes and numerical simulations
2.2.1 Markov processes
2.2.2 Fokker-Planck and Langevin equations
2.2.3 Numerical simulations of stochastic processes
2.3 Conceptual tools: theoretical models of evolution and epidemiology
2.3.1 Diffusion equations for populations evolution
2.3.2 From genotypes to phenotypes to fitness: cross-reactivity in recognition space
2.3.3 Evolution in structured and fluctuating fitness landscapes
2.3.4 Traveling wave theory of adaptation
2.3.5 Epidemiological models
3 multi-lineage evolution in viral populations driven by host immune systems
3.1 Abstract
3.2 Introduction
3.3 Methods
3.3.1 The model
3.3.2 Initial conditions and parameter fine-tuning
3.3.3 Detailed mutation model
3.4 Results
3.4.1 Modes of antigenic evolution
3.4.2 Stability
3.4.3 Phase diagram of evolutionary regimes
3.4.4 Incidence rate
3.4.5 Speed of adaptation and intra-lineage diversity
3.4.6 Antigenic persistence
3.4.7 Dimension of phenotypic space
3.4.8 Robustness to details of intra-host dynamics and population size control
3.5 Discussion
4 viruses phenotypic diffusion: escaping the immune systems chase
4.1 Introduction
4.1.1 From the microscopic model to Langevin equations
4.1.2 Simplified description
4.1.3 Deterministic fixed points
4.2 Phenomenological model in phenotypic space
4.2.1 Fitness function
4.2.2 System’s scales
4.3 Numerical simulations
4.3.1 Implementation
4.3.2 Observables estimation — clustering analysis
4.3.3 Preliminary numerical results
4.4 Wave solution
4.4.1 Regulation of population size
4.4.2 Traveling wave scaling in phenotypic space
4.5 Adding other dimensions to the linear wave
4.5.1 Shape of viral dispersion
4.5.2 Lineage trajectory diffusivity in antigenic space
4.6 Conclusions and near future directions
ii infer evolutionary constraints at finer scales: proteins, evolution and statistical physics
5 statistical physics for protein sequences
5.1 Background and motivation
5.2 Statistical mechanics, inference and protein sequences
5.2.1 Canonical ensemble
5.2.2 Maximum Likelihood
5.2.3 Maximum Entropy principle and inverse Potts problem
5.3 Parameters and optimization
5.3.1 Boltzmann learning
5.3.2 Gauge invariance and regularization
5.4 General applications of DCA
5.5 Repeat proteins families
5.5.1 Repeat proteins
5.5.2 Global ensemble features of repeat proteins sequence space
5.5.3 Making sense of empirical patterns: repeats evolutionary model
6 size and structure of the sequence space of repeat proteins
6.1 Abstract
6.2 Introduction
6.3 Results
6.3.1 Statistical models of repeat-protein families
6.3.2 Statistical energy vs unfolding energy
6.3.3 Equivalence between two definitions of entropies
6.3.4 Entropy of repeat protein families
6.3.5 Effect of interaction range
6.3.6 Multi-basin structure of the energy landscape
6.3.7 Distance between repeat families
6.4 Discussion
7 evolutionary model for repeat arrays
7.1 Introduction
7.2 Model
7.2.1 Parameters inference
7.3 Results
7.4 Exploring mechanisms behind duplications and deletions
7.4.1 Multi-repeat duplications and deletions
7.4.2 Similarity dependent duplications and deletions
7.4.3 Asymmetric similarity dependence between duplications and deletions
7.5 The road ahead
7.5.1 Duplications bursts model
7.6 Conclusions
iii conclusions and future perspectives
8 concluding remarks
8.1 Discussion and conclusion
8.2 Future perspectives
8.2.1 Viral-immune coevolution
8.2.2 Protein evolution
a multi-lineage evolution in viral populations driven by host immune systems: supplementary information
a.1 Simulation details
a.1.1 Initialization
a.1.2 Control of the number of infected hosts
a.2 Detailed mutation model
a.3 Analysis of simulations
a.3.1 Lineage identification
a.3.2 Turn rate estimation
a.3.3 Phylogenetic tree analysis
b size and structure of the sequence space of repeat proteins: supplementary information
b.1 Methods
b.1.1 Data curation
b.1.2 Model fitting
b.1.3 Models with different sets of constraints
b.1.4 Entropy estimation
b.1.5 Entropy error
b.1.6 Calculating the basins of attraction of the energy landscape
b.1.7 Kullback-Leibler divergence
c evolutionary model for repeat arrays – supplementary information
c.1 Dataset
c.2 Quasi-equilibrium
c.3 Numerical simulations
c.4 Parameters learning
c.5 Energy gauge for contacts prediction
c.6 Similarity dependent dupdel rates
c.6.1 Asymmetric duplications and deletions
c.7 Duplication bursts rates from model definition
bibliography