Model of evolution of postzygotic isolation in parapatry
The simple DM model with two bi-allelic loci was used by Bank et al. (2012) to study the parapatric case under a continent-island model. They showed that the maximum rate of gene flow that a DMI can withstand is limited by exogenous selection. When gene flow occurs and DMIs are neutral, they cannot be maintained. On the other hand, when selection occurs, DMIs can evolve either by selection against immigrants or by selection against hybrids. The first case involves exogenous selection on one of the loci involved in the DMI. The second case involves endogenous selection due to the incompatibility of the two parental genetic backgrounds. The authors also showed that the genetic architecture maximizing the gene flow supported by a DMI is not the same in the two cases. Selection against immigrants favours tightly linked DMIs of any strength, whereas selection against hybrids favours the evolution of strong, unlinked DMIs. In addition, if selection acts against hybrids and the environment is homogeneous, the order in which mutations arise is important (mutation-order speciation).
An older model is that of Navarro & Barton (2003a), who proposed that chromosomal rearrangements favour the evolution of RI between species in the presence of gene flow, when DM incompatibilities accumulate within these rearrangements. In line with their model, the authors showed that differentiation between human and chimpanzee was twice as high for genes located on rearranged chromosomes as for genes on collinear chromosomes (Navarro & Barton 2003b). However, their results (not the model) were highly controversial, as one study did not find such differences (Zhang et al. 2004) and others raised technical criticisms of the approach of Navarro & Barton (e.g. Lu et al. 2003). Finally, one of the conclusions of the debate following these studies was that finding evidence for or against parapatric speciation “remains a fascinating but elusive goal” (Lu et al. 2003; Navarro et al. 2003).
Model of evolution in sympatry
Evolution of RI in sympatry requires individuals to adapt to divergent habitats within the same area and the underlying genetic polymorphisms to differ between habitats. The evolution of RI through divergent ecological selection also requires linkage disequilibrium between genes involved in postzygotic isolation and those involved in prezygotic isolation. The main difficulty is that recombination breaks up the coupling between these postzygotic and prezygotic isolating factors (genes) at each generation (Felsenstein 1981). To understand how coupling can be maintained, Felsenstein (1981) proposed two models: a one-allele model and a two-allele model. In the one-allele model, a new allele becomes fixed in all populations and allows individuals to recognize each other and to mate preferentially with individuals of the same adapted genotype. In this model, homogamy is controlled by a single locus. In the two-allele model, two alleles (A1 and A2) have to be fixed, one in each diverging population. These alleles also favor homogamy, so that individuals fixed for A1 mate together and those fixed for A2 mate together. In this model, divergence is possible only if the two alleles are differentially associated with habitat choice, which requires strong linkage disequilibrium. The second model is generally considered more realistic (Felsenstein 1981), but some authors consider the first model more reasonable (Kirkpatrick & Ravigné 2002). Indeed, recombination does not play a homogenizing role in the one-allele case, so that RI can evolve more easily. Under the two-allele model, another way to establish associations between prezygotic and postzygotic factors is through pleiotropic interactions between alleles for habitat choice and alleles for local adaptation (Rice 1984; Doebeli 1996). Such pleiotropic traits have been called “magic traits” (Servedio et al. 2011). One example of this kind of interaction may be the threespine stickleback (Nagel & Schluter 1998). However, few other convincing examples exist in nature (Servedio & Noor 2003).
There are many other quantitative models of sympatric speciation. They usually rely on the variance of some quantitative trait that promotes variance in resource use through competition between individuals within a sympatric population (Dieckmann & Doebeli 1999). In the model of Dieckmann & Doebeli (1999), frequency-dependent selection and disruptive selection favor extreme individuals, which consume unexploited resources. Under these conditions, assortative mating may lead to RI between ecologically diverging subpopulations. In this model, if assortative mating depends on a trait unlinked to resource use, genetic drift is necessary to break linkage equilibrium (i.e. to generate linkage disequilibrium) between the assortative mating trait and the resource use trait. This model has been criticized because competition for extreme resources also arises during resource use (e.g. competition for small and large seeds), leaving more intermediate resources available and thereby favoring intermediate individuals.
Finally, although theory predicts that sympatric speciation may occur, model assumptions are numerous and their empirical validity remains contentious (Bolnick & Fitzpatrick 2007). Today, few convincing examples of sympatric speciation exist, either because the magnitude of RI between putatively incipient species is generally low (e.g. in Rhagoletis (Feder et al. 2005) or Timema (Soria-Carrasco et al. 2014)), suggesting that local adaptation rather than speciation is at play, or more simply because the null hypothesis of a build-up of RI in allopatry has not been tested or could not be rejected. For instance, in Rhagoletis it is likely that divergence was initiated in allopatry, with one race having diverged in North America and the other in Mexico, the latter having accumulated chromosomal rearrangements that have been maintained upon secondary contact in the United States.
New methods to infer the history of speciation
The development of coalescent models based on gene genealogies allows a better estimation of population genetic parameters such as effective population size, migration rate and timing of divergence between populations (Pinho & Hey 2010). However, a prerequisite for estimating demographic parameters is to obtain a realistic null model of divergence for the studied populations. Fitting phylogenetic tree-based models allows explicit inferences to be drawn about history. However, these models assume a simple bifurcating tree with no subsequent gene flow, which may be incorrect when populations are connected by gene flow (Edwards 2009). Solutions to this issue were proposed by Pickrell & Pritchard (2012) and Gautier & Vitalis (2013). The first method extends the approach of Cavalli-Sforza and Edwards and models allele frequencies with a multivariate Gaussian model (Pickrell & Pritchard 2012). It allows for both population splits and mixture among multiple populations, with migration represented as additional edges, so that population history is described by a graph instead of a tree. However, divergence estimates based on a Gaussian model may be reliable only for recent divergence times. The method of Gautier & Vitalis (2013) relies on a diffusion process (forward in time), in contrast to most phylogenetic approaches, and also allows several populations to be handled at a time. However, the method of Pickrell & Pritchard (2012) does not allow explicit comparisons of alternative scenarios and works better for full genome sequences with outgroup data. Overall, neither method allows parameters such as effective population size or migration rates to be estimated.
Coalescent methods such as those developed by Nielsen & Wakeley (2001) and Hey (2010) allow demographic parameters to be estimated, but assume either continuous gene flow (i.e. the Isolation with Migration model) or no gene flow. In addition, such models use a full likelihood approach, which is computationally intensive. Other recently developed methods use Hidden Markov Models (HMM) and full genome data to reconstruct the history of divergence. However, these methods are often limited to a few genomes or populations (e.g. PSMC, Li & Durbin 2011). Methods based on admixture fraction (Liang & Nielsen 2014b) or admixture tract length (e.g. Harris & Nielsen 2013; Liang & Nielsen 2014a; Sedghifar et al. 2015) are currently under development in human populations and should soon be applicable to natural populations as full genome data become available. On the other hand, methods such as Approximate Bayesian Computation (ABC) bypass the need to compute likelihoods (Csilléry et al. 2010) by making use of data simulated under a given historical scenario. The method simply compares a set of observed summary statistics with sets of statistics simulated under different scenarios (Beaumont et al. 2002; Beaumont 2010). These methods have been widely developed in recent years and used to infer the history of species or population divergence (Ross-Ibarra et al. 2008, 2009; Duvaux et al. 2011; Pettengill & Moeller 2012; Roux et al. 2013, 2014). ABC methods work as follows. First, datasets are simulated under each of the competing models, with parameter values drawn from the prior distributions specified for that model. Typically, around one million datasets (i) are simulated under each model. Then, for each simulation, a set of summary statistics S(i) is computed and compared to the observed set of summary statistics (So) using a distance, such as the Euclidean distance. When the distance is below a tolerance threshold (δ), the parameter values are accepted. The accepted parameter values can then be adjusted by local linear regression, logistic regression or neural networks, giving more weight to the simulations whose summary statistics are closest to the observed data, which yields an approximate posterior distribution. Ultimately, model selection is performed based on posterior probabilities and Bayes factors.
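The rejection step described above can be illustrated with a minimal sketch. It is a toy example under strong assumptions, not the pipeline used in the cited studies: the simulator draws values from a normal distribution instead of simulating genetic data under a divergence scenario, the prior is uniform, only two summary statistics are used, and no regression adjustment is applied. All function names (`simulate`, `summaries`, `abc_rejection`) are hypothetical.

```python
import math
import random


def simulate(theta, n, rng):
    """Toy simulator (an assumption): n draws from Normal(theta, 1).

    A real application would simulate genetic data under a demographic model.
    """
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summaries(data):
    """Summary statistics S: sample mean and sample standard deviation."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return (mean, math.sqrt(var))


def euclidean(s1, s2):
    """Euclidean distance between two vectors of summary statistics."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


def abc_rejection(s_obs, n, n_sims, delta, rng, prior=(-5.0, 5.0)):
    """Keep the parameter values whose simulated summaries fall within delta of s_obs."""
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)              # draw a parameter from the prior
        s_sim = summaries(simulate(theta, n, rng))
        if euclidean(s_sim, s_obs) < delta:      # tolerance threshold
            accepted.append(theta)
    return accepted


rng = random.Random(42)
observed = simulate(2.0, 50, rng)                # pseudo-observed data, true theta = 2
s_obs = summaries(observed)
posterior = abc_rejection(s_obs, n=50, n_sims=10000, delta=0.2, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 2))
```

The accepted values approximate a sample from the posterior distribution. A smaller tolerance gives a better approximation at the cost of fewer accepted simulations, which is one reason why very large numbers of simulations are typically run.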
Other methods use information from the site frequency spectrum (SFS, the distribution of derived allele counts across polymorphic sites within a population), which constitutes a full summary of the data and implies no information loss (Gutenkunst et al. 2009). When a pair of populations is compared under a given divergence scenario, the Joint Site Frequency Spectrum (JSFS) is used. In general, data from an outgroup species are needed to polarize the JSFS (i.e. to determine the proportions of shared ancestral alleles versus shared derived alleles). The JSFS can be simulated under a given demographic scenario using a diffusion approximation to the one-locus, two-allele Wright-Fisher model (Gutenkunst et al. 2009). The method of Gutenkunst et al. (2009) is primarily designed to work on few populations (one, two or three) with small allele frequency changes at each generation, and assumes independence of the polymorphisms; otherwise, the likelihood becomes a composite likelihood and bootstrap methods must be used to assess the accuracy of estimates (Gutenkunst et al. 2009). The method has been improved to compare more complex divergence scenarios, by taking into account the heterogeneity of migration rates along the genome and by improving the exploration of the likelihood landscape through the use of simulated annealing in addition to the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Tine et al. 2014). These methods, combined with genomic data, allow the heterogeneity of migration rates along the genome to be explored, in line with the view of semi-permeable barriers to gene flow (Harrison 1989; Wu 2001; Harrison 2012; Harrison & Larson 2014) (box 1).
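To make the SFS and JSFS concrete, the following sketch computes both from small polarized haplotype matrices. It only illustrates the data summary, not the diffusion-based inference of Gutenkunst et al. (2009). The input encoding (one list per sampled haplotype, 0 for the ancestral allele and 1 for the derived allele) and the function names are assumptions made for this example.

```python
def sfs(haplotypes):
    """Unfolded SFS: entry k counts sites where k of the n haplotypes carry the derived allele."""
    n = len(haplotypes)
    spectrum = [0] * (n + 1)
    for site in zip(*haplotypes):        # iterate over sites (columns)
        spectrum[sum(site)] += 1
    return spectrum


def joint_sfs(pop1, pop2):
    """Joint SFS: entry [i][j] counts sites with i derived copies in pop1 and j in pop2."""
    n1, n2 = len(pop1), len(pop2)
    jsfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for site1, site2 in zip(zip(*pop1), zip(*pop2)):
        jsfs[sum(site1)][sum(site2)] += 1
    return jsfs


# Hypothetical data: 3 haplotypes in population 1, 2 in population 2, 5 sites.
pop1 = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
]
pop2 = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]

print(sfs(pop1))             # [2, 2, 0, 1]
print(sfs(pop2))             # [3, 0, 2]
print(joint_sfs(pop1, pop2))
```

In this toy dataset the fourth site is the only shared polymorphism (one derived copy in population 1, two in population 2) and therefore falls in cell [1][2] of the JSFS, whereas the second site is fixed for the derived allele in population 1 and absent from population 2 (cell [3][0]).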
Heterogeneous genomic divergence
Recent genome-wide studies have documented heterogeneous genomic divergence, corroborating the idea of barrier permeability. Some studies revealed a few regions of large size (Turner et al. 2005; Jones et al. 2012; Via et al. 2012; Martin et al. 2013) whereas others identified multiple regions of smaller size spread across the genome (Michel et al. 2010; Renaut et al. 2013; Burri et al. 2015). These observations have led to the development of a verbal theory of “divergence hitchhiking” facilitating divergence with gene flow (Via & West 2008; Via 2012). This theory states that divergent selection and non-random mating will reduce recombination in the face of gene flow and generate large islands of differentiation. The four steps of the model are shown in Fig. 5 (from Feder et al. 2012a).
Studying speciation: selection vs endogenous barriers
With the recent advent of Next Generation Sequencing technologies (Mardis 2008), which allow for genome-wide studies in non-model organisms, several new questions can now be addressed in speciation research, notably the respective roles of ecological divergence and endogenous barriers in genomic divergence. In this section, I will first recall some of the footprints of selection left across the genome and how to detect them with genome scans, and then discuss some current debates in speciation genomics.
Genetic hitchhiking, hard and soft sweeps
When a single new adaptive mutation occurs with a selective advantage (s), its frequency quickly rises until fixation. Individuals carrying the mutation are favored by selection, generating a selective sweep, the so-called “hard” sweep. This allele frequency shift leads to a similar shift in allele frequency at loci in close proximity to the selected locus, a process called genetic hitchhiking (Smith & Haigh 1974). Under a strong selective sweep, a local loss of genetic diversity occurs in the neighborhood of the hitchhiking alleles, so that almost only one haplotype remains. The size of the selective sweep is enhanced by selection, which increases the frequency of all alleles in the same neighborhood, but it is eroded by recombination, and this erosion increases as a function of time (Kim & Stephan 2002). Hitchhiking is supposed to be efficient regardless of effective population size (Gillespie 2000, 2001). In the classical model of local hitchhiking, genetic differentiation decreases as a function of the distance from the hitchhiker locus (Fig. 7a,b) (Charlesworth et al. 1997a). This corresponds to the case where a mutation is favorable in its own deme but unfavorable in another deme. This model implies strong genetic differentiation between populations at the (selected) hitchhiker locus and at other loci in its neighborhood. In a second model, a mutation can be globally favorable, that is to say favorable in both of two structured populations, and its classical signature involves two peaks of differentiation, one on each side of the selected locus (Fig. 7c,d). This model of global hitchhiking in structured populations implies that the intensity of the sweep will be smoother than in the first model (details in Bierne 2010). Such patterns may lead to false inferences from genome scan data in search of footprints of ecological selection, as will be discussed below.
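The rapid rise of a new adaptive mutation can be illustrated with a minimal deterministic sketch. It assumes a haploid population with non-overlapping generations, in which the favored allele has relative fitness 1 + s, and it ignores genetic drift, recombination and linked neutral loci. This is a textbook simplification chosen for illustration, not the model of the studies cited above, and the function name and default values are hypothetical.

```python
def sweep_trajectory(s, p0, threshold=0.99, max_generations=100_000):
    """Deterministic frequency trajectory of a beneficial allele (haploid selection).

    Each generation: p' = p * (1 + s) / (1 + s * p), where 1 + s * p is the mean fitness.
    The trajectory stops once the frequency reaches `threshold`.
    """
    if s <= 0 or not 0 < p0 < 1:
        raise ValueError("require s > 0 and 0 < p0 < 1")
    trajectory = [p0]
    p = p0
    while p < threshold and len(trajectory) <= max_generations:
        p = p * (1 + s) / (1 + s * p)
        trajectory.append(p)
    return trajectory


# A new mutation starting at frequency 1/10,000 under two selection strengths.
strong = sweep_trajectory(s=0.05, p0=1e-4)
weak = sweep_trajectory(s=0.01, p0=1e-4)
print(len(strong) - 1, "generations with s = 0.05")
print(len(weak) - 1, "generations with s = 0.01")
```

The stronger the selective advantage, the fewer generations the sweep takes, and hence the less time recombination has to break the association between the selected allele and linked neutral variants, which is why strong sweeps leave wider footprints.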
Insight from studies of parallel adaptation and parallel speciation
A promising way to understand the origin of species and their historical mode of divergence is to study independent replicate pairs of populations. The independent evolution of the same phenotypic trait in independent populations is called parallel evolution, and it is suggested that this is “strongly due to the action of natural selection” because genetic drift is unlikely to result in such concerted patterns in different places (Schluter & Nagel 1995; Johannesson 2001). A classical scenario of parallel evolution at the phenotypic level is provided in Figure 8. When a trait that induces RI evolves independently in different populations, Schluter & Nagel (1995) proposed that this constitutes a case of parallel speciation. In general, the best examples have demonstrated a strong role for size-assortative mating.
Table of contents:
Chapter 1: General introduction
1. Speciation and Reproductive isolation
2. Modelling gene flow across space and time
3. Studying speciation: selection vs endogenous barriers
4. Insight from studies of parallel adaptation and parallel speciation
5. Lampreys as a model of speciation research
6. Goals of the Thesis
Chapter 2: Investigating gene flow and reproductive isolation in lampreys
Article 1: Low reproductive isolation and highly variable levels of gene flow reveal limited progress towards speciation between European river and brook lampreys
Chapter 3: Investigating divergence history of European river and brook lamprey
Article 2: Reconstructing the demographic history of divergence between European river and brook lampreys using Approximate Bayesian Computations
Chapter 4: Understanding speciation: moving toward genomics
Article 3: Inferring the demographic history underlying parallel genomic divergence among pairs of parasitic and non-parasitic lamprey ecotypes
Chapter 5: Effect of anthropogenic disturbance on population genetic diversity and structure of European brook lamprey
Article 4: Moderate effect of river fragmentation but strong influence of gene flow between ecotypes on the genetic diversity of brook lamprey populations
Chapter 6: Discussion & Perspectives
1. Low levels of reproductive isolation and high viability of F1s at an early developmental stage
2. The importance of the geographical context in studying speciation
3. The complexity of histories of divergence
4. Better characterizing isolated L. planeri populations
Appendix 1: Testing selection at linked sites: effects of BGS
Appendix 2: Development of a hybrid linkage map: mapping endogenous and exogenous barriers
Appendix 3: Testing outbreeding and heterosis in isolated L. planeri populations