Tools for studying diversification from reconstructed phylogenies


Incorporating heterogeneity in diversification models.

The discovery that phylogenetic tree shapes generally deviate from those obtained with a birth-death model, for example in terms of the $\gamma$ and $\beta$ statistics, fostered the development of an ever-increasing body of more complex diversification models. These models give insight into what affects the tempo of diversification, and thus into the processes that shaped today’s patterns. This section reviews representative examples.

Heterogeneity through time.

A constant rate birth-death model should theoretically generate LTT plots that display an increase in the speed of lineage accumulation toward the present, the so-called pull of the present (Harvey et al., 1994; Nee et al., 1994a). Empirical LTT plots, however, more often show the reverse pattern (Kubo and Iwasa, 1995; Zink and Slowinski, 1995; Lovette and Bermingham, 1999). This observation, coupled with the related observation that empirical phylogenies generally display negative $\gamma$ statistic values (Phillimore and Price, 2008; McPeek, 2008; Rabosky and Lovette, 2008a; Moen and Morlon, 2014), led to the hypothesis that speciation decreases with time within clades. Additionally, diversification rates may be impacted by changes in the biotic or abiotic environment of a clade.
Time dependency. A common way to incorporate this time signature in models of cladogenesis is to assume a continuous functional dependency of diversification rates on time (Rabosky and Lovette, 2008b; Morlon et al., 2011; Hallinan, 2012). A likelihood expression is available for those models, which allows fitting them to empirical phylogenies. Although the likelihood can theoretically accommodate any functional form of time dependency for the speciation and extinction rates, these rates are generally assumed to vary as linear or exponential functions of time, to limit the number of parameters needed.
Alternatively, models with discrete forms of time variation have also been developed (Stadler, 2011; Hallinan, 2012). Methods based on those models have the advantage of not requiring a predefined form for how diversification rates changed through time, and rely on likelihood ratio tests to determine the number of rate shifts that occurred along a clade’s evolutionary history.
Those two approaches make it possible to assess whether the tempo of diversification changed through time (e.g. whether rates increased or decreased, or whether diversity followed a waxing and waning pattern, going up and then down) without making assumptions about the factors that may have driven those changes. Those may be extrinsic factors, linked to changes in the abiotic or biotic environment of the clade, or factors intrinsic to the diversification process, as could be the case if speciation is a protracted process.
Environmental dependency. Diversification rate changes inferred with time-dependent models show what the trends are, but give no clue about what drove them. Further tests are needed to gain insight into the processes at work. Some of the drivers may be changes in the abiotic (e.g. temperature, sea level) or biotic (e.g. predation pressure) environment. If it is known how those varied through time, this information can be incorporated within the birth-death framework by assuming a functional dependency of diversification rates on the environmental variable (Condamine et al., 2013). As for the time-dependent model, a linear or exponential functional dependency is generally assumed. Simulation studies showed that the model is well-behaved and recovers the right environmental driver of diversification as long as the dependency is strong enough (Lewitus and Morlon, 2017). The approach has been applied to empirical data, highlighting for example a negative dependence of speciation and extinction rates on temperature in birds (Claramunt and Cracraft, 2015), a positive correlation between sea level and extinction rates in birdwing butterflies (Condamine et al., 2015), and a positive association between speciation rate and temperature in cetaceans (Lewitus and Morlon, 2017).
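The linear and exponential dependencies mentioned above can be made concrete with a minimal sketch. The function names, the parameter values and the `temperature` curve are illustrative assumptions, not taken from the cited studies; the same functional forms serve for time dependency (covariate = time) and environmental dependency (covariate = value of the environmental variable at that time).

```python
import math


def exponential_rate(rate0, alpha, x):
    """Rate depending exponentially on a covariate x (time or an environmental value)."""
    return rate0 * math.exp(alpha * x)


def linear_rate(rate0, alpha, x):
    """Rate depending linearly on x, floored at zero so that it stays a valid rate."""
    return max(0.0, rate0 + alpha * x)


def temperature(t):
    """Hypothetical environmental curve: temperature (deg C) at time t (Myr before present)."""
    return 10.0 + 0.2 * t


def env_dependent_speciation(t, lam0=0.05, alpha=0.1):
    """Speciation rate at time t driven by the (assumed) temperature curve."""
    return exponential_rate(lam0, alpha, temperature(t))
```

With `alpha = 0` both forms reduce to the constant rate model, which is what makes these models convenient for comparison against a constant rate birth-death model.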
Mass extinction. Past mass extinctions leave a signature in today’s biodiversity, visible as a plateau in LTT plots (Crisp and Cook, 2009). Mass extinction events have been incorporated in the discrete time-dependent model (Stadler, 2011), as well as in the continuous one (Höhna, 2015), by giving each lineage a probability $\rho_i$ of surviving the event at time $t_i$. The discrete-time version was subsequently integrated within a Bayesian framework, using reversible jump MCMC to detect the number and positions of rate shifts and mass extinction events (May et al., 2016).
Diversity dependence. In all the models described in the previous paragraphs, species are assumed to be independent of each other. Yet competitive interactions among species are thought to have a major impact on species diversity. The fact that diversification rates are often found to decrease with time, and the fact that, for many groups, no relationship is found between clade age and species richness, are often interpreted as evidence for diversity-dependent speciation (Ricklefs, 2007a; Phillimore and Price, 2008; McPeek, 2008; Rabosky, 2009a), with diversification becoming less and less likely as ecological niches are filled. This process was first implicitly incorporated in diversification models through speciation rates that decrease with time (Rabosky and Lovette, 2008b), but models with explicit diversity-dependent diversification are now available (Rabosky and Lovette, 2008a; Etienne et al., 2011). Etienne et al. (2011) developed a model in which the speciation rate is a function of the number of species in the clade, as well as an inference procedure to fit it to empirical phylogenies. The dependency of the speciation rate on the number of species $n$ follows the logic of logistic growth, with $\lambda(n) = \max\left(0, \lambda_0 - (\lambda_0 - \mu)\frac{n}{K}\right)$, where $\lambda_0$ is the speciation rate in the absence of competitors, $\mu$ is the extinction rate, and $K$ is the equilibrium species richness, which is reminiscent of the ‘carrying capacity’ used in ecological models.
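As a small illustration of the diversity-dependent speciation rate described above, the sketch below evaluates $\lambda(n)$ for a few clade sizes. The function name and the numerical values are assumptions chosen for the example only.

```python
def dd_speciation_rate(n, lam0, mu, K):
    """Diversity-dependent speciation rate: lam0 - (lam0 - mu) * n / K, floored at zero.

    n    : current number of species in the clade
    lam0 : speciation rate when the clade is empty of competitors
    mu   : (constant) extinction rate
    K    : equilibrium species richness, where speciation balances extinction
    """
    return max(0.0, lam0 - (lam0 - mu) * n / K)


# Illustrative values (assumed): the rate declines linearly with n,
# equals mu at n = K, and is truncated at zero for large n.
for n in (0, 50, 100, 200):
    print(n, dd_speciation_rate(n, lam0=0.5, mu=0.1, K=100))
```

At $n = K$ the speciation rate equals the extinction rate, so the expected net diversification is zero, which is why $K$ plays the role of an equilibrium richness.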
The negative diversity-dependent diversification process is thought to be due to the filling of niche space by competitors, which impedes diversification. Occasionally, a lineage may escape competition, through dispersal to a new area or the appearance of a key innovation that allows it to enter a novel adaptive zone. This can be added to the diversity-dependent framework by decoupling the diversity-dependent dynamics of a subclade from those of the main clade (Etienne and Haegeman, 2012), which makes it possible to model a whole adaptive radiation, from the initial phase of high speciation to the later slowdown in diversification.
Protracted speciation. Mechanisms other than a dependency on time, or on factors that correlate with it, can lead to a signature of decreasing speciation through time. This is the case if speciation is protracted. In the classical lineage-based models of diversification, speciation is modeled as an instantaneous process. Yet the formation of distinct species is a complex process that requires the completion of several steps, and it may take a non-negligible time to go from initial lineage splitting to the actual completion of speciation (Avise et al., 1998; Gavrilets et al., 2000; Benton and Pearson, 2001; Norris and Hull, 2012). This idea that speciation takes time to complete, referred to as protracted speciation, has been incorporated within the birth-death framework by Etienne and Rosindell (2012). In their model, species give birth to new, incipient species at rate $\lambda_1$ (the rate of speciation initiation); incipient species can subsequently become good species at rate $\lambda_2$ (the rate of speciation completion), or give birth to another incipient species at rate $\lambda_3$ (Fig. 4a). Each stage has its own extinction rate $\mu_i$. The authors show that protracted speciation provides an explanation for the shape of empirical LTT plots (Fig. 4c). Additionally, taking $\lambda_1 < \lambda_3$ generated reconstructed phylogenies with $\gamma$ values below 0, as is generally seen in empirical data.
In Etienne and Rosindell (2012), the likelihood of the model was only available for the pure birth case (with $\mu_1 = \mu_2 = 0$). An approximate likelihood expression was later derived for the model with non-zero extinction (Lambert et al., 2015), making it possible to estimate the mean time needed to complete speciation from empirical data (Etienne et al., 2014).
The model of Etienne and Rosindell (2012) defines protracted speciation at the lineage level, but protracted speciation can also arise from the way species are defined from genealogies of individuals. Rosindell et al. (2010) studied the effect of adding a protracted mode of speciation to the neutral theory of biodiversity. In their model, species are formed through a point mutation mode of speciation, but they are considered incipient species until a fixed number of generations has elapsed, which amounts to assuming a fixed time for the completion of speciation. They showed that this added realism to the predictions of the neutral theory, especially for species lifetimes, speciation rates and the number of rare species. Other models let mutations accumulate until individuals are considered to belong to different species, using lower-level (individual or population level) genealogies with mutations to define species (Rosindell et al., 2015; Manceau et al., 2015). In the Unified Theory of Ecology and Macroevolution (UTEM; Rosindell et al., 2015), two populations are part of the same species if there are fewer than $n$ mutations along the genealogical path between them, or if other extant individuals bridge the gap between them (Fig. 5). As soon as $n > 1$, speciation takes time and happens as a protracted process. Note that species defined this way are very rarely monophyletic, even for $n = 1$ (Fig. 5). In the Speciation by Genetic Differentiation model (SGD; Manceau et al., 2015), individuals follow a birth-death process, with occasional mutation events arising at a constant rate. Species are defined from individual genealogies with mutations as the smallest monophyletic groups of extant individuals such that any two individuals of the same genetic type always belong to the same group (Fig. 6). This approach was used to define species in Chapter 3 of this thesis.
Although neither the UTEM nor the SGD model incorporates a slowdown in evolutionary rates in its specification, both generate phylogenies with more realistic species accumulation curves than the constant rate birth-death model.
Figure 6: Species definition in the Speciation by Genetic Differentiation model. From Fig. 1 in Manceau et al. (2015). Species are defined from the genealogy of extant individuals with mutations (left tree, with red dots denoting mutation events) as the smallest monophyletic groups of extant individuals such that any two individuals of the same genetic type (indicated with different colors on the left tree) always belong to the same group. The right tree shows the resulting phylogeny. Even though only one mutation is needed to define a species, the monophyly condition makes the speciation mode protracted.

Heterogeneity across lineages.

In the models described in Section 2.1, diversification rates are hypothesized to be homogeneous within the clade. While this seems a reasonable assumption when considering only a limited set of taxa, large-scale phylogenies encompass very different species, with different characteristics and evolutionary histories. There are many reasons for species not to diversify at the same speed. For example, key innovations may arise at the base of given clades, giving them an advantage by allowing them to temporarily escape predation or competition pressure, or by allowing them to colonize a new environment, thus enhancing their diversification rates. Also, groups of species living in different biogeographical areas, or varying in essential traits such as those affecting reproductive isolation, e.g. reproduction mode (Goldberg et al., 2010) or pollination and dispersal syndromes (Onstein et al., 2017), are likely to diversify at different paces.
All diversification models in which diversification rates are constant within the clade generate trees with topologies identical to those generated with the Yule model, with an expected $\beta$ statistic equal to 0 (Lambert and Stadler, 2013). This differs markedly from the topologies observed in empirical data (Aldous, 2001; Blum and François, 2006), supporting the idea that the diversification process is far from homogeneous in nature. Being able to position those rate changes on phylogenies, quantify them and link them to changes in species characteristics is of particular interest to evolutionary biology. Another important, but more technical, aspect is that not accounting for the possibility that rates may have varied across a clade’s lineages biases diversification rate estimates. In particular, it leads to extinction rate estimates close to 0, which opened a debate about whether extinction rates should be estimated from molecular phylogenies only (Rabosky, 2010; Beaulieu and O’Meara, 2015; Rabosky, 2016a). Yet including even a few rate shifts within the phylogeny can bring extinction rate estimates back close to those obtained from the fossil record (Morlon et al., 2011).
Rate shift detection. Two systematic methods have been proposed to position diversification rate shifts on phylogenies, both building on a lineage-based model in which diversification is a homogeneous process with occasional shift events happening on a lineage. Once a shift happens, the clade descending from this lineage diversifies with its own diversification rate (Fig. 7a). The first of those methods, MEDUSA (Alfaro et al., 2009), uses a stepwise AIC (Akaike Information Criterion) procedure to detect how many shifts occurred during the history of a clade, as well as their position and magnitude. A simple constant rate birth-death model is first fitted to the tree. Then a model with one shift at a node is fitted for every possible position of the shift. The AIC of each of these models is computed, and the one-shift model is selected if the difference in AIC between the constant rate model and the best one-shift model is more than $\mathrm{AIC}_{crit}$ (in the first implementation, $\mathrm{AIC}_{crit}$ was set to 4, but the more recent version has an $\mathrm{AIC}_{crit}$ that depends on the number of tips in the tree). If the one-shift model is selected, support for a two-shift model is tested using the same procedure, and so on until no more shifts are selected. A backward elimination procedure is then performed.
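The forward part of such a stepwise AIC procedure can be sketched as follows. This is a generic illustration, not the MEDUSA implementation: the `fit` callback (which would return the maximized log-likelihood and the number of free parameters for a given set of shifts), the function names and the default threshold argument are assumptions, and the backward elimination step is left out.

```python
def aic(log_lik, n_params):
    """Akaike Information Criterion."""
    return 2 * n_params - 2 * log_lik


def forward_stepwise(candidates, fit, threshold=4.0):
    """Greedy forward selection of rate shifts.

    candidates : possible shift positions (e.g. node labels)
    fit        : hypothetical callback, fit(shifts) -> (log_lik, n_params)
    threshold  : minimum AIC improvement required to accept one more shift
    """
    selected = []
    best_aic = aic(*fit(selected))          # constant rate model, no shift
    remaining = list(candidates)
    while remaining:
        # Fit one model per candidate position of the additional shift.
        scored = [(aic(*fit(selected + [c])), c) for c in remaining]
        new_aic, best_c = min(scored, key=lambda pair: pair[0])
        if best_aic - new_aic <= threshold:
            break                           # improvement too small: stop adding shifts
        selected.append(best_c)
        remaining.remove(best_c)
        best_aic = new_aic
    return selected, best_aic
```

Because each step keeps the shifts already selected, the search is greedy: it does not revisit earlier choices, which is one reason a backward elimination pass is useful afterwards.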
BAMM (Bayesian Analysis of Macroevolutionary Mixtures; Rabosky, 2014; Fig. 7) is another, more recent method developed for detecting diversification rate shifts on a reconstructed phylogeny. This approach uses reversible jump Markov chain Monte Carlo (rjMCMC). Compared to MEDUSA, it has the advantage of allowing speciation and extinction rates to vary through time between rate shifts, an important feature since a decrease in diversification is commonly observed in empirical phylogenies (McPeek, 2008; Phillimore and Price, 2008; Moen and Morlon, 2014). In addition, the rjMCMC framework simultaneously selects the most probable model (the number of shifts) and estimates the model parameters (the position and amplitude of the shifts), and thus, unlike MEDUSA, makes it possible to explore uncertainty around the number and location of shifts.
These two methods are very similar in the way they envision diversification rate shifts. They both rely on the hypothesis that rate changes are large and uncommon (Fig. 7a), both implement a model selection procedure, and both are based on the same likelihood expression (with additional trend parameters in the case of BAMM). The major differences lie in the way this selection procedure is performed (stepwise AIC for MEDUSA, rjMCMC for BAMM), and in the possibility for BAMM to account for time-variable rates. The ability of both methods to accurately recover the number of shifts and their position on the phylogeny has recently been called into question. In MEDUSA, the problem comes from the fact that the use of the AIC to determine the number of shifts is somewhat arbitrary. In BAMM, a problem is that the number of inferred shifts is highly dependent on the prior used (Moore et al., 2016; but see Rabosky, 2017). Additionally, the likelihood used for inference in both methods assumes that no diversification shift occurred within unobserved lineages, and this recently fueled a controversy about their statistical performance (May and Moore, 2016; Moore et al., 2016; Rabosky et al., 2017; Rabosky, 2017; Mitchell and Rabosky, 2017; Meyer and Wiens, 2018).
In Chapter 2 of this PhD thesis, I propose an alternative method, built from a model in which diversification rates vary in a more gradual way across the phylogeny.
Character dependent diversification. The methods described above give insight into how and where diversification rates varied in the history of a clade, but without making any assumption about why they varied. One question of particular interest to biologists is whether rates depend on species characters, and on which of them (Jablonski, 1987; Slowinski and Guyer, 1993; Barraclough et al., 1998). This has traditionally been answered through sister clade analyses, which consist in comparing species diversity in two clades that descend from a single common ancestor and in which species have different character states (Mitter et al., 1988; Barraclough et al., 1998; Barraclough, 1998). Yet this approach has several caveats: it cannot distinguish between increased speciation and decreased extinction (being based on the principle of the method of moments, see subsection 1.3 of the present introduction), it cannot differentiate between increased diversification for one of the character states and asymmetrical transition rates (i.e. when the transition from one character state to the other is more likely in one direction than in the other; Maddison, 2006), and it prevents the use of clades with mixed character states in the analysis.
To counter those caveats, a likelihood-based approach, the binary-state speciation and extinction model (BiSSE, Fig. 8), was proposed by Maddison et al. (2007). In their model, lineages are either in state 0 or in state 1 of the character of interest. The possible events for a lineage in state $i$ are speciation (with rate $\lambda_i$), extinction (with rate $\mu_i$), or transition to state $j \neq i$ (with rate $q_{ij}$). They derive the likelihood of a phylogeny with observed character states at the tips under this model, and use it to test whether the character had an impact on diversification (by model selection against a birth-death process) and to estimate the six model parameters.
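To make the event structure of this model concrete, here is a minimal Gillespie-type sketch that tracks only the number of lineages in each state through time, not the tree itself. It is an illustration under stated assumptions, not the simulation code of any package: the function name, the `max_lineages` safeguard and the example parameter values are hypothetical.

```python
import random


def simulate_bisse_counts(lam, mu, q, t_max, n0=(1, 0), seed=None, max_lineages=10_000):
    """Simulate lineage counts in states 0 and 1 under BiSSE-like dynamics.

    lam, mu : (rate in state 0, rate in state 1) for speciation and extinction
    q       : (q01, q10), transition rates out of state 0 and out of state 1
    Returns the pair (n0, n1) at time t_max, or when the process stops earlier.
    """
    rng = random.Random(seed)
    n = list(n0)
    t = 0.0
    while 0 < sum(n) < max_lineages:
        # List every event that can currently happen, with its total rate.
        events, weights = [], []
        for i in (0, 1):
            for kind, rate in (("speciation", lam[i]), ("extinction", mu[i]), ("transition", q[i])):
                if n[i] * rate > 0:
                    events.append((kind, i))
                    weights.append(n[i] * rate)
        if not events:
            break
        t += rng.expovariate(sum(weights))   # waiting time to the next event
        if t > t_max:
            break
        kind, i = rng.choices(events, weights=weights)[0]
        if kind == "speciation":
            n[i] += 1
        elif kind == "extinction":
            n[i] -= 1
        else:                                # transition from state i to the other state
            n[i] -= 1
            n[1 - i] += 1
    return tuple(n)


# Illustrative run (assumed values): faster speciation in state 1, no extinction.
print(simulate_bisse_counts(lam=(0.1, 0.4), mu=(0.0, 0.0), q=(0.05, 0.2), t_max=10.0, seed=1))
```

Fitting the model to data requires the likelihood of the full tree with tip states, which this count-based sketch does not provide.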
This model has since been the subject of many extensions, collectively known as the SSE methods (for state speciation and extinction). They now make it possible to account for unsampled lineages or unknown present character states (FitzJohn et al., 2009), multi-state characters or interactions between several characters (FitzJohn, 2012, MuSSE), quantitative traits (FitzJohn, 2010, QuaSSE), geographic characters (Goldberg et al., 2011, GeoSSE), and cladogenetic character evolution (changes in character state that happen in conjunction with a speciation event; Goldberg and Igić, 2012, ClaSSE).
Figure 8: (a): States and transitions in the BiSSE model. (b): A tree generated with the BiSSE model, with the R-package diversitree (FitzJohn, 2012). Grey is state 0, yellow is state 1. The parameters used are $\lambda_0 = 0.1$, $\lambda_1 = 0.4$, $\mu_0 = \mu_1 = 0$, $q_{01} = 0.05$, $q_{10} = 0.2$. (c): Marginal distributions of $\lambda_0$ (grey) and $\lambda_1$ (yellow) estimated by MCMC for the tree in (b). The vertical lines show the parameter values used to simulate the tree.
It has recently been shown that these approaches suffer from a high Type I error rate: traits that have evolved neutrally along the branches of a phylogeny are very often found to have had an impact on diversification (Rabosky and Goldberg, 2015). This likely comes from the fact that the null model against which the state-dependent model is compared, a birth-death model, is too simplistic. Empirical phylogenies generally display evidence for diversification rate variation, as shown by tree shape statistics, and this leads to rejection of the birth-death model by the method rather than to acceptance of a character effect. New approaches have been proposed to correct for this bias. The HiSSE model (Hidden State Speciation and Extinction; Beaulieu and O’Meara, 2016) is an extension of BiSSE that adds the possibility that a hidden character, whose state at present is unknown, also had an effect on diversification. Another method uses lineage-specific diversification rate estimates (obtained from BAMM; Rabosky, 2014) and a permutation procedure on the present trait states to assess whether independence between diversification rates and trait values can be rejected (Rabosky and Huang, 2015). Both methods achieve reasonable Type I error rates. Yet BAMM provides subclade-specific rather than lineage-specific rate estimates. The approach we propose in Chapter 2 of this thesis offers a way to obtain lineage-specific estimates, which could be useful in this context.
Age dependent diversification rates. Another reason why diversification rates could differ between lineages at a given point in time is the possibility of age-dependent speciation or extinction rates. In those models, speciation (or extinction) rates are assumed to change with the age of a species (Venditti et al., 2010; Hagen et al., 2015; Alexander et al., 2015). This may arise for example from speciation being the result of many small events (Venditti et al., 2010), or from ecological pressure or population size varying over a species’ lifetime (Hagen et al., 2015; Alexander et al., 2015), resulting in non-exponentially distributed branch lengths. Those processes can either be symmetrical (both daughter species have their age reset to 0 at a speciation event) or asymmetrical (one daughter species inherits the age of the ancestor, while the age of the other is reset to 0). Age-dependent speciation can lead to tree shapes that are more imbalanced than those of trees generated with a Yule model, but this does not seem to be the case for age-dependent extinction (Hagen et al., 2015).


Effects of species interspecific interactions on diversification.

Biotic drivers of species diversity.

The red queen hypothesis. The court jester hypothesis postulates that extinction, speciation and evolution happen in response to random and unpredictable environmental changes (Barnosky, 2001). The competing interpretation, the red queen hypothesis, postulates that biotic interactions are sufficient for evolutionary changes to happen, even in the absence of abiotic environmental changes, because the ever-changing biotic environment requires constant adaptation (Van Valen, 1973). According to this hypothesis, an evolutionary change that gives an advantage to one species puts others at a disadvantage, which results in a continuous evolutionary race between the species of a community. Although it can reasonably be assumed that both biotic and abiotic factors play a part, their relative weight in species evolution and diversification is still a much debated issue (Barnosky, 2001; Benton, 2009; Voje et al., 2015). A prevalent view is that, even though biotic forces act at small spatial and temporal scales, macroevolutionary processes are dominated by abiotic transitions (Barnosky, 2001; Benton, 2009). The zero-sum assumption of the red queen hypothesis, namely that the gain in fitness of one species should be balanced by losses of fitness in others, has been heavily criticized (Maynard Smith, 1976; Stenseth and Maynard Smith, 1984). But other evolutionary models, which do not rely on this assumption, allow for continuous evolution and diversification even in the absence of abiotic changes (reviewed in Voje et al., 2015). Models of food web evolution show that antagonistic interactions may lead to unceasing trait evolution and species turnover, with occasional collapses in species number (Loeuille and Loreau, 2005; Guill and Drossel, 2008; Takahashi et al., 2013; Allhoff et al., 2015).
Coevolutionary diversification. Since Ehrlich and Raven’s (1964) study on the diversification of butterflies and plants, the idea that species coevolution can lead to codiversification has received great interest. In their verbal model, they hypothesized that antagonistic interactions between plants and the butterfly larvae that feed on them could promote species diversity through an escape and radiate mechanism: new defense strategies appear in the resource guild, allowing its members to escape their enemies and diversify quickly, until the enemies in turn evolve a key innovation that enables them to colonize the new resource clade and radiate.
Subsequent work on coevolutionary diversification has shown stronger evidence for diversification driven by antagonistic interactions than by mutualistic ones, in both theoretical and empirical studies (for a review of the effect of different interaction types on diversification, see Hembry et al., 2014). Several theoretical models showed that antagonistic interactions can lead to increased trait diversity among populations (Gandon, 2002; Yoder and Nuismer, 2010). Antagonism-driven speciation has been demonstrated in several empirical systems. In North American milkweeds for instance, investment in defense traits resulted in higher diversification rates (Agrawal et al., 2009). In butterflies, major host shifts resulted in bursts of diversification, in agreement with the escape and radiate hypothesis (Ehrlich and Raven, 1964; Fordyce, 2010).
In mutualistic communities, modeling work usually predicts a reduction of trait diversity (Kopp and Gavrilets, 2006; Yoder and Nuismer, 2010). Empirical evidence for increased species diversity through mutualistic interactions is limited and mainly restricted to two kinds of interactions (Hembry et al., 2014). One of them is resource symbiosis, which can facilitate the invasion of new adaptive zones. In gall-inducing midges for example, those that have evolved symbiotic associations with fungi are able to use a broader range of host-plant taxa, which promotes their diversification by making host shifts more probable (Joy, 2013). Another case of mutualism inducing diversification is that of pollinator-plant interactions, which act on reproductive isolation (Ramsey et al., 2003; Kay, 2006; Cruaud et al., 2012). But in Yucca, no association between specialized pollination and elevated diversification rate was found (Smith et al., 2008).
Testing the effect of interactions on the diversification process. While the environment-dependent diversification models (see Section 2.1) offer a tool to test for the effect of abiotic factors on diversification, only a few methods are available to test for the effect of biotic interactions on an empirical phylogeny, probably because of the difficulty of accounting for non-independence between lineages in the birth-death framework. To my knowledge, the only method that accounts for lineage non-independence within a clade in this framework is the diversity-dependent model (Rabosky and Lovette, 2008a; Etienne et al., 2011, see also Section 2.1). In Etienne et al. (2011), the approach was applied to five empirical phylogenies, which all favored the diversity-dependent model over a constant rate birth-death model. This suggests that negative diversity dependence is common in empirical data. Results obtained with this method should nevertheless be interpreted with care, as parameter estimates tend to be strongly biased, and model selection against a diversity-independent birth-death model is subject to a high Type I error rate (Etienne et al., 2016).
To test for the effects of interactions between clades on diversification, one possibility is to use the environment-dependent model, with the diversity-through-time of one group as an explanatory variable for the diversification of the second group (Lewitus and Morlon, 2017). The method revealed a positive effect of ostracod diversity on the diversification of cetaceans, possibly due to their role as a food source. Yet this approach cannot test whether each clade affects the diversification of the other, as would be expected for example under the escape and radiate scenario.
If we are interested in diversification in trait space, a few other tools are available. The general framework of phylogenetic comparative methods can be used to incorporate diversity-dependent trait evolution (Mahler et al., 2010; Weir and Mursleen, 2013), or to test for the reciprocal effect of species similarity in trait value, either within a clade or between the species of two different interacting clades (Drury et al., 2016; Manceau et al., 2016). Applying this type of method to the radiation of Greater Antillean Anolis lizards enabled Drury et al. (2016) to highlight the impact of species competition on trait evolution. Applying it to two interacting clades requires some idea of which species interacted with which during the clades’ history, and thus knowledge about how ecological communities assemble and evolve. In the following, I focus on the description of the structure of ecological communities through the use of ecological networks.

Table of contents :

Introduction
1 Tools for studying diversification from reconstructed phylogenies
1.1 Phylogenies of extant species
1.2 Phylogenetic tree shape
1.3 Estimating diversification rates from phylogenies
1.4 Models of cladogenesis below the species level
2 Incorporating heterogeneity in diversification models
2.1 Heterogeneity through time
2.2 Heterogeneity across lineages
3 Effects of species interspecific interactions on diversification
3.1 Biotic drivers of species diversity
3.2 Bipartite interactions
4 Thesis outline
Chapter 1 : A new index for the clade age-richness relationship
Chapter 2 : Quantifying diversification rate heterogeneity in empirical phylogenies
Chapter 3 : On bipartite ecological interactions and their impact on species richness
General discussion
1 Various modeling approaches to study biodiversity evolution
2 Limitations and perspectives
2.1 Goodness of fit
2.2 Incomplete sampling
2.3 Molecular phylogenies as empirical data
2.4 From patterns to processes
2.5 Extensions of the bipartite speciation model
2.6 Extensions of the heterogeneous speciation model
2.7 A few concluding words.