Horizontal gene transfer and its role in speciation
The notion of species in prokaryotes is intensely debated (Achtman and Wagner, 2008). This debate is intimately linked to the current inability of evolutionary microbiologists to find a unifying model of prokaryotic cladogenesis. The classical ‘Biological Species Concept’ (BSC) (Mayr, 1942), which was originally defined for animals, places the sexual isolation of clades as the central condition for their divergence. In this model, sexual isolation arises mostly through the appearance of pre-zygotic barriers or the geographical separation of lineages. Prokaryotes, however, usually co-occur in the environment, most of the time with no geographical structure of populations, and they reproduce clonally while being subject to rampant trans-specific HGT. In this context, it is impossible to understand the emergence of prokaryotic species within the framework of the BSC.
Ecological speciation appears as an alternative, and was proposed to be the major way through which prokaryotes diversify (Cohan, 2002b). Paradoxically, HGT can be the cause of ecological speciation. Acquisition of a gene bearing a new function can lead to the emergence of an ecotype, a variant with a specific ecological adaptation that allows it to exploit an ecological niche different from that of the parental population (Cohan, 2002a) (Fig. 1.2 A). Such ecological isolation lets the ecotype escape genotypic homogenization with its relatives by homologous recombination. In addition, the ecotype is subject to strong periodic selection for the ecological trait that defines it. This leads to the frequent purge of diversity within the ecotype and thus accelerates its divergence towards a better adaptation to its new niche (Cohan and Koeppel, 2008) (Fig. 1.2 A,B). As the ecotype diverges independently, its potential to recombine with the parental population decreases (Roberts and Cohan, 1993; Majewski et al., 2000) and eventually, when a threshold of recombination over divergence is crossed, the ecotype lineage is no longer able to re-hybridize with its relatives.
Figure 1.2: (A) Mutation and recombination events that determine ecotype diversity in bacteria. Circles represent different genotypes, and asterisks represent adaptive mutations. (top) Periodic selection mutations. These improve the fitness of an individual such that the mutant and its descendants out-compete all other cells within the ecological niche (ecotype); these mutations do not affect the diversity within other ecotypes because ecological differences prevent direct competition. Periodic selection leads to the distinctness of ecotypes by purging the divergence within but not between ecotypes. (bottom) Ecotype-formation mutations. Here a mutation or recombination event allows the cell to occupy a new ecological niche, founding a new ecotype. The ecotype-formation mutant, as well as its descendants, can now escape periodic selection events from its former ecotype. (B) Schematic relationship between the history of ecologically distinct populations and DNA sequence diversity clusters. Ecotypes are represented by different colors; periodic selection events are indicated by asterisks; extinct lineages are represented by dashed lines; clades that may be perceived as sequence clusters are marked by a horizontal black line at the top of the phylogeny.
Reconstructing genome histories to reveal past and present ecological adaptations
Ecological speciation models assume that periodic selection operates throughout the history of a lineage, maintaining in the descendants of ecotypes the niche-specifying genes they originally gained (Cohan, 2002b; Retchless and Lawrence, 2007). Indeed, genes conserved in genomes since their acquisition by an ecotype ancestor must have been under long-term purifying selection for their ecological function.
In fact, in studies characterizing species pan- and core genomes, many core genes still have no functional annotation and are thus unlikely to participate in the rather well-described bacterial central metabolism. Instead, they may participate in the stable adaptation of the species to its natural environment. Efforts must be made to characterize those core genes phenotypically and to better understand their role in a species’ biology.
The conservation of genes within all members of a clade constitutes a first good indication of purifying selection for their ecological role. Additional observations may support or invalidate the selection hypothesis, notably the comparison of the processes that introduced genes into genomes with a neutral model of genome evolution. Such a model can be empirically derived from the global patterns of gene gain and loss along the history of bacterial genomes.
The history of genomes and the processes that shaped their evolution can be reconstructed. A rigorous approach is to explicitly take into account the history of genes in the reconstruction of the history of genomes. This can be done by reconciling the trees of genes with that of species.
Reconstruction of ancestral genomes and reconciliation of gene and species histories
Molecular phylogeny was originally designed to tell the history of macromolecules (Zuckerkandl and Pauling, 1965), but gene and protein sequences were rapidly used as markers of the evolution of their host species, replacing morphological and biochemical characters. A convenient marker was found in the gene coding for the RNA component of the small subunit of ribosomes (16S rRNA), as it was ubiquitous and well conserved among known species and thus appropriate for comparing distant species. The first phylogeny of the three domains of living organisms was hence derived from the analysis of their 16S rRNA sequences (Woese et al., 1985). Other genes were then sequenced, and their molecular phylogenies revealed an almost systematic discordance among their histories. Reconstruction errors due to saturated signals and to the limited explanatory power of models of molecular evolution could account for part of these incongruences, but it soon became evident that the histories of genes were genuinely different (Brown and Doolittle, 1997).
To explain these differences, it was necessary to invoke events of gene duplication followed by differential losses or, most often in gene families sampled from prokaryotes, events of horizontal gene transfer. However, the question arose of which phylogeny should be taken as the reference species phylogeny, against which others could be compared to infer potential HGT events, as even the 16S rRNA gene could be subject to horizontal transfer (Yap et al., 1999).
With the advent of the genome sequencing era, it became apparent that most genes in genomes disagreed on their history but, paradoxically, it was shown that combining the information from all genes could recover a common vertical signal of descent (Wolf et al., 2001). Individual gene tree topologies could then be compared to this vertical reference to model the combination of duplication, transfer and loss (DTL) events that marked the history of genes in genomes. Performing this individual reconstruction of DTL scenarios for all genes and integrating them at the genome scale made it possible to reconstruct ancestral genomes (Snel et al., 2002). However, the models used to reconstruct these scenarios, and the way they integrate the information from gene histories, have a great impact on the inferred ancestral genomes.
Concepts of ancestral genome reconstruction and state of the art
Phylogenetic profile mapping methods. Several methods have been developed over the years. A popular approach maps the profiles of presence/absence of genes in extant genomes (i.e. phylogenetic profiles) onto a phylogeny of species to propagate the presence/absence states to ancestral genomes. This can be done under different models of gene evolution, the simplest being a birth-and-death process corresponding to events of gene gain and loss along lineages. These methods can be subdivided into classes depending on how the best reconstruction is found: using parsimony approaches (Mirkin et al., 2003; Makarova and Koonin, 2005; Boussau et al., 2004), which are computationally efficient, or in a probabilistic framework (Pagel, 1997; Yang et al., 2012; Viklund et al., 2012), which is theoretically more satisfying but also more computationally demanding.
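To make the parsimony mapping of a phylogenetic profile concrete, the following is a minimal sketch in Python. It assumes Fitch parsimony with equal costs for gains and losses, which is only one possible cost scheme and not necessarily the one used by the methods cited above. The four-genome species tree, the genome names and the profile are hypothetical.

```python
# Minimal sketch: Fitch parsimony on one gene presence/absence profile.
# The species tree, genome names and profile are hypothetical illustrations;
# gains and losses are given the same cost (an assumption of this sketch).

def fitch_states(tree, profile):
    """Bottom-up Fitch pass over a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    profile: dict mapping each leaf name to 1 (gene present) or 0 (absent).
    Returns (states, n_events): the set of equally parsimonious states at
    the root of `tree` and the minimum number of state changes below it.
    """
    if isinstance(tree, str):               # leaf: the observed state
        return {profile[tree]}, 0
    left, right = tree
    lset, lcost = fitch_states(left, profile)
    rset, rcost = fitch_states(right, profile)
    common = lset & rset
    if common:                              # children agree: no new event
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1   # children disagree: one event


species_tree = (("A", "B"), ("C", "D"))
profile = {"A": 1, "B": 1, "C": 0, "D": 0}
root_states, n_events = fitch_states(species_tree, profile)
print(root_states, n_events)
```

With this profile, the gene is present in one clade and absent from the other, so a single event suffices and the root state stays ambiguous between presence and absence. Resolving such ties, for instance by penalizing gains more than losses, is where the cited parsimony methods differ from this sketch.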
Other methods distinguish the process by which a gene is gained in a lineage: by duplication, by horizontal transfer or by apparent origination, which can reflect true gene genesis or HGT from an unsampled lineage. Again, these methods use either parsimony (Snel et al., 2002; Csűrös, 2008) or probabilistic (Csűrös and Miklós, 2009) criteria to find the best solution. Methods based on phylogenetic profiles are efficient at recognizing variation in gene content, but fail to detect events of gene replacement by HGT. In addition, most are designed only for profiles of distribution of clusters of orthologous genes (Mirkin et al., 2003), which raises the problem of defining orthology (Kristensen et al., 2011) and loses the information about the origin of gene lineages. When they deal with multi-copy homologous gene families (Csűrös, 2008; Csűrös and Miklós, 2009), mapping methods lose sensitivity beyond a few paralogs, because parallel gains and losses in paralogous lineages are missed if they do not change the apparent number of homologs in genomes.
Reconciliation methods. A more rigorous way to describe the history of gene families is to consider their phylogenetic trees. Reconciliation of gene and species histories consists in mapping evolutionary events onto both the gene and species trees, in a way that makes their respective histories concordant (Fig. 1.4). Different kinds of reconciliation methods exist, differing notably in the nature of the events they try to infer to explain the incongruences between gene and species tree topologies.
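As an illustration of the principle, the sketch below implements the simplest kind of reconciliation, in which every gene-tree node is mapped to the last common ancestor (LCA) of the species it covers and only duplications are inferred. This is an assumption made for brevity: transfers and explicit losses, which matter most in prokaryotes, are not modelled here. The trees, gene names and gene-to-species assignment are hypothetical.

```python
# Minimal sketch: LCA reconciliation of a gene tree with a species tree.
# Only duplications are inferred (no transfers, no explicit losses);
# trees and names are hypothetical illustrations.

def leaf_set(tree):
    """Set of leaf labels under a nested 2-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def lca(species_tree, species):
    """Leaf set of the smallest clade of species_tree containing `species`."""
    if not isinstance(species_tree, str):
        for child in species_tree:
            if species <= leaf_set(child):
                return lca(child, species)
    return leaf_set(species_tree)


def count_duplications(gene_tree, species_tree, gene_to_species):
    """Return (clade, n_dup) for the gene-tree node `gene_tree`.

    clade is the species clade (as a leaf set) the node is mapped to.
    A node counts as a duplication when it maps to the same clade as
    one of its children.
    """
    if isinstance(gene_tree, str):
        return frozenset([gene_to_species[gene_tree]]), 0
    lmap, ldup = count_duplications(gene_tree[0], species_tree, gene_to_species)
    rmap, rdup = count_duplications(gene_tree[1], species_tree, gene_to_species)
    here = lca(species_tree, lmap | rmap)
    is_dup = here in (lmap, rmap)
    return here, ldup + rdup + int(is_dup)


species_tree = (("A", "B"), "C")
gene_to_species = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
gene_tree = (("a1", "b1"), ("a2", "c1"))    # incongruent with the species tree
clade, n_dup = count_duplications(gene_tree, species_tree, gene_to_species)
print(n_dup)  # prints 1
```

Under this duplication-only model, the incongruent placement of gene a2 is explained by one duplication at the root followed by losses. A model allowing transfers could instead explain it by a single transfer, which is why the choice of events to infer changes the resulting scenario.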
Agrobacterium tumefaciens, a model organism for the study of bacterial cladogenesis and the quest for ecological adaptations
Agrobacterium is a genus belonging to the Rhizobiaceae family of the Alpha-Proteobacteria class, which is primarily known for its pathogenicity on plants (mostly woody dicotyledons). Indeed, agrobacteria can carry a plasmid that is able to transfer a segment of its DNA to the genome of a host plant. This transferred DNA (T-DNA) bears, on the one hand, oncogenic genes whose expression in the plant induces tumors in the form of crown galls (for tumor-inducing, Ti plasmids) or hairy roots (for root-inducing, Ri plasmids) and, on the other hand, genes for the biosynthesis of opines (Otten et al., 2008). These molecules are condensates of amino acids and sugars or keto-carbohydrates, which are released by the diseased plant and then used by agrobacteria as carbon and nitrogen sources (Moore et al., 1997). This ability to transfer DNA made agrobacteria a tool of choice in biotechnology for building genetically modified plants.
This pathogenic status originally defined the Agrobacterium genus, which gathered a heterogeneous set of organisms. They could be classified into three groups based on biochemical properties: biovar 1 strains were able to produce 3-keto-sugars and comprised most strains of the named species A. tumefaciens and A. radiobacter, while biovar 2 strains could not produce such sugars and comprised most strains of A. rhizogenes with a root-inducing phenotype (Keane et al., 1970; Kersters et al., 1973). In addition, a third group (biovar 3) was defined that gathered strains inducing crown galls on grapevine and preferentially using L-tartaric acid, which were later named A. vitis (Ophel and Kerr, 1990).
However, in the light of classification techniques using molecular markers, the Agrobacterium genus appeared polyphyletic (Young et al., 2001), and a debate opened on whether Agrobacterium should be merged into the related genus Rhizobium (Young et al., 2001, 2003) or left as an independent taxonomic unit (Farrand et al., 2003). This polyphyly was resolved by transferring biovar 2 strains to the genus Rhizobium, forming the new species R. rhizogenes (Lindström and Young, 2011). This question is one of those still debated in the field of taxonomic classification of Rhizobiaceae (Lindstrom and Martinez-Romero, 2002), and highlights the difficulty of differentiating closely related groups of bacteria on phenotypic characters that would justify the assignment of a Latin binomial species name (Stackebrandt et al., 2002). In particular, the original classification of agrobacteria according to their pathogenic character, although this trait is encoded by genes borne by accessory plasmids, certainly accounts for the current confusion on the matter.
Indeed, agrobacteria were long considered to be only plant pathogens, whose ecology would be limited to crown gall and root tumors. However, they can be isolated from soils and from rhizospheres of healthy plants (Mougel et al., 2001), and the Ti plasmid – the causative agent of crown gall (Watson et al., 1975) – is absent from the majority of soil isolates (Bouzar et al., 1993). As a matter of fact, agrobacteria are primarily plant commensals, as they are usually found at much greater density in rhizospheres than in soils (Mougel, 2000), and some were even shown to promote plant growth (Hao et al., 2012b). In fact, the true plant pathogens (the Ti/Ri plasmids) had been conflated with their vectors (agrobacteria).
It has been shown that pathogenic agrobacteria can persist in soils for long periods outside of pathogenic outbreaks (Krimi et al., 2002). To understand how these reservoirs of disease are maintained, it is crucial to know the primary ecology of agrobacteria. Notably, determining the fitness of agrobacteria in their primary niche relative to that in their pathogenic (secondary) niche would help to understand the advantage of bearing Ti/Ri plasmids and how these plasmids fluctuate in populations of agrobacteria. Moreover, the association of the pTi/pRi with their agrobacterial hosts could involve complex interactions of genotypes, with certain plasmid types being better adapted to certain host genomes (Bouzar et al., 1993). Defining which agrobacterial populations are likely to bear a certain (plant-specific) type of plasmid would help to prevent epidemics, notably by diagnosing resident agrobacterial populations before bedding plants in nurseries, and to design biocontrol strategies.
While some agrobacteria show a marked association with host plants (at least regarding infection), as for instance A. larrymoorei on Ficus spp. (Bouzar and Jones, 2001) and A. vitis on grapevine (Ophel and Kerr, 1990), no association of A. tumefaciens with a particular niche has been documented. The majority of the known diversity consists of isolates recovered from crown gall tumors, which are thus not informative about the primary niche of the taxon. Systematic sampling efforts nevertheless revealed the occurrence of A. tumefaciens in soils, ditch waters and, most importantly, (healthy) plant rhizospheres (Mougel et al., 2001; Portier et al., 2006; Shams et al., 2013), but without clear association with a plant taxon.
Figure 2.1: (Fig. 1 from Shams et al. (2012), omitted here for copyright reasons) Maximum-likelihood phylogeny of the recA gene of type strains of all bona fide genomic species of Agrobacterium spp. known to date and related Rhizobiales, using the revised nomenclature proposed by Costechareyre et al. (2010). Only significant support (SH-like) values (> 0.95) are given. The branch length unit is the number of substitutions per nucleotide site. Abbreviations: B., Bradyrhizobium; Rh., Rhodopseudomonas; Az., Azorhizobium; M., Mesorhizobium; E., Ensifer; R., Rhizobium; A., Agrobacterium.
In fact, A. tumefaciens corresponds to several species based on the study of their genomic diversity (i.e. genomic species). So far, eleven genomic species have been described: genomovars G1 to G9 (Ley et al., 1973; Popoff et al., 1984), G13 (Portier et al., 2006) and, more recently, Rhizobium nepotum (Puławska et al., 2012). A. tumefaciens thus forms a complex of genomic species that are all closely related but distinguishable based on whole-genome analyses (Portier et al., 2006) and molecular marker phylogenies (Costechareyre et al., 2010; Shams et al., 2013). This genomic diversity is likely to reflect divergent ecological adaptations, especially because different species can be found co-occurring in the same micro-sample of soil (Vogel et al., 2003): under the competitive exclusion principle (Hardin, 1960), related species must occupy different niches to avoid competition that would result in the extinction of the less fit species. However, the nature of these ecological differences is not yet evident. Rather than marked associations with a host, genomic species of A. tumefaciens seem to show preferential associations with some plant rhizospheres (Mougel, 2000). The host plant may "softly" select for some species, thus biasing the species diversity found in its rhizosphere compared to neighbouring soils or other rhizospheres, but not to the point of restricting the species inventory. Such quantitative differences in the diversity of agrobacterial populations associated with plants may stem from subtle differences in the composition of plant root exudates. In addition, other environments can provide secondary niches to A. tumefaciens, which might contribute to their ecological isolation. Indeed, it has appeared over the past decades that A. tumefaciens can be responsible for nosocomial infections, notably in immunodepressed human patients, such as HIV-positive and cystic fibrosis patients, and sometimes following the introduction of a catheter into the bloodstream. A survey of the diversity of strains causing such infections again showed no strict association with a genomic species, but a prevalence of genomovar G2 (Aujoulat et al., 2011).
Table of contents:
1.1 Bibliographical review
1.1.1 Gene repertoires and the organization of prokaryotic genomes
1.1.2 The dynamics of the pangenome
1.1.3 The roots of bacterial pangenomes
1.1.4 Functional role of the pangenome
1.1.5 Horizontal gene transfer and its role in speciation
1.1.6 Reconstructing genome histories to reveal past and present ecological adaptations
1.2 Reconstruction of ancestral genomes and reconciliation of gene and species histories
1.2.1 Brief history of phylogenetics applied to genes and species
1.2.2 Concepts of ancestral genome reconstruction and state of the art
Phylogenetic profile mapping methods
2 Evolution of gene repertoires in genomes of A. tumefaciens reveals the role of ecological adaptation in bacterial cladogenesis
2.1 Agrobacterium tumefaciens, a model organism for the study of bacterial cladogenesis and the quest for ecological adaptations
2.2 Preamble to the comparative genomic studies
2.3 Probing the ubiquity of genes in A. tumefaciens genomes to characterize species-specific genes: insights into the specific ecology of A. fabrum
2.4 Reconstructing ancestral genomes of A. tumefaciens reveals ecological adaptations along their diversification
Genomic sequence dataset
Species phylogeny
Reconstruction of ancestral genomes
Reconciliation of genome and gene tree histories
Regional amalgamation of gene histories refines the precision of reconciliations
Agrogenom database
Genome histories reveal selective pressures that shaped gene
Dynamics of gains and losses in ancestral genomes
Patterns of gene transfer: prevalence within species and among rhizobia
Evaluation of the selection pressures acting on transferred blocks of genes
Homologous recombination maintains cohesion of species
Effect of the particular architecture of A. tumefaciens genomes on gene evolution
The linear chromid is more recombinogenic than the circular chromosome
Migration of genes between replicons across A. tumefaciens history
Clade-specific genes: insights into the possible ecological speciation of clade ancestors
Genomic synapomorphies of genomovar G1
Genomic synapomorphies of genomovar G8 and [G6-G8] clade
Genomic synapomorphies of [G5-G13] clade
Genomic synapomorphies of [G1-G5-G13] clade
Genomic synapomorphies of the A. tumefaciens complex
Precise reconciliations using regional signal in genomes
Local reconciliation of histories of orthologs in multicopy gene trees
Reconciliation of gene blocks provides more accurate scenarios
A history of Rhizobiales, from the point-of-view of the entire genome
Role of recombination in species cohesion
Ancestral genome content and evolutionary dynamics of genes
Clusters of clade-specific genes are under purifying selection for their collective function
Clade-specific genes in the light of speciation models
Ecological adaptations in A. tumefaciens: a history of variation of shared traits
Hybridization of genomovar G1 and G8: ecological convergence or diversification?
Guilds of A. tumefaciens species co-exist by partitioning common resources
Secondary (and third) replicons of A. tumefaciens genomes are the place of genomic innovation
The linear chromid is highly plastic and recombinogenic but stabilizes adaptive genes
An unforeseen role for pAt plasmids as host of clade-specific adaptations
2.4.5 Material and Methods
Genome sequencing and assembly
Construction of Phylogenomic Database
Reference species tree
Reconciliation of gene tree with the species tree
Block events reconstruction
Definition of clade-specific genes from phylogenetic profiles
Ancestral location of genes on replicons
Tree Pattern Matching
Detection of recombination in core genes
Functional homogeneity of gene blocks
2.5 Supplementary Figures
2.6 Supplementary Tables
2.7 Supplementary Material
2.7.1 Comparison of several hypotheses for the core-genome reference phylogeny
2.7.2 Gene tree reconciliations: detailed procedure
2.7.3 Block event reconstruction: algorithms
2.7.4 Of the complexity of interpreting ’highways’ of gene transfers
2.7.5 Clade-specific genes: insights into the ecological properties of clades
Genomic synapomorphies of genomovar G1
Chemotaxis and phenolic/aromatic compound degradation pathways
G1+G8: Exopolysaccharide biosynthesis
G1+G9: Extra-cellular secretion
Genomic synapomorphies of genomovar G8 and [G6-G8] clade
Genomic synapomorphies of [G1-G5-G13]
Genomic synapomorphies of the A. tumefaciens complex
Cell wall and outer membrane
2.7.6 Selected cases of large transfer events
2.7.7 Bioinformatic scripts, modules and libraries
2.8 Comparative genomics of A. tumefaciens: Synthesis and perspectives
2.8.1 Comparison of outcomes from two different studies of the pangenome of A. tumefaciens
2.8.2 A more complete model for the diversification of A. tumefaciens
3 GC-biased gene conversion shapes the bacterial genomic landscape
3.1 GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands
3.1.1 Supplementary Figures
3.1.2 Supplementary Tables
3.1.3 Supplementary Material
Recombination detection methods
3.2 Heterogeneity of genome GC-content and gene population sizes
3.2.1 Hypothesis: large gene population size enhances gBGC
3.2.2 A complex interplay of mutation, selection and recombination
3.3 Validation of results in A. tumefaciens
4 Final Discussion & Perspectives
5.1 Ecophysiology of the arsenite-oxidizing bacterium Rhizobium sp. NT-26
5.2 Acquisition of protelomerase and linearization of secondary chromosome led to the emergence of a major clade within Rhizobiaceae