Deterministic solutions of the haploid model for neutral subfunctionalization


Evolution through WGD and SSD

Different mechanisms can result in the duplication of regions of the genome, ranging from an individual gene to the entire genome. Particular attention will be given to whole genome duplication (WGD) events, genetic accidents that have been shown to occur more frequently than traditionally expected and are thought to have played a critical role in vertebrate evolution.

Mechanisms of gene duplication

The advent of genome sequencing has revealed the widespread occurrence of gene duplication events. Many genes in every sequenced eukaryotic genome have considerable sequence similarity and are clearly the products of gene duplication [4-11]. Gene duplicates can arise in many different ways [10,11], including unequal crossing over [13,14], break-induced replication and gene conversion during the repair of broken chromosomes [18], slippage during recombination [20], horizontal transfer and other transpositions [24-26], as well as temporary polyploidy, including an effective doubling of the whole genome [29]. Therefore, a gene duplication event can involve genomic regions ranging from a single gene, to a large genomic segment, and eventually to the entire genome.
Most of the above mechanisms generate duplicated regions ranging from a few base pairs to a large genomic segment, typically arranged in tandem. Throughout this thesis, we will refer to them as small scale duplicates (SSD). By contrast, polyploidy can give rise to the duplication of the entire genome, so-called whole genome duplication (WGD). Such a genome duplication can be achieved by two mechanisms, auto- and allopolyploidy [28]. Autopolyploidy can occur by incomplete chromosome segregation, cytokinesis defects, or fusion of two cells of the same organism during early development, leading to a polyploid embryo. In allopolyploidy, two cells from different but closely related organisms fuse and give rise to an organism with a duplicated genome [28].

Frequent occurrence of WGD during evolution

Unlike SSD events, WGD events are evolutionary accidents that duplicate the entire genome of an organism at once, and their impact on evolution has long been controversial.
A change in ploidy is traditionally expected to be deleterious and an evolutionary dead end [29-31]. It is often argued that the evolutionary success of polyploids is hampered by the inefficiency of selection when multiple alleles are present at each gene. Indeed, the spread of a favorable allele from a given frequency is slower at higher ploidy levels, because the selective effects of an allele are partially offset by the presence of alternate alleles [30]. It was also believed that animals, unlike plants, should not tolerate polyploidy due to their usual mode of sexual reproduction [29, 31, 32], although asexual reproduction (such as parthenogenesis) also exists in animals.
By contrast, in the late 1960s, Susumu Ohno proposed that genome duplications are a significant mechanism of evolution even in animal genomes [1, 3]. The increasing amount of genome sequence data and state-of-the-art approaches to their analysis have now established polyploidy as a major evolutionary mechanism in all major eukaryotic groups, from unicellular eukaryotes and fungi to plants and animals (Fig. 2.1). Polyploidy is especially common in plants: the common ancestor of all extant angiosperms underwent a tetraploidy event [37], and almost all major plant lineages have subsequently undergone multiple polyploidy events. Successive WGDs have also occurred in many animal lineages, including annelids, flatworms, mollusks, insects, and amphibians [29]. Most importantly, most vertebrates are now known to descend from a single lineage that experienced two consecutive WGDs soon after its divergence from other chordates about 500 MY ago [34-36] (this is the long debated "2R hypothesis" [1, 3]). Similarly, all bony fishes, which make up about 90% of extant fishes, are now known to derive from a single species that doubled its genome about 300 MY ago (i.e. the "3R hypothesis" [38, 39]).
Although, in the short term, polyploidy leads to a population bottleneck (related to the obligate speciation owing to the difference in ploidy between pre- and post-WGD individuals) and to possible competition with the diploid ancestors, the frequent occurrence of WGD events and the success of polyploid organisms strongly suggest that whole genome duplication is a dynamic process that has contributed to evolutionary diversification in plants and animals [29].
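The classical argument that selection is less efficient at higher ploidy can be illustrated with a minimal deterministic sketch. It compares a haploid population with a random-mating diploid one, and is not a model taken from this thesis: the fitness scheme (1 + s for the favorable allele, 1 + h*s for heterozygotes), the additive dominance coefficient h = 0.5, and the numerical values of s and the start and target frequencies are all assumptions chosen for illustration.

```python
def next_freq_haploid(p, s):
    """One generation of selection in a haploid population.

    Assumed fitnesses: favorable allele 1 + s, alternate allele 1.
    """
    return p * (1 + s) / (1 + s * p)


def next_freq_diploid(p, s, h=0.5):
    """One generation of selection in a random-mating diploid population.

    Assumed fitnesses: AA = 1 + s, Aa = 1 + h*s, aa = 1,
    where h is the dominance coefficient (h = 0.5 means additive).
    """
    q = 1 - p
    mean_w = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
    return (p * p * (1 + s) + p * q * (1 + h * s)) / mean_w


def generations_to_reach(step, p0, target, s, max_gen=1_000_000):
    """Count generations needed for the allele frequency to reach `target`."""
    p, t = p0, 0
    while p < target and t < max_gen:
        p = step(p, s)
        t += 1
    return t


# Illustrative (hypothetical) parameter values.
S, P0, TARGET = 0.01, 0.01, 0.99
t_hap = generations_to_reach(next_freq_haploid, P0, TARGET, S)
t_dip = generations_to_reach(next_freq_diploid, P0, TARGET, S)
print(f"haploid: {t_hap} generations, diploid (h=0.5): {t_dip} generations")
```

Under these assumptions the favorable allele takes roughly twice as long to spread in the diploid case, because in heterozygotes its effect is diluted by the alternate allele. Other dominance coefficients would change the ratio.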

Antagonist retention pattern of SSD and WGD duplicates

Most genes belong to gene families which have undergone consecutive gene duplication events [11]. However, a duplication event is usually followed by the loss of the duplicated genes through non-functionalization (see Sec. 2.2). In particular, in the case of the 2R WGD events, the majority of the resulting gene duplicates were subsequently lost. Nevertheless, the analysis of the few duplicated genes retained in the genome discloses an interesting retention pattern related to the mode of duplication.
Recent studies have revealed that SSD and WGD duplicates have been retained during evolution following antagonist patterns (Fig. 2.2). Evidence has accumulated that WGD duplicates have been preferentially retained in specific functional gene classes associated with higher organismal complexity, such as signaling pathways, transcription networks, and developmental genes (for example, nervous system, morphogenesis) [42-46]. For example, the basic set of genes involved in development and signaling was already present in chordates, but WGD events resulted in the specific expansion of these gene families in vertebrates, leading to the evolution of the neural crest [36], the vertebrate skeleton [47] and brain structures [36]. By contrast, gene duplicates coming from SSD are strongly biased toward different functional categories, such as antigen processing, immune response, and metabolism [42]. SSD and WGD duplicates also differ in their gene expression and protein network properties [48, 49]. Moreover, recent genome-wide analyses have shown that WGD duplicates in the human genome have experienced fewer SSD events than genes not coming from WGD events and tend to be refractory to copy number variation (CNV) caused by polymorphism of small segmental duplications in human populations [41]. All these findings highlight the antagonist retention patterns of WGD and SSD gene duplicates and suggest the relevance of WGD for vertebrate evolution.

Critical role of WGD in evolution

Recent studies on the retention of duplicated genes suggest the critical role of duplication events (especially WGD) during evolution, as outlined in Sec. 2.1.3. However, the importance of gene duplication in supplying raw genetic material to biological evolution has been recognized since the 1930s [50]. In particular, duplication of genes (and their subsequent functional divergence) can now be considered the major evolutionary mechanism to generate new genes and rewire cellular pathways and networks [11]. Without gene duplication, the plasticity of a genome in adapting to changing environments would be severely limited.
In the late 1960s Susumu Ohno outlined the potential role of gene duplication as the driving force behind the evolution of increasingly complex organisms. He suggested that the huge boost in complexity in vertebrates was facilitated by the sudden increase in the availability of genetic material through WGD events, which was subsequently shaped by evolution over the following millions of years [1, 3]. Indeed, while SSD duplicates provide a continuous flux of genetic material, WGD events can favor unique evolutionary innovations, since they involve the simultaneous duplication of many genes at once. However, compelling evidence supporting this hypothesis (the so-called "2R hypothesis") remained elusive for a long time. Only recent genome-wide studies have confirmed the occurrence of these WGD events at the origin of vertebrates [34-36], and WGD events have now been firmly established in almost all major eukaryotic lineages [51] (see Sec. 2.1.2). Therefore, the two rounds of WGD in the early vertebrate lineage are now credited with creating the conditions for the evolution of vertebrate complexity. In recognition of the pioneering work of Susumu Ohno, the genes retained from WGD events are now referred to as "ohnologs" [3, 52].


Evolutionary fate of gene duplicates

Gene duplicates arise frequently, either via local or genome-wide events. In particular, genome-wide analyses have estimated the average rate of origin of new gene duplicates to be of the order of 0.01 per gene per million years [7]. However, the majority of duplicated genes appear to be transient and only a minority is retained in the genome [6,11], leading to a still ongoing debate about the evolutionary mechanisms and constraints governing the retention of gene duplicates. Newly duplicated genes (called paralogs) are assumed to initially have fully overlapping, redundant functions [6,53,54]. In the absence of any advantage for this redundancy, and due to the frequent occurrence of degenerative mutations [55], it is commonly thought that one copy will usually become silenced by the random accumulation of degenerative mutations [3, 56-60]. Degenerative mutations disrupt the structure and the function of the gene such that it gradually becomes a pseudogene, which is either unexpressed or functionless [6, 54]. After a long evolutionary time, pseudogenes will either be deleted from the genome or become so diverged from the parental genes that they are no longer identifiable, and the traces of duplication are lost [11]. Observations from the genomic databases for several eukaryotic species suggest that the vast majority of gene duplicates are silenced within a few million years [6]. Therefore, non-functionalization, the stochastic silencing of one copy, is considered to be the most likely fate of a duplicated gene (e.g. about 80-90% of WGD duplicates are estimated to be lost from the genome through non-functionalization). However, it is now known that most eukaryotic genomes harbor large numbers of functional gene duplicates, many of which originated tens to hundreds of millions of years ago [61-64].
Different evolutionary mechanisms have been proposed to explain the preservation of duplicate genes, and the most credited ones are elucidated in the following sections.
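The balance between the origin of duplicates and their silencing described in this section can be illustrated with a simple birth-death sketch. Only the origin rate (0.01 per gene per million years) comes from the text. The exponential form of the loss and the 4 million year half-life are assumptions made here for illustration, standing in for the statement that most duplicates are silenced "within a few million years".

```python
import math

BIRTH_RATE = 0.01    # new duplicates per gene per million years (from the text)
HALF_LIFE_MY = 4.0   # hypothetical half-life of a duplicate, in million years


def surviving_fraction(t_my, half_life_my=HALF_LIFE_MY):
    """Fraction of duplicates still functional after t_my million years,
    assuming exponential (memoryless) silencing."""
    return 0.5 ** (t_my / half_life_my)


def expected_duplicates_per_gene(t_my, birth_rate=BIRTH_RATE,
                                 half_life_my=HALF_LIFE_MY):
    """Expected number of functional duplicates per gene after t_my million
    years of constant origin and exponential loss, starting from none.

    Solves dn/dt = b - d*n with n(0) = 0, i.e. n(t) = (b/d) * (1 - exp(-d*t)).
    """
    loss_rate = math.log(2) / half_life_my
    return birth_rate / loss_rate * (1.0 - math.exp(-loss_rate * t_my))


for t in (1, 10, 100):
    print(f"t = {t:>3} My: surviving fraction = {surviving_fraction(t):.3g}, "
          f"duplicates per gene = {expected_duplicates_per_gene(t):.3g}")
```

In this neutral sketch almost no duplicate survives for tens of millions of years, so the large numbers of ancient functional duplicates observed in eukaryotic genomes call for the preservation mechanisms discussed in the following sections.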

Buffering against deleterious mutations

The presence of gene duplicates may confer robustness against deleterious mutations, since the duplicates can compensate for each other's function, behaving as a backup mechanism [75]. This idea was initially proposed based on experimental studies [76, 77]. However, buffering alone should only rarely lead to the preservation of a pair of genes, since it requires that they are completely redundant in function [75, 78].

The dosage balance hypothesis

The dosage balance hypothesis has been proposed to explain the distinct properties of SSD and WGD duplicates, whose antagonist retention pattern has recently become apparent (see Sec. 2.1.3).
Evidence from a variety of data suggests that in multicellular eukaryotes the stoichiometric relationship of the components of regulatory complexes affects target gene expression. This mechanism sets the level of gene expression and, as a consequence, the phenotypic characteristics [79, 80]. This concept has subsequently been extended from regulatory complexes to all protein complexes [81]. Therefore, the relative dosage (i.e. the amount of protein expressed) of genes belonging to the same complex or to the same metabolic pathway plays an important role in the proper formation and functioning of cellular assemblies and must be preserved [45, 75, 79-83] (Fig. 1 in [83]).
Since a WGD event implies a simultaneous duplication of the entire set of genes of an organism, it initially preserves dosage balance. As a consequence, it has been proposed that the complete set of genes whose products participate in protein-protein interactions tends to be retained, because the loss of only one gene would lead to the deleterious effects of dosage imbalance [81]. By contrast, duplication through SSD of only one of the interacting partners leads to dosage imbalance and has been proposed to be opposed by natural selection [45, 75, 82]. In particular, studying yeast duplicates, Papp et al. [81] observed that WGD-retained genes are somewhat enriched in protein complexes and suggested that an imbalance in the components of protein complexes leads to lower fitness. Since both the loss of a WGD duplicate and duplication through SSD in protein complexes are thought to be opposed by selection during evolution, the dosage balance hypothesis has been frequently invoked to explain the biased retention of SSD and WGD genes in a variety of organisms such as yeast [81], Paramecium [84], Arabidopsis [43] and human [41], by seeking enrichment of protein complexes in WGD duplicates.
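The reasoning behind the dosage balance hypothesis can be made concrete with a toy sketch. It is not the model used in the cited studies: the three-subunit complex, the gene names, and the imbalance score (largest over smallest copies-to-stoichiometry ratio, minus one) are hypothetical choices that only illustrate why WGD preserves balance while an SSD event, or the loss of a single ohnolog, disrupts it.

```python
def imbalance(copies, stoichiometry):
    """Toy dosage-imbalance score for one protein complex.

    copies:        gene -> number of gene copies (proxy for protein dosage)
    stoichiometry: gene -> number of subunits required per complex
    Returns 0.0 when all subunits are present in the required proportions.
    """
    ratios = [copies[g] / stoichiometry[g] for g in stoichiometry]
    return max(ratios) / min(ratios) - 1.0


def whole_genome_duplication(copies):
    """WGD: every gene is duplicated simultaneously."""
    return {g: 2 * n for g, n in copies.items()}


def small_scale_duplication(copies, gene):
    """SSD: a single gene gains one extra copy."""
    new = dict(copies)
    new[gene] += 1
    return new


def lose_copy(copies, gene):
    """Non-functionalization of one copy of a gene (must keep at least one)."""
    if copies[gene] <= 1:
        raise ValueError("cannot lose the last copy of a gene")
    new = dict(copies)
    new[gene] -= 1
    return new


# Hypothetical complex made of one subunit each of A, B and C.
stoich = {"A": 1, "B": 1, "C": 1}
ancestral = {"A": 1, "B": 1, "C": 1}

after_wgd = whole_genome_duplication(ancestral)
after_ssd = small_scale_duplication(ancestral, "A")
after_wgd_loss = lose_copy(after_wgd, "A")

print("ancestral:        ", imbalance(ancestral, stoich))
print("after WGD:        ", imbalance(after_wgd, stoich))
print("after SSD of A:   ", imbalance(after_ssd, stoich))
print("WGD then loss of A:", imbalance(after_wgd_loss, stoich))
```

In this sketch both the SSD of one subunit and the loss of one ohnolog after WGD move the score away from zero, which is the pattern the hypothesis assumes selection acts against.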

Table of contents :

List of Figures
List of Tables
Abbreviations & Definitions
I Introduction
1 Preamble
1.1 Thesis summary
1.2 Organization of the thesis
1.3 Publications resulted/forthcoming from this thesis
2 Evolution by gene duplication
2.1 Evolution through WGD and SSD
2.1.1 Mechanisms of gene duplication
2.1.2 Frequent occurrence of WGD during evolution
2.1.3 Antagonist retention pattern of SSD and WGD duplicates
2.1.4 Critical role of WGD in evolution
2.2 Evolutionary fate of gene duplicates
2.2.1 Neo-functionalization
2.2.2 Subfunctionalization
2.2.3 Buffering against deleterious mutations
2.2.4 The dosage balance hypothesis
2.3 Dominant deleterious mutations
2.3.1 The great expansion of dominant deleterious gene families
2.4 Objectives
II Materials & Methods
3 Population genetics approach
3.1 Hypothesis: a qualitative model recently proposed
3.2 Population genetics models: a deterministic approach
3.2.1 Simple haploid deterministic model
3.2.2 Extension to diploid models
3.3 Population genetics models: a stochastic approach for small populations
3.3.1 General approach for K alleles
3.3.2 Stochastic simulations
4 Mediation Analysis approach
4.1 Pearl’s Causal Mediation Analysis
4.1.1 Total, direct and indirect effects
4.1.2 An example: the simple binary case
4.2 Application of the Mediation Analysis to genomic data
4.3 Extensions of the Mediation Analysis to more than three variables
4.3.1 First approach: the distinction between mediators and covariates
4.3.2 A general approach: the parents of X and Y
4.4 Relationship to Maathuis’s approach
III Results
5 Population genetics results
5.1 Deterministic solutions of the haploid model for neutral subfunctionalization
5.2 Analysis of gene duplicates fixation through stochastic simulations
5.2.1 Fixation rates for neutral subfunctionalization
5.2.2 Finite size effects on the fixation of gene duplicates
5.2.3 Extension to adaptive subfunctionalization for SSD duplicates
5.3 Application to the prevalence of human oncogenes with WGD vs SSD duplicates
6 Mediation Analysis results
6.1 The extended Mediation Analysis on genomic data
6.1.1 Genomic properties related to ohnolog retention
6.1.2 Inferred causal graph for ohnolog retention
6.1.3 Application of the extended Mediation Analysis to genomic properties
6.2 Direct causes of ohnolog retention
IV Discussion & Perspectives
7 Discussion and Perspectives
V Appendices
A Population genetics: a general stochastic approach
A.1 General stochastic models using a master equation
A.2 Four-allele deterministic models of SSD vs WGD duplicates retention
A.3 Exact results for two-allele stochastic models
B Pearl’s theory of the do-calculus
B.1 Basic concepts of the do-calculus
B.1.1 Markovian models
B.1.2 General models
B.1.3 The do-calculus and the Mediation Analysis
Bibliography