Differential gene expression between homoeologs does not result in global sub-genome dominance

Ploidy level and recombination frequencies

Interestingly, several studies have pointed out that recombination frequency tends to increase with ploidy level. For example, comparative genetic mapping studies in allotetraploid cotton revealed that the At and Dt subgenomes experienced more than 50% higher recombination rates than their diploid counterparts (Desai et al., 2006). Likewise, in Brassica, almost all linkage groups of the A subgenome appeared to be longer in the allotetraploid B. napus than in the diploid B. rapa (Suwabe et al., 2008). More recently, Pecinka et al. (2011) confirmed that recombination frequencies increase in newly formed polyploids, whether they are autotetraploids (A. thaliana x A. thaliana) or allotetraploids (A. thaliana x A. arenosa), compared to diploid A. thaliana, all plants sharing an identical genetic background. This last result indicates that the CO increase may occur irrespective of the nature (homologs / homoeologs) of the additional set of chromosomes. The link between ploidy level and recombination frequencies was studied more explicitly by Leflon et al. (2010), who compared recombination frequencies between allotetraploid (AACC), allotriploid (AAC) and diploid (AA) Brassica hybrids sharing the same genetic background. They observed an increase in recombination frequencies in allotetraploids compared to diploids, but also an unexpected boost in CO frequencies in allotriploid hybrids. I address this issue in more detail in section 2.4 (see below).
However, such cases should not be used to conclude that meiotic recombination is always highest in allotriploids. On the contrary, White and Jenkins (1988) and Jenkins and White (1988) observed that chiasma frequency was higher in Scilla autumnalis allotetraploid hybrids than in the corresponding allotriploid hybrid, which indicates that the observed increase is lineage-specific.

Impact of recombination on genetic diversity

So far I have presented the mechanistic aspects of meiotic recombination, with an emphasis on CO formation and control. I will now review how the direct or indirect consequences of meiotic recombination can shape genome diversity and are thought to have a major impact on plant genome evolution (Gaut et al., 2007).
In numerous plant species, genetic diversity correlates positively with local recombination rate (Roselius et al., 2005 and references therein; Tenaillon et al., 2004; Wang et al., 2016). This observation has been interpreted as a direct or indirect consequence of recombination on polymorphism.
The direct effect of recombination can first result from the mutagenic nature of recombination itself. Irrespective of the final outcome (CO or NCO), all DSB repair mechanisms rely on the synthesis of a short patch of DNA (Figure 6). Unlike DNA replication during the S phase of the cell cycle, DNA synthesis associated with DSB repair by homologous recombination is highly inaccurate (Malkova and Haber, 2012). In yeast, DSB repair during mitotic homologous recombination is accompanied by an increase in mutations near the site of the break. Recently, Rattray et al. (2015) showed that this was also the case during meiotic recombination. These authors found a 6- to 21-fold increase in mutation rate after meiosis compared to the basal mutation rate observed after mitotic growth. This increase was dependent on SPO11, i.e., on the formation of DSBs, and was more pronounced when the meiotic mutation rate was estimated close to a meiotic hotspot.
Another source of diversity directly linked to recombination is the generation of single-nucleotide changes through GC-biased gene conversion (gBGC). gBGC can occur during the invasion step of meiotic recombination when parental alleles differ. It is hypothesized that, when the mismatch created in the heteroduplex is repaired, the choice of which nucleotide to replace slightly favours conversion of an AT allele into a GC allele (Webster and Hurst, 2012). There is evidence for gBGC in yeast, mammals, birds and other species (Glemin 2016), but the evidence is more equivocal in angiosperms. gBGC has been reported in rice (Muyle et al., 2011 but see Flowers et al., 2012) and in maize (Rodgers-Melnick et al., 2015) but not in A. thaliana, where recombination positively correlates with AT-rich regions (Wijnker et al., 2013).

Origin of Brassica napus

Rapeseed (Brassica napus, AACC; 2n=38) is a member of the large Brassicaceae family (~325 genera and 3,740 species; reviewed in Hohmann et al., 2015) that includes various crops and the model species Arabidopsis thaliana. Rapeseed is a recent allopolyploid species that formed from hybridization events between the ancestors of modern B. oleracea (CC; 2n=18) and B. rapa (AA; 2n=20); these two diploid species diverged from a common ancestor less than 4 million years ago and their genomes were brought back together only recently to form B. napus (around 7,500–12,500 years ago; Chalhoub et al., 2014). As no truly wild B. napus population has been reported, hybridisation between B. napus progenitors is thought to have occurred in cultivated contexts, as a result of either accidental or deliberate inter-specific crosses between crops that were cultivated side by side. The original hybridisation events that gave rise to B. napus occurred more than once, and involved different maternal genotypes that are probably related to B. rapa or an A genome relative (Allender and King, 2010). Genetic diversity analyses revealed a strong population structure, mainly explained by growth habits (spring or winter) and geographical origin (Asian or European for winter types) (Gazave et al., 2016).
The recent release of reference genomes for Brassica napus (Chalhoub et al., 2014), B. rapa (Wang et al., 2011) and B. oleracea (Liu et al., 2014; Parkin et al., 2014) provided further insights into the dynamics of Brassica genome evolution and divergence. The B. napus genome is around 1.2 Gb in length (Arumuganathan and Earle, 1991) and contains a minimum of 101,040 gene models (Chalhoub et al., 2014). The assembled Cn subgenome (525.8 Mb) is larger than the An subgenome (314.2 Mb) (Chalhoub et al., 2014). This is consistent with the relative sizes of the assembled Co genome of B. oleracea (~630 Mb) (Liu et al., 2014; Parkin et al., 2014) compared to the Ar genome of B. rapa (312 Mb) (Wang et al., 2011).

Relevance for breeding

Each Brassica crop (B. rapa, B. oleracea and B. napus) shows a rich diversity of morphotypes including leafy heads (Chinese cabbage [AA], cabbage [CC]), enlarged roots (turnip [AA], rutabaga [AACC]), other enlarged organs like stems and inflorescences (cauliflower, Brussels sprouts [CC]), and oilseeds (both AA and AACC) (Figure 14). Although any of these species can be used as a vegetable, fodder, oilseed or even ornamental crop, Brassica rapa and Brassica oleracea are often referred to as leaf vegetables and Brassica napus as an oilseed crop.
Most of the breeding efforts for rapeseed have been dedicated to increasing seed yield and to reducing the content of nutritionally undesirable components of the oil and of the seed hull. These efforts led to the development of the double low (“00”) varieties that combine low erucic acid content, which is undesirable in edible oils, and low glucosinolate (GSL) content, which in animal feed can result in goitrogen-induced hypertrophy. Among the other objectives currently pursued by breeders, a lot of effort has also been invested in the development of “yellow seeded” varieties, which have a reduced condensed tannin content associated with higher oil and protein content and lower fibre content. Development of varieties with oil properties meeting the requirements of the food processing industry (high oleic and low linolenic acid content) or, more recently, the development of oils suitable for conversion to biodiesel and industrial lubricants is also a recurrent plant breeding objective, along with the identification of genotypes able to grow under low input farming regimes (especially low nitrogen input).
The narrow origin of Brassica napus, combined with intense selection, has resulted in a notable decline in genetic diversity in modern cultivars. Most current crop germplasm is related (Hasan et al., 2006; Qian et al., 2014) and a strong deficit in polymorphism is observed in regions where QTLs for GSL and erucic acid content were mapped (Qian et al., 2014). Although breeders attempted to reintroduce diversity through introgression from B. rapa and B. oleracea, as well as other related Brassica species, they focussed their efforts on a few phenotypic traits of interest. Loss in genetic diversity is more pronounced for the C genome (Qian et al., 2014), probably because of less interspecific hybridization.

AAC triploid hybrids

Recent studies performed with allotriploid (AAC) Brassica hybrids gave further insights into the link between ploidy and CO frequencies (introduced previously). Leflon et al. (2010) analysed meiosis of Brassica hybrids with the same genomic background but with different karyotypes. These authors found that CO frequencies on chromosome A07 increased in the progeny of allotriploid (ArArC) and allotetraploid (ArArCC) hybrids compared to the diploid (ArAr) hybrids, the highest CO rate (by far) being observed in the ArArC hybrid (Figure 19). Furthermore, the magnitude of the increase in AAC hybrids appears to be genotype dependent; triploids produced using Darmor-bzh (ArAdCd) made more COs than triploids produced using Yudal (ArAyCd) (Nicolas et al., 2009). More recently, it was shown that the number and the nature of the chromosomes that are left as univalents modulate CO frequencies in Brassica triploids; interestingly, addition of the single chromosome C09, on which PrBn is located, is sufficient to boost CO frequencies (Suay et al., 2014).
Interestingly, and contrary to what is observed with the anti-CO proteins (see above), at least some of the extra COs observed in the triploids arise from the class I CO pathway (dependent on ZMM proteins); Leflon et al. (2010) observed an increase in the number of chiasmata marked by MLH1 during male meiosis (1.7-fold increase in the triploids compared to the diploid). The increase in MLH1 foci alone is however insufficient to account for the almost 3-fold increase in genetic distances observed for female meiosis when comparing interval length between triploids and diploids (Leflon et al., 2010; Pelé et al., submitted). Although Suay et al. (2014) observed a drastic loss of interference in the triploids for almost all the genetic intervals they compared, they concluded that this could not result only from a massive increase in class II COs.
There is still very little insight into the mechanisms that drive extra-CO formation in the triploids. It is interesting to note that the situation in the triploids echoes what is known about the control mechanisms that depend on the proper progression of meiotic recombination (see section 1.3 above). In C. elegans, failure of a single chromosome pair to synapse results in a compensatory increase of COs on the chromosomes that are correctly synapsed in the same cell (Carlton et al., 2006).

Homoeologous Exchanges generate clusters of differentially expressed genes

Following an HE, gene loss is usually accompanied by replacement with its homoeologue. This results in the establishment of two identical gene copies (e.g. AC → AA) that, unlike many copy number variants, segregate independently. In the classical sense, two independently segregating loci constitute two genes; however, for HEs it seems more relevant to consider as a single gene the duplicated loci that contribute to the expression of a unique mRNA (Fig. 2A). This is biologically relevant, as the same mRNA produced from two independent loci will have the same phenotypic consequences, and also methodologically relevant: while we can distinguish homoeologues (i.e. A vs C), it is impossible to distinguish sequencing reads that originate from two identical but independently segregating loci (i.e. A vs A or C vs C).
Based on this premise, we first determined whether the HEs we had confirmed generate divergent gene expression profiles. To do this we compared the expression profiles of Darmor-bzh and Yudal in regions outside HEs (representing the baseline divergence between the two varieties) and within HEs. Our results not only confirmed the expectation that regions lost in Yudal were enriched in down-regulated genes but also demonstrated that the corresponding duplicated regions in Yudal were enriched in up-regulated genes compared to Darmor-bzh (Table S4). This holds true for three of the HE-driven duplicated regions in Darmor-bzh, which were enriched in up-regulated genes; however, it was not possible to evaluate the equivalent regions lost in Darmor-bzh, as they are not present in the reference genome assembly (Chalhoub et al., 2014).
We then tested whether HE expression profiles are sufficiently different from genome average to be identified without any prior indication of their position. Given that series of adjacent genes are lost or duplicated as a consequence of HEs, we looked for clusters of genes with a consistent direction of transcriptional change. In accordance with previous results, segmentation of gene expression (Fig. 2B) identified all confirmed HEs in Yudal; all lost regions were detected as under-expressed segments, and 6 out of the 13 concurrently duplicated regions were detected as over-expressed segments compared to Darmor-bzh (Tables S5-7). This approach also detected two clusters of genes displaying similar patterns, but that did not overlap with known HEs. The validation procedure described above (SNP/PCR; Tables S2 and S8) was applied, confirming that the corresponding regions were lost in Yudal (Fig. S1). It is thus likely that these two clusters of genes correspond to additional HEs. By contrast, none of the 13 Darmor-bzh HEs were identified by the segmentation analyses (Tables S5-7); this is likely due to the partial assembly of these regions of the reference genome (Chalhoub et al., 2014), which reduced our statistical power to detect these events de novo.
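The idea of looking for clusters of adjacent genes with a consistent direction of transcriptional change can be illustrated with a minimal sketch. This is not the segmentation method used in the manuscript, which is not specified here; it is a simplified stand-in that scans genes in chromosomal order and reports runs of neighbours whose log2 fold-change shares the same sign. The function name, the minimum run length and the fold-change threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def find_consistent_runs(
    log2_fc: List[float],
    min_genes: int = 5,
    min_abs_log2_fc: float = 0.5,
) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, direction) for runs of adjacent genes.

    Genes are assumed to be ordered along the chromosome. A gene is "up" or
    "down" when its log2 fold-change passes the (positive) threshold, and a
    run is reported only if it spans at least `min_genes` consecutive genes.
    """
    runs = []
    start = None
    direction = None
    # A trailing 0.0 acts as a sentinel so the last run is always closed.
    for i, value in enumerate(list(log2_fc) + [0.0]):
        if value >= min_abs_log2_fc:
            current = "up"
        elif value <= -min_abs_log2_fc:
            current = "down"
        else:
            current = None
        if current != direction:
            if direction is not None and i - start >= min_genes:
                runs.append((start, i, direction))
            start, direction = i, current
    return runs


# Hypothetical log2(Yudal / Darmor-bzh) values for 15 adjacent genes.
example = [0.1, -0.2, -1.5, -2.0, -1.1, -3.0, -1.8,
           0.3, 1.0, 1.2, 0.9, 1.4, 1.1, 1.3, -0.1]
print(find_consistent_runs(example))
```

In this toy input, an under-expressed run would correspond to a candidate lost region and an over-expressed run to a candidate duplicated region; real data would call for a statistical segmentation that tolerates noisy genes inside a segment.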
Finally, we observed that genes in HEs had a disproportionate effect on the total transcriptome. While the affected genes represent less than 4% of total gene number, they represent a larger percentage of those with highest (absolute) fold-change between cultivars: i.e. 22% of the top 1% (fold-change > 252; χ², p = 9.5E-100) and 19% of the top 5% (fold-change > 8.9; χ², p = 0). Although we are using a single cell type, these results are a good representation of the genome-wide effects of HEs as 63% of all genes are transcribed in our data set (68% and 45% of these being covered by >10 and >100 reads per sample respectively).
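The kind of enrichment test reported above can be sketched as a chi-square test on a 2x2 table (gene in an HE or not, gene in the top fold-change class or not). The counts below are hypothetical and were chosen only to mirror the reported proportions (about 4% of genes in HEs, 22% of the top 1% in HEs); the real gene counts are not given in this section, and the use of a Pearson statistic without continuity correction is an assumption.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) on [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the upper tail
    of the chi-square distribution equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 50,000 expressed genes, 2,000 of them in HEs (4%),
# and 110 of the 500 top-1% genes located in HEs (22%).
top_in_he, top_not_he = 110, 390
rest_in_he, rest_not_he = 1890, 47610

stat, p_value = chi_square_2x2(top_in_he, top_not_he, rest_in_he, rest_not_he)
print(f"chi-square = {stat:.1f}, p = {p_value:.2e}")
```

With enrichment this strong the p-value becomes extremely small, and for even larger statistics it can underflow to zero in floating-point arithmetic.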

Segregating HEs drive massive gene expression changes within a variety

We next investigated whether any equivalent regions existed between biological replicates within a variety. For the three Darmor-bzh biological replicates, we identified a pattern of expression that was evocative of the HEs previously identified, in a single chromosomal region at the top of An1-Cn1 (Figure 2C). PCR confirmed the physical loss of one (ACCC) or two (CCCC) copies of the A genome in this region (Fig. S2). No equivalent regions were identified in Yudal. These results indicated that a newly formed HE was segregating among Darmor-bzh biological replicates. Contrary to the previously observed HEs, fixed either in Darmor-bzh or Yudal, this segregating event encompassed a very large region (4.4 Mb or 1470 genes). We compared the gene content between the two exchanged homoeologous regions using the synteny tool within the Genoscope Brassica napus genome browser and identified a total of 43 gene models that are specific to the A region (Fig. S3); as these genes have no homoeologue, their loss cannot be compensated in the CCCC genotype. More broadly, the HE had a very large effect on the total transcriptome, with affected genes representing the majority of those with highest (absolute) fold-change between the AACC and CCCC genotypes: 94% of the top 1% (fold-change > 34; χ², p = 0) and 47% of the top 5% (fold-change > 1.7; χ², p = 0). This segregating event offered a unique opportunity to evaluate the extent to which variation in gene copy number correlates with gene expression change.

A vast majority of genes show additive expression when duplicated in the newly formed HE

Overall, we observed that the level of expression of a gene in the newly formed HE was directly proportional to the number of copies of that gene, with the expression ratio being very close to, or equal to, the ratio of gene copy numbers between Darmor-bzh biological replicates (Fig. S4-5): e.g. A-copy gene expression decreased twofold, while C-copy gene expression increased 1.5-fold between AACC and ACCC genotypes. Only 69 genes (out of 1470; 4.5%) deviated significantly from this general trend. These outliers were enriched in homoeologous pairs (16 pairs; χ², p = 5.5E-21), indicating that homoeologues are likely to respond similarly to gene dosage variation. Remarkably, of the 69 outliers, 42 (60.8%) showed decreased expression when copy number increased. Despite these outliers, for the vast majority of genes (95.5%) copy-specific gene expression was in strict concordance with gene copy number immediately following an HE (Fig. 2D). As a consequence, differences in the summed expression of homoeologues (hereafter, Total(A+C) expression) depended on the relative contributions of the two copies prior to the HE (estimated from the AACC genotype). This represented a continuum where duplication of a dominantly expressed homoeologue led to increased Total(A+C) expression and duplication of a lesser expressed homoeologue led to reduced Total(A+C) expression (Fig. 2D). For this reason, almost half (43%) of the homoeologous pairs affected by the newly formed HE had significantly altered Total(A+C) expression (Fig. 2D). Conversely, for 57%, Total(A+C) expression remained unchanged in the CCCC and ACCC genotypes. This latter group corresponded to genes where the A and C copies contributed equally to Total(A+C) expression in the AACC genotype (Fig. 2D, Homoeologue Bias ~ 0.5).
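The additive (dosage-proportional) expectation described above can be written down as a small worked example. The sketch assumes that each copy contributes equally to the expression of its homoeologue, and that homoeologue bias is defined as the A copy's share of Total(A+C) expression in the AACC genotype; that definition, like the function names, is an assumption made for illustration.

```python
# Copy numbers (A copies, C copies) of the region in each genotype.
GENOTYPES = {"AACC": (2, 2), "ACCC": (1, 3), "CCCC": (0, 4)}


def expected_expression(a_expr: float, c_expr: float,
                        a_copies: int, c_copies: int):
    """Scale A and C expression measured in AACC (two copies each) to a new
    copy-number state, assuming expression is proportional to copy number."""
    a_new = a_expr * a_copies / 2
    c_new = c_expr * c_copies / 2
    return a_new, c_new, a_new + c_new


def total_ratio(bias_a: float, genotype: str) -> float:
    """Expected Total(A+C) in `genotype` relative to AACC.

    `bias_a` is the fraction of Total(A+C) contributed by the A copy in AACC
    (assumed definition of homoeologue bias).
    """
    a_copies, c_copies = GENOTYPES[genotype]
    _, _, total = expected_expression(bias_a, 1.0 - bias_a, a_copies, c_copies)
    return total


for bias in (0.2, 0.5, 0.8):
    print(bias, round(total_ratio(bias, "ACCC"), 3),
          round(total_ratio(bias, "CCCC"), 3))
```

Under these assumptions the per-copy changes match the text (A expression halves and C expression rises 1.5-fold from AACC to ACCC), Total(A+C) is unchanged when the bias is 0.5, and it rises or falls depending on whether the duplicated C copy was the dominant or the lesser expressed homoeologue.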

Most genes within older fixed HEs also show additive expression, but additional factors contribute

To gain insights into longer-term effects of HEs on gene expression, we analysed the impact on gene expression of the older HEs fixed in Yudal (Chalhoub et al., 2014). Unlike the case above, this analysis was constrained by the lack of a direct pre-HE reference genotype for comparison. Instead, we used the expression pattern observed in Darmor-bzh as a proxy for the pre-HE state in Yudal (Fig. S6). This necessary approach potentially introduced additional layers of transcriptional variation and also reduced the number of HEs amenable to analysis (i.e. Yudal HEs that overlap with Darmor-bzh HEs cannot be used). In spite of this, we still observed that expression of genes within the fixed HEs in Yudal was essentially dosage-dependent; most genes duplicated by HEs in Yudal showed an almost 2-fold increase in expression compared to that of their single-copy homolog in Darmor-bzh (Fig. S7). However, the absolute dose difference did not appear to be the only determinant of gene expression in Yudal HEs (Fig. 2D, Fig. S8).
To confirm additional influences on expression for genes within fixed Yudal HEs, we compared globally the concordance in Darmor-bzh and Yudal per-copy expression levels for genes within HEs and for genes outside HEs. If gene expression is purely additive, then these two distributions should be similar. This approach also enabled us to isolate the effects on gene expression attributable to HEs from those due to inter-varietal variation. A two-sample Kolmogorov-Smirnov test verified that the two distributions (concordance in per-copy Darmor-bzh and Yudal expression levels, inside and outside HEs) differed significantly (Fig. S9, p = 3.2E-4), confirming divergent transcriptional output for genes within HEs. While this test demonstrated that the distributions differed, it provided little insight into why. Further analyses, however, shed some light on the drivers of this divergent transcriptional outcome.
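As a rough illustration of the comparison described above, a two-sample Kolmogorov-Smirnov test can be sketched in plain Python. The statistic D is the largest gap between the two empirical cumulative distributions; the p-value below uses the standard asymptotic approximation with a small-sample correction, which may differ from the implementation used for the reported result. The input values are hypothetical per-copy concordance scores, not data from the study.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov test: returns (D, approximate p-value)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # The ECDF difference only changes at observed values, so checking
    # every data point is enough to find its maximum.
    for v in xs + ys:
        cdf_x = bisect_right(xs, v) / n
        cdf_y = bisect_right(ys, v) / m
        d = max(d, abs(cdf_x - cdf_y))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The series below converges poorly here; the p-value is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical concordance values inside and outside HEs.
inside_he = [0.62, 0.70, 0.75, 0.81, 0.88, 0.93, 1.10, 1.25, 1.40, 1.60]
outside_he = [0.90, 0.94, 0.97, 0.99, 1.00, 1.01, 1.02, 1.04, 1.07, 1.12]
d_stat, p_val = ks_two_sample(inside_he, outside_he)
print(f"D = {d_stat:.2f}, p = {p_val:.3f}")
```

A significant result only says that the two distributions differ somewhere; as noted in the text, it does not indicate which genes or mechanisms are responsible.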

Table of contents:

Chapter 1: Bibliographic review
1.1 Progression of meiosis, as seen through the prism of chromosome association/segregation
1.2 The molecular mechanisms of meiotic recombination
1.3 The progression of meiosis is intertwined with meiotic recombination
1.4 The patterning of meiotic COs formation
1.5 How to deal with the polyploidy situation?
1.6 Ploidy level and recombination frequencies
1.7 Impact of recombination on genetic diversity
1.8 CO frequencies and selection
1.9 How to tackle some of the breeder’s challenges?
Chapter 2: The plant model
2.1 Origin of Brassica napus
2.2 Relevance for breeding
2.3 Meiosis in Brassica napus
2.4 AAC triploid hybrids
Chapter 3: Objectives of the PhD
Chapter 4: Deciphering the main source of variation for the meiotic transcriptome of B. napus
4.1 Introduction
4.2 Manuscript: Homoeologous exchanges drive extensive dosage dependent changes in gene expression and influence allopolyploid genome evolution
4.3 Overview of the transcriptome of Brassica napus meiocytes
4.3.1 Objectives
4.3.2 Details on the experimental design
4.3.3 Details on the mapping
4.3.4 De novo transcriptome assembly
4.3.5 Description of the meiotic transcriptome of B. napus
4.3.6 Partitioning the source of transcriptome variation
4.3.7 Differential gene expression between homoeologs does not result in global sub-genome dominance
4.3.8 Variation of the transcriptome between Darmor-bzh and Yudal
4.3.9 Ploidy change had limited impact on meiotic gene expression
4.3.10 A closer look into PrBn confidence interval
4.3.11 Conclusions and Perspectives
Chapter 5: FANCM
5.1 Manuscript: FANCM limits meiotic COs in Brassica crops
Chapter 6: General discussion
6.1 The meiotic transcriptome is highly variable within Brassica napus
6.2 Phenotypic consequences of HEs
6.3 The anti-CO activity of FANCM is conserved in the Brassica
References