Cooperation of non-coding and coding transcripts for protein synthesis
The protein composition of a cell directly influences its fate. For coding genes, the final step of gene expression is mRNA translation, in which the genetic information contained within the transcript is decoded into a protein (Cramer, 2019).
First, the gene is transcribed into an RNA molecule that matures through the addition of a poly(A) tail and an N7-methylated guanosine cap linked to the first nucleotide at the 5’ end. These modifications increase RNA stability and translation efficiency (Shatkin, 1976). The linear RNA sequence can then be modified, or spliced, by the spliceosome, leading to the removal of portions of the transcript (Filichkin et al., 2015). In humans, 73 mutations impairing the splicing process have been linked to cancer predisposition (Tate et al., 2019), underscoring the importance of this mechanism for cell life. Most of the time, all intronic regions are removed from the transcript during splicing. However, certain introns can be retained and certain exons can be skipped, leading to different mature RNAs from one locus through Alternative Splicing (AS). Interestingly, in plants the majority of AS events are intron retention, whereas in humans they mostly consist of exon skipping (Sammeth et al., 2008; Wang et al., 2008). The spliceosome is made up of small nuclear RNAs (snRNAs) bound to proteins, which together form a ribonucleoprotein complex that catalyzes splicing (Wahl et al., 2009). Notably, more than 90 % of pre-mRNA transcripts are spliced in humans, many in a tissue-dependent or development-related manner (Wang et al., 2008). Similarly, in higher plants, more than 60 % of intron-containing transcripts undergo AS (Chaudhary et al., 2019). Once mature, the coding transcripts exit the nucleus and are bound and decoded by the ribosome, a large complex formed of multiple proteins and ribosomal non-coding RNAs (rRNAs). The ribosome is made of two subunits of different sizes, together with four rRNAs implicated in ribosome assembly and function (Merchante et al., 2017).
This complex is able to recognize specific sequences in the coding transcripts and produce the corresponding protein with the help of another type of non-coding transcript: the transfer RNAs (tRNAs), which carry the amino acids (Sanchita et al., 2020). Strikingly, it has been estimated that one amino acid is transferred every 60 ms to the growing polypeptide chain (Zaher and Green, 2009). This process is called mRNA translation and is carried out through the cooperation of proteins and non-coding transcripts. In addition, small nucleolar RNAs are another class of structural RNAs that guide chemical modifications, such as methylation and pseudouridylation, of snRNAs, rRNAs and tRNAs, participating in their maturation (Streit and Schleiff, 2021). Together, mRNA splicing and translation are striking examples of the importance of non-coding transcriptional units cooperating with proteins for one basic function of the cell: protein synthesis.
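To give a sense of scale, the elongation rate quoted above can be turned into a rough synthesis time. The following is a back-of-the-envelope sketch in Python, assuming a constant rate of one amino acid per 60 ms and ignoring initiation, termination and ribosome pausing. The function names and the 400-residue example are illustrative assumptions, not values taken from the cited work.

```python
# Rough translation-time arithmetic under a constant elongation rate.
# Assumption: 60 ms per amino acid, the figure quoted in the text.
MS_PER_AMINO_ACID = 60.0


def elongation_time_seconds(protein_length_aa: int,
                            ms_per_aa: float = MS_PER_AMINO_ACID) -> float:
    """Seconds needed to polymerise a chain of the given length.

    Initiation, termination and pausing are deliberately ignored.
    """
    if protein_length_aa < 0:
        raise ValueError("protein length must be non-negative")
    return protein_length_aa * ms_per_aa / 1000.0


def residues_per_second(ms_per_aa: float = MS_PER_AMINO_ACID) -> float:
    """Elongation rate expressed in amino acids per second."""
    return 1000.0 / ms_per_aa


if __name__ == "__main__":
    # Hypothetical 400-residue protein, chosen only for illustration.
    print(elongation_time_seconds(400), "s for 400 residues")
    print(round(residues_per_second(), 1), "residues per second")
```

Under these assumptions, a 400-residue protein would be polymerised in about 24 s, at roughly 17 residues per second.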
The small non-coding regulatory transcripts
In addition to these housekeeping non-coding RNAs, new regulatory RNAs have been identified in recent years. Small regulatory non-coding RNAs, or small RNAs (sRNAs), are defined as transcripts between 19 and 25 nt in length that are not translated into protein. They derive from double-stranded (ds) RNA. Depending on their biogenesis and their mechanism of action, they can be classified into three major types: small interfering RNAs (siRNAs), microRNAs (miRNAs) and phasiRNAs (Brant and Budak, 2018), all of which fine-tune transcriptional or post-transcriptional gene activity (Voinnet, 2009; Budak and Akpinar, 2015).
The siRNAs
The siRNAs are 21-24 nt long RNAs that inhibit transcription through RNA-directed DNA methylation (RdDM). This mechanism is notably critical for silencing Transposable Element (TE) activity in both animals and plants (Zemach and Zilberman, 2010). Accordingly, RdDM-deficient Arabidopsis mutants present higher TE activity, even though transposition remains rare (Ito et al., 2011). In the TE-rich maize genome, methylated CHH islands are found between active genes and silent TEs, preventing the spreading of active euchromatin into inactive heterochromatin (Li et al., 2015a).
RdDM is a well-described process involving siRNAs that arise from genes transcribed by either RNA Polymerase II (Pol II) or Pol IV (Figure 1). Interestingly, Pol IV is a plant-specific polymerase that shares similar subunits with Pol II (Haag and Pikaard, 2011) and is specifically implicated in siRNA biogenesis (Matzke and Mosher, 2014; Matzke et al., 2015).
Upon Pol II transcription, the single-stranded (ss) transcript of the siRNA gene is converted into a dsRNA through synthesis of the complementary strand by RNA-DEPENDENT RNA POLYMERASE 6 (RDR6). DICER-LIKE 2 (DCL2) or DCL4 then cut the dsRNA into 22- and 21-nt fragments, respectively. One of the strands is finally incorporated into the RNA-Induced Silencing Complex (RISC) through its interaction with ARGONAUTE 6 (AGO6) (Wu et al., 2012; Nuthikattu et al., 2013; McCue et al., 2015). Interestingly, Pol II transcripts can also naturally form hairpin structures through base complementarity, leading to a dsRNA-like structure without the need for RDR6. In this case, the dsRNA is cleaved by DCL3 every 24 nt and one of the strands is incorporated into the RISC through its interaction with AGO4 or AGO6 (Figure 1; Pol II-dependent siRNA production). On the other hand, Pol IV-dependent transcripts are converted into dsRNA by RDR2 (Smith et al., 2007; Law et al., 2013). The dsRNA is then processed into 21-24 nt fragments through cleavage by DCL2, DCL3 or DCL4. Interestingly, a substantial proportion of RdDM targets are still methylated in dcl1/dcl2/dcl3/dcl4 quadruple mutants, indicating a DCL-independent dsRNA cleavage (Yang et al., 2016). In either case, one of the strands is loaded into the RISC through interaction with AGO4 or AGO6 (Figure 1; Pol IV-dependent siRNA production).
Once the RISC is formed, it associates with the chromatin through base complementarity between the AGO-associated ssRNA and the DNA, and mediates DNA methylation with the help of other proteins and co-factors. More precisely, for both Pol II- and Pol IV-dependent siRNA synthesis, the single-stranded RNA directs the RISC through base complementarity with a nascent Pol V transcript to the region where DNA methylation occurs. Finally, the RISC recruits the DNA methyltransferase DOMAINS REARRANGED METHYLASE 2 (DRM2), which methylates the DNA with the help of RNA-DIRECTED DNA METHYLATION 1 (RDM1), which binds to both AGO4/AGO6 and DRM2. Interestingly, to keep the Pol V transcript next to the chromatin and facilitate DNA methylation, RRP6-LIKE 1 (RRP6L1), a homolog of the yeast and mammalian Ribosomal RNA processing 6 (RRP6), retains the RNA close to the active Pol V. In addition, the interaction between the Pol V RNA and the AGO-bound RNA is strengthened by the INVOLVED IN DE NOVO 2 (IDN2)–IDN2 PARALOGUE (IDP) complex, which binds SWIB3 (part of the SWI/SNF complex), participating in chromatin decondensation and facilitating Pol V activity (Zhang et al., 2018a) (Figure 1). Methylated DNA positively correlates with condensed chromatin and low transcriptional activity. The effect of DNA methylation on gene expression is detailed further in section 2.2.1.
Figure 1: RNA-directed DNA methylation pathways (adapted from Zhang et al., 2018). For Pol IV-dependent siRNA production, Pol IV recruitment to the chromatin is facilitated by SAWADEE HOMEODOMAIN HOMOLOGUE 1 (SHH1), which binds dimethylated histone H3 lysine 9 (H3K9me2) (Law et al., 2013; Zhang et al., 2013). Pol IV processing is also facilitated by the chromatin remodeler SNF2 DOMAIN-CONTAINING PROTEIN CLASSY 1 (CLSY1) (Zhang et al., 2013; Smith et al., 2007). Pol IV-dependent non-coding RNAs (P4 RNAs) are converted into dsRNA by RNA-DEPENDENT RNA POLYMERASE 2 (RDR2). The dsRNA is cleaved either by DICER-LIKE PROTEIN 2 (DCL2), DCL3 or DCL4 (pathway 1) or by non-DCL proteins (pathway 2), both leading mainly to 24-nt siRNAs. These siRNAs interact with ARGONAUTE 4 (AGO4) or AGO6 and pair with a Pol V-dependent scaffold transcript, which together recruit DOMAINS REARRANGED METHYLASE 2 (DRM2), which methylates the DNA. Interestingly, Pol II can also generate RdDM-related siRNAs. On the one hand, Pol II transcripts can harbor a stem-loop structure that is cleaved by DCL3, generating 24-nt siRNAs (pathway 3). On the other hand, Pol II transcripts can serve as templates for RDR6-mediated dsRNA synthesis (Wu et al., 2012; Nuthikattu et al., 2013; McCue et al., 2015); the dsRNA is then cleaved by DCL2 and DCL4, generating 21-22 nt siRNAs (pathway 4). Finally, the Pol II-dependent siRNAs interact with AGO4 or AGO6. The association between the AGO complex and Pol V is facilitated by RNA-DIRECTED DNA METHYLATION 3 (RDM3) (He et al., 2009; Bies-Etheve et al., 2009). The production of the Pol V-dependent scaffold RNA requires the DDR complex, comprising the chromatin remodeler DEFECTIVE IN RNA-DIRECTED DNA METHYLATION 1, DEFECTIVE IN MERISTEM SILENCING 3 and RDM1.
The DDR complex interacts with AGO4, DRM2, SUPPRESSOR OF VARIEGATION 3-9 HOMOLOGUE PROTEIN 2 (SUVH2) and SUVH9, binds single-stranded methylated DNA and recruits Pol V (Gao et al., 2010; Kanno et al., 2004; Kanno et al., 2008; Law et al., 2010; Zhong et al., 2012; Liu et al., 2014; Johnson et al., 2014). Finally, the retention of Pol V transcripts is mediated by the RNA-binding protein RRP6-LIKE 1 (RRP6L1; Zhang et al., 2014) and by the INVOLVED IN DE NOVO 2 (IDN2)–IDN2 PARALOGUE (IDP) complex, which interacts with SWITCH/SUCROSE NONFERMENTING (SWI/SNF), a chromatin-remodeling complex (Ausin et al., 2009; Zheng et al., 2010; Ausin et al., 2012; Zhang et al., 2012; Finke et al., 2012; Xie et al., 2012; Zhu et al., 2013).
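The four siRNA production routes listed in the Figure 1 caption can be written down as a small lookup structure, which may help when comparing them side by side. This is only a sketch: the class and field names (`SiRNARoute`, `rdr`, `processing`, and so on) are invented for illustration, and the entries simply restate the caption without adding information.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SiRNARoute:
    """One siRNA production route from Figure 1 (illustrative names)."""
    pathway: int
    polymerase: str
    rdr: Optional[str]                # None: the transcript folds into a stem-loop by itself
    processing: Tuple[str, ...]       # enzymes cleaving the dsRNA
    sirna_sizes_nt: Tuple[int, ...]   # main siRNA sizes given in the caption


# All four routes load their siRNAs into AGO4 or AGO6 according to the caption.
AGO_PARTNERS = ("AGO4", "AGO6")

ROUTES = (
    SiRNARoute(1, "Pol IV", "RDR2", ("DCL2", "DCL3", "DCL4"), (24,)),  # mainly 24 nt
    SiRNARoute(2, "Pol IV", "RDR2", ("non-DCL",), (24,)),              # mainly 24 nt
    SiRNARoute(3, "Pol II", None, ("DCL3",), (24,)),
    SiRNARoute(4, "Pol II", "RDR6", ("DCL2", "DCL4"), (21, 22)),
)


def routes_for(polymerase: str) -> List[int]:
    """Pathway numbers that start from the given polymerase."""
    return [r.pathway for r in ROUTES if r.polymerase == polymerase]


def rdr_independent_routes() -> List[int]:
    """Pathway numbers that do not need an RNA-dependent RNA polymerase."""
    return [r.pathway for r in ROUTES if r.rdr is None]


if __name__ == "__main__":
    print("Pol II routes:", routes_for("Pol II"))
    print("RDR-independent routes:", rdr_independent_routes())
```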
The miRNAs
MicroRNAs (miRNAs) are 19-24 nt long RNAs that inhibit gene expression at the post-transcriptional level. Through base complementarity, they direct a RISC to other RNAs, triggering their cleavage or preventing their translation into protein (Figure 2). Strikingly, it has been estimated that one third of the human genome is regulated by miRNAs (Hammond, 2015). Likewise, in plants, miRNAs are implicated in almost all biological processes, such as development and responses to biotic and abiotic stimuli (Budak and Akpinar, 2015). MiRNA genes generate 100-200 nt long pri-miRNA transcripts that naturally form a stem-loop structure through base complementarity. In plants, a complex composed notably of DCL1, the dsRNA-binding protein HYPONASTIC LEAVES 1 (HYL1), the nuclear cap-binding complex (CBC) and a C2H2-type zinc finger protein (SERRATE) processes the pri-miRNA by removing 15 nt from each extremity of the RNA stem-loop to form the precursor miRNA (pre-miRNA) (Axtell and Meyers, 2018). The pre-miRNA is then cleaved into a 20-24 nt fragment to form a miRNA/miRNA* duplex. This duplex is methylated at the 3’ end by HEN1, which protects it from degradation. Finally, one of the strands is loaded into AGO1 to form the RISC, which mediates mRNA cleavage or translational repression through base pairing of the miRNA with the RNA target (Figure 2).
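Target recognition by base complementarity can be illustrated with a toy search for sites on a transcript that pair with a small RNA. The sketch below assumes strict Watson-Crick pairing (no G:U wobble) and a uniform mismatch tolerance, which is much cruder than the position-dependent pairing rules of real plant miRNAs. The function names and the sequences are invented for illustration.

```python
from typing import List

# Watson-Crick complement for RNA bases (G:U wobble pairs are ignored here).
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Sequence that pairs perfectly with `rna`, written 5' to 3'."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


def find_target_sites(mirna: str, transcript: str,
                      max_mismatches: int = 0) -> List[int]:
    """0-based start positions on `transcript` where `mirna` can pair.

    A window is reported when it differs from the perfect complementary
    site at no more than `max_mismatches` positions.
    """
    site = reverse_complement(mirna)
    transcript = transcript.upper()
    hits = []
    for start in range(len(transcript) - len(site) + 1):
        window = transcript[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


if __name__ == "__main__":
    toy_mirna = "AUGGCUAACG"          # hypothetical 10-nt small RNA
    toy_mrna = "GGGCGUUAGCCAUAAA"     # hypothetical transcript with one perfect site
    print(reverse_complement(toy_mirna))
    print(find_target_sites(toy_mirna, toy_mrna))
```

Raising `max_mismatches` loosens the search, loosely mirroring the idea that lower complementarity still allows binding, at the cost of more candidate sites.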
Initially, it was thought that plant miRNAs mainly act by transcript cleavage thanks to the high sequence complementarity with their targets. Nevertheless, plant miRNAs are also found enriched on membrane-bound polysomes (Li et al., 2016a), adding to the evidence that they also act through translational repression, like mammalian miRNAs (Millar and Waterhouse, 2005). For example, SQUAMOSA PROMOTER BINDING PROTEIN-LIKE 3 (SPL3), REVOLUTA, SCARECROW-LIKE PROTEIN 4 (SCL4), APETALA 2 (AP2), COPPER/ZINC SUPEROXIDE DISMUTASE 1 (CSD1) and CSD2 are all mRNAs targeted by miRNAs and subjected to both transcript cleavage and translational repression (Yu et al., 2017). Also, as miRNAs are relatively small, they may target different mRNAs that share similar sequences, likely belonging to the same gene family. For example, in Arabidopsis thaliana, miR156 regulates the transition from the vegetative to the flowering phase by changing the expression of several SPL genes (Wang, 2014), and miR167 targets various ARF genes implicated in auxin signaling (Xia et al., 2015). MiRNA evolution may also recruit new targets in a given species. For example, the miR396-GRF regulation is conserved in angiosperms and gymnosperms (Chorostecki et al., 2012; Debernardi et al., 2012), but Medicago truncatula miR396 represses MtGRF genes together with two MtbHLH79-like genes, affecting root growth and mycorrhizal colonization (Bazin et al., 2013).
Mutant plants impaired in the miRNA biogenesis pathway present strong developmental defects, suggesting that miRNAs are implicated in a wide range of biological processes (Budak and Akpinar, 2015). For example, Arabidopsis dcl1 knock-down mutants exhibit pleiotropic developmental defects, and the dcl1 knock-out is embryo-lethal (Chen, 2005). Similarly, disruption of the HYL1, HEN1, DCL1, HST and AGO1 genes impairs fertility (Cubillos et al., 2012; Oliver et al., 2017). In rice, the function of MEIOSIS ARRESTED AT LEPTOTENE 1 (MEL1), a component of the RISC, is necessary for proper pollen grain development (Nonomura et al., 2007; Komiya et al., 2014). Similarly, the maize AGO104, which also functions through RdDM, is directly involved in meiosis (Singh et al., 2011). MiRNA activity also participates in the plant response to environmental cues. For example, in Gossypium hirsutum, miRNVL5 expression decreases upon salt stress, positively contributing to the plant salt stress response through the increased expression of GhCHR, a positive regulator of salt stress tolerance (Gao et al., 2016). Similarly, the Arabidopsis miR156 is implicated in heat stress resilience and memory through the regulation of SPL genes (Stief et al., 2014), whereas miR398 participates in heat tolerance via cleavage of CSD mRNAs (Guan et al., 2013). Finally, miRNAs assist the assimilation of key nutrients. For example, the Arabidopsis miR395 targets the ATP sulfurylase (APS) gene transcript involved in sulfate assimilation, participating in plant sulfate metabolism (Jones-Rhoades and Bartel, 2004). Also, miR399, generated from 6 loci, participates in plant Pi homeostasis by targeting the PHO2 mRNA (Pant et al., 2008). PHO2 encodes a ubiquitin-conjugating E2 enzyme that mediates the degradation of the PHO1 protein, implicated in Pi loading into the xylem (Liu et al., 2012a).
Consistently, overexpression of miR399 increases the PHO1 protein level, leading to Pi hyperaccumulation (Chiou et al., 2006). Given the direct implication of miRNAs in plant development and responses to environmental stresses, their manipulation has the potential to enhance some agronomic traits.
Figure 2: The miRNA pathway (adapted from Li et al., 2018). Pol II-dependent miRNA precursor transcripts form a natural hairpin structure through base-pair complementarity, giving the pri-miRNA. The poly(A) tail and 5’ cap are removed through DCL1-mediated cleavage, generating the pre-miRNA. The pre-miRNA is further cleaved by DCL1, resulting in a mature miRNA/miRNA* duplex whose extremities are methylated by HEN1. HASTY translocates the miRNA/miRNA* duplex from the nucleus to the cytoplasm. Finally, one strand of the miRNA/miRNA* duplex is recruited by AGO1, forming the mature RNA-induced silencing complex (RISC), which interacts with the target mRNA through base complementarity. Binding of the miRNA-RISC complex to the mRNA can trigger cleavage of the mRNA or inhibition of its translation.
The phasiRNA and tasiRNAs
Phased secondary siRNAs (phasiRNAs) are generated after miRNA-mediated cleavage of specific transcripts and act as regulatory sRNAs, much like miRNAs (Allen et al., 2005). Indeed, miRNA-mediated RNA cleavage does not always lead to direct degradation of the cleaved transcript. In certain cases, part of the cleaved RNAs, named phased transcripts, is processed into phasiRNAs through the RDR/DCL machinery (Chen et al., 2010). More precisely, SUPPRESSOR OF GENE SILENCING 3 (SGS3) binds to the phased transcripts to prevent their degradation and facilitates their conversion into dsRNA by RDR6. Once in double-stranded form, they are processed by DCL4 at regular intervals into 21-nt phasiRNA duplexes, which are methylated by HEN1 and loaded into the AGO1-RISC as for miRNAs (Brodersen and Voinnet, 2006). Consequently, phasiRNAs are likely to function like miRNAs to repress gene expression via transcript cleavage or translational repression (Chen, 2009; Yu et al., 2017) (Figure 3).
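The regular 21-nt processing described above means that phasiRNAs fall into a fixed register counted from the miRNA cleavage site. A minimal Python sketch of this phasing follows. It assumes that the fragment downstream of the cleavage site is the one converted into dsRNA, follows a single strand only, and ignores the 2-nt 3’ overhangs of real DCL products. The function names and the toy transcript are invented for illustration.

```python
from typing import List

PHASE_LENGTH = 21  # size of DCL4 products given in the text


def phased_fragments(transcript: str, cleavage_site: int,
                     phase_length: int = PHASE_LENGTH) -> List[str]:
    """Consecutive full-length fragments starting at the cleavage site.

    `cleavage_site` is the 0-based index of the first nucleotide of the
    downstream cleavage product (an assumption of this sketch).
    """
    downstream = transcript[cleavage_site:]
    n_full = len(downstream) // phase_length
    return [downstream[i * phase_length:(i + 1) * phase_length]
            for i in range(n_full)]


def in_phase(position: int, cleavage_site: int,
             phase_length: int = PHASE_LENGTH) -> bool:
    """True if a small RNA starting at `position` lies in the phased register."""
    offset = position - cleavage_site
    return offset >= 0 and offset % phase_length == 0


if __name__ == "__main__":
    # Toy transcript: 10 nt upstream, two full 21-nt phases, 5 nt left over.
    toy = "A" * 10 + "C" * 21 + "G" * 21 + "U" * 5
    for fragment in phased_fragments(toy, cleavage_site=10):
        print(len(fragment), fragment)
    print(in_phase(31, cleavage_site=10))
```

In sequencing data, phasing is usually assessed statistically across many reads; the modulo test here only captures the underlying register.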
PhasiRNAs were initially termed trans-acting siRNAs (tasiRNAs) since they originate from the non-coding TAS gene transcripts (Vazquez et al., 2004; Felippes and Weigel, 2009). Like miRNAs, they are involved in various aspects of plant development and responses to the environment. For example, the miR390-TAS3-ARF pathway is implicated in lateral root growth, leaf formation and embryogenesis, and is conserved in plants. Notably, in ago7 or dcl4 tomato mutants, which are unable to generate phasiRNAs, the increased level of ARF transcripts leads to abnormal leaf shape (Yifhar et al., 2012). Similarly, Medicago truncatula and maize ago7 mutants present cylindrical leaves with irregular polarity (Douglas et al., 2010; Zhou et al., 2013). Also, overexpression of miR3954 triggers phasiRNA production from NAC genes in citrus, resulting in early flowering (Liu et al., 2017a). Interestingly, the wheat lncRNA WSGAR is targeted by miR9678 and is involved in seed development and germination (Guo et al., 2018). PhasiRNAs are also involved in abiotic stress responses. For example, in Arabidopsis, miR173 targets HEAT-INDUCED TAS1 TARGET 1 (HTT1) and HTT2, and increasing miR173 activity and the subsequent phasiRNA production increases plant heat sensitivity (Li et al., 2014a). Also, the drought-stress resilience of Populus plants involves miR482, miR828 and miR6445 activity. More precisely, miR6445 targets NAC genes, resulting in the production of phasiRNAs that target other NAC genes (Xie et al., 2017). Similarly, in legumes, drought stress triggers the production of NAC700-phasiRNAs through miR1514a activity, affecting plant drought tolerance (Sosa-Valencia et al., 2017). Finally, the sweet potato miR828 accumulates in response to wounding and targets IbMYB and IbTLD, implicated in lignin and H2O2 content, affecting plant damage recovery (Lin et al., 2012). Wounding is a mechanical stress that occurs under both abiotic and biotic stresses.
Interestingly, plant biotic interactions, either beneficial or detrimental, are also governed by phasiRNAs. Among them, the immune receptor NUCLEOTIDE-BINDING LEUCINE-RICH REPEAT (NLR) and PENTATRICOPEPTIDE REPEAT (PPR) genes produce phasiRNAs after miRNA-mediated cleavage and are implicated in plant-microbe interactions, symbiosis and defense (Fei et al., 2013). Other resistance genes are targeted by phasiRNAs, supporting the view that phasiRNAs are important factors of plant immunity. For example, miR9863 from barley and wheat targets an MLA gene encoding an NLR protein (Liu et al., 2014a). Similarly, NLR genes from the Norway spruce are targeted by the miR482/2118 family and generate phasiRNAs (Xia et al., 2015).
Altogether, miRNAs, siRNAs and phasiRNAs demonstrate the complexity of sRNA-mediated gene regulation and the importance they have in all aspects of plant life, from organogenesis to perception of the environment. However, non-coding transcripts are not only small RNAs or precursors of small RNAs.
Table of contents:
1. Non coding RNAs: from junk to utility
1.1 A large diversity of transcripts
1.1.1 Cooperation of non-coding and coding transcripts for protein synthesis
1.1.2 The small non-coding regulatory transcripts
1.1.2.1 The siRNAs
1.1.2.2 The miRNAs
1.1.2.3 The phasiRNA and tasiRNAs
1.2 Discovery of the Long Non-Coding RNAs world
1.2.1 LncRNAs features
1.2.2 Are lncRNAs really non-coding?
1.2.3 Conservation of lncRNAs
2. Regulation of gene transcription
2.1 First dimension: Cis-regulatory motifs
2.2 Second dimension: Epigenetic marks
2.2.1 DNA methylation
2.2.2 Chemical modifications of histones
2.3 Third dimension: chromatin conformation
2.3.1 General configuration of genome within the nucleus
2.3.2 Long range interactions
2.3.3 Short-range interaction
2.3.3.1 Looping within a gene locus
2.3.3.2 Enhancer loop
3. Regulation of gene expression by lncRNAs
3.1 Modulation of the transcriptional activity by lncRNAs
3.2 LncRNAs-mediated modification of the epigenetic landscape
3.3 Chromatin architecture changes through lncRNAs activity
3.4 LncRNAs mediating post-transcriptional regulation of gene expression
3.5 Review: Regulatory long non-coding RNAs in root growth and development
4. Aim of the thesis
5. The non-coding transcriptome from two Arabidopsis ecotypes
5.1 Publication: Landscape of the Noncoding Transcriptome Response of Two Arabidopsis Ecotypes to Phosphate Starvation
6. Exploring transcriptomes to find new cis-regulatory root-related lncRNAs
6.1 LATERALINC, new regulator of lateral root growth in Arabidopsis
6.1.2 Results and discussion
6.1.2.1 LATERALINC is positively correlated with IAA14 during plant development
6.1.2.2 Lateral root growth is impaired in LATERALINC downregulated lines
6.1.2.3 LATERALINC expression is correlated with the one of IAA1 but its downregulation does not affect IAA14 and IAA1 genes expression
6.1.4 Conclusion and perspectives
6.2 MARS, a lncRNA implicated in the transcriptional regulation of an embedded gene cluster
6.2.2 Identification of the MARS lncRNA
6.2.3 Preprint: The lncRNA MARS modulates the epigenetic reprogramming of the marneral cluster in response to ABA
6.2.4 Additional results and discussion
6.2.4.1 Deregulation of genes involved in the Carbon/Nitrogen equilibrium and cell oxidation status in the RNAi MARS line
6.2.4.2 MARS physically interact with the marneral cluster genomic region to titrate LHP1 binding
6.2.6 Conclusion and perspectives of complementary results
III Conclusions and perspectives
7. The non-coding transcriptome, signature of the plant local environment
7.1 Conservation of non-coding genes
7.2 The lncRNA features: an advantage for a quick adaptation to environmental changes?
7.3 Two ecotype-associated lncRNAs in the environmental control of plant growth and development
7.4 An RdDM-acting lincRNA regulates the root system architecture
8. The MARS lncRNA, a novel actor in the plant response to environment
8.1 MARS-mediated marneral genes expression changes is involved in the plant response to its environment
8.2 ABA as precursor or signaling molecule for the marneral biosynthesis and metabolization
8.3 MARS: an enhancer lncRNA located within a Super Enhancer region?
8.4 MARS act in cis or in trans for the control of marneral cluster gene expression?
Summary in French