LncRNA transcription and the associated chromatin signature
The majority of eukaryotic lncRNAs are produced by RNA polymerase II, with some exceptions such as the murine heat-shock-induced B2-SINE RNAs (Espinoza et al., 2007) or the human neuroblastoma-associated NDM29 (Massone et al., 2012), which are synthesized by RNA polymerase III. However, these two examples are not strictly considered lncRNAs because their transcript length is below the arbitrary threshold of 200 nt. In plants, two specialized RNA polymerases, Pol IV and Pol V, transcribe some lncRNA genes (Ariel et al., 2015). Many lncRNAs are capped at the 5’ end, except those processed from longer precursors (intronic lncRNAs or circRNAs). However, some ambiguity remains concerning the presence of a cap, especially for highly unstable and low-abundance transcripts, since not all of them can be captured by the CAGE-seq technique. LncRNAs may or may not be polyadenylated at the 3’ end; some exist in both forms, such as the bimorphic transcripts NEAT1 and MALAT1 (Yang et al., 2011), (Djebali et al., 2012). LncRNAs with a polyadenylation signal are more stable than those that are poorly polyadenylated or not polyadenylated, with the exception of lncRNAs bearing specific 3’ end structures, as in the case of MALAT1 (Wilusz et al., 2012).
LncRNA genes can have a multi-exonic composition with splicing signals similar to those of PCGs, and can therefore be spliced into several isoforms with distinct functional outcomes and clinical relevance (Spurlock et al., 2015), (Hoffmann et al., 2015), (Meseure et al., 2016). However, they usually comprise fewer and slightly longer exons than PCGs (Derrien et al., 2012a), (Bogu et al., 2016).
As RNA polymerase II transcribes most lncRNA genes, their genomic regions present a chromatin organization that resembles that of PCGs, with a few differences. These differences could be due to the globally low expression of lncRNAs, which is a consequence of a low rate of transcription, lower stability, or both. Globally, lncRNA TSSs reside within DNase I hypersensitive sites, suggesting that nucleosomes are depleted from this region. LncRNA promoters have lower levels of histone H3 K4 trimethylation (H3K4me3), in accordance with their low transcription rate. LncRNAs associated with regulatory elements such as enhancers (eRNAs) and promoters (PROMPTs) present high levels of histone H3 K4 monomethylation (H3K4me1) and K27 acetylation (H3K27ac) at promoters, which is considered a specific signature of enhancer- and promoter-associated unstable transcripts (Marques et al., 2013). Over the body of most lncRNAs, with the exception of eRNAs and PROMPTs, histone H3 K36 trimethylation (H3K36me3) can be found; it is a mark of the elongating phase of transcription. In mouse, bidirectional transcription, which is often associated with developmental genes and genes involved in transcription regulation, was found to be marked by high H3K79 dimethylation (H3K79me2) and elevated RNA polymerase II levels. This signature is characteristic of intensified rates of early transcriptional elongation within a region transcribed in both directions (Lepoivre et al., 2013).
Expression pattern of lncRNAs: stability, specificity, and abundance
Several genome-wide studies have addressed lncRNA stability and, depending on the experimental approach employed, revealed some discrepancies between different species of lncRNAs. In mouse, measurements of lncRNA half-lives showed that they are less stable than mRNAs (Clark et al., 2012). Comparison of the stability of different lncRNA species revealed that intronic or promoter-associated lncRNAs are less stable than intergenic, antisense, or 3’ UTR-associated lncRNAs. Single-exon transcripts, a class of nuclear-localised lncRNAs, are overrepresented among unstable transcripts. Circular RNAs are an example of lncRNAs that are highly stable compared to their linear counterparts (Enuka et al., 2016).
Multiple transcriptome profiling studies have highlighted highly specific spatio-temporal, lineage, tissue, and cell-type expression patterns for lncRNAs compared to PCGs; only a minority, such as TUG1 or MALAT1, are ubiquitously present across all tissues or cell types (Djebali et al., 2012), (Ward et al., 2015), (Li et al., 2015a). As previously mentioned, brain and testis represent a very rich source of uniquely expressed lncRNAs, supporting the hypothesis that such transcripts are important for the acquisition of specific phenotypic traits (Ward et al., 2015), (Washietl et al., 2014). The ubiquitously expressed lncRNAs are often highly abundant, whereas lncRNAs specific to one tissue or cell type tend to be expressed at low levels (Jiang et al., 2016). Moreover, inter-individual expression analysis in normal human primary granulocytes revealed greater variability in lncRNA abundance than in mRNA abundance (Kornienko et al., 2016). Some disease-associated single-nucleotide polymorphisms (SNPs) within lncRNA genes and their promoters have been linked to altered lncRNA expression, supporting their functional relevance in pathologies (Kumar et al., 2013). The high specificity of lncRNA expression argues in favor of important regulatory roles that these molecules can play in different biological contexts, including normal and pathological development.
Subcellular localization of lncRNAs
Globally, unlike mRNAs, many lncRNAs reside in the nucleus, with a focal or dispersed localization pattern (NEAT1) (Cabili et al., 2015). However, others are found both in the nucleus and in the cytosol (TUG1, HOTAIR), or exclusively in the cytosol (DANCR) (Djebali et al., 2012). Multiple determinants, such as a specific RNA motif (BORG) or RNA-protein assemblies, may dictate the subcellular localization of lncRNAs and define their function (Chen, 2016; Shukla et al., 2018; Zhang et al., 2014). Remarkably, environmental changes or infection can induce lncRNA relocalization (or active trafficking) from one cellular compartment to another, as in the case of stress-induced lncRNAs (Giannakakis et al., 2015). In addition, HuR and GRSF1 modulate nuclear export and mitochondrial localization of the nuclear-encoded RMRP lncRNA (Noh et al., 2016).
Knowing the subcellular localization of a particular lncRNA provides important insights into its biogenesis and function. LncRNAs can be exclusively cytosolic (DANCR and OIP5-AS1), exclusively nuclear (NEAT1), or have a dual localization (HOTAIR) (Ayupe et al., 2015). Several subgroups of lncRNAs with a precise subcellular localization have been defined, such as chromatin-enriched RNAs (cheRNAs) (Werner and Ruthenburg, 2015a) and chromatin-associated lncRNAs (CARs) (Mondal et al., 2010). cheRNAs were later confirmed to act as activators of transcription for nearby genes, and Werner et al. suggested that chromatin-enriched RNAs are the most effective chromatin signature for predicting the transcription of nearby genes, in a highly cell-type-specific manner (Werner et al., 2017). Many nuclear and chromatin functions have been proposed for such lncRNAs, including the assembly of subnuclear domains or RNP complexes, the guiding of chromatin modifications, and the activation or repression of protein activity (Singh and Prasanth, 2013). GAA repeat-containing RNAs (GRC-RNAs) represent a subclass of nuclear lncRNAs that show focal localization in the mammalian interphase nucleus, where they are part of the nuclear matrix. They have been suggested to play a role in the organization of the nucleus by assembling various nuclear matrix-associated proteins (Zheng et al., 2010).
Classification according to genomic location with respect to PCGs
This attribute is commonly used by the GENCODE/Ensembl portal in transcript biotype annotations, but it is also employed on an individual scale by consortia and laboratories for newly assembled lncRNA transcripts. Initially, transcripts are classified as either intergenic or intragenic (Figure C1-4). Long or large intergenic non-coding RNAs (lincRNAs) do not intersect with any protein-coding or ncRNA gene annotations. This category also includes the homonymous biotype, adopted by GENCODE, of long or large intervening ncRNAs, which were originally defined by specific histone H3 K4-K36 chromatin signatures within evolutionarily conserved genomic loci (Khalil et al., 2009), (Guttman et al., 2009a). LincRNAs are usually shorter than PCGs, are transcribed by RNA polymerase II, contain 5’-caps, are 3’-polyadenylated, and are spliced. Although several highly conserved lincRNAs exist, the majority possess modest sequence conservation, comprising short, 5’-biased patches of conserved sequence nested in exons (Hezroni et al., 2015). Highly conserved lincRNAs are believed to contribute to biological processes that are common to many lineages, such as embryonic development (Necsulea et al., 2014), while others are proposed to ensure phenotypic and functional variation at the individual and interspecies levels. Many, if not most, lincRNAs are localized in the nucleus, where they exert their regulatory functions. One such example is lincRNA-p21, which is induced by p53 upon DNA damage (Huarte et al., 2010). LincRNA-p21 physically associates with the nuclear factor hnRNP-K and recruits it to specific promoters, mediating p53-dependent transcriptional responses.
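The first step of this classification, deciding whether a transcript is intergenic or intragenic, amounts to an interval-overlap test against the existing gene annotation. The following is a minimal sketch under stated assumptions, not the procedure used by GENCODE/Ensembl: the names `Interval`, `overlaps` and `classify_by_location` are hypothetical, coordinates are assumed to be 0-based and half-open, strand is ignored, and the example coordinates are invented for illustration.

```python
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical helper type, not a library class)."""
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_by_location(transcript: Interval,
                         annotated_genes: Iterable[Interval]) -> str:
    """Label a transcript 'intragenic' if it intersects any annotated
    protein-coding or ncRNA gene, and 'intergenic' otherwise.
    Strand is deliberately ignored in this simplified sketch."""
    if any(overlaps(transcript, gene) for gene in annotated_genes):
        return "intragenic"
    return "intergenic"


# Invented toy annotation: two genes on chr1.
genes = [Interval("chr1", 1000, 5000), Interval("chr1", 20000, 26000)]

print(classify_by_location(Interval("chr1", 8000, 9500), genes))  # intergenic
print(classify_by_location(Interval("chr1", 4500, 7000), genes))  # intragenic
```

A real annotation pipeline would additionally subdivide intragenic transcripts by strand and by position relative to the host gene; the linear scan used here would typically be replaced by an interval index for genome-scale annotations.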
Table of contents:
Chapter 1. Generalities on long non-coding RNAs
1. History and discovery of lncRNAs
1.1. A role for RNA in the cell: the central dogma of molecular biology
1.2. The first regulatory non-coding RNAs
1.3. From non-coding genome to non-coding transcriptome
2. A general portrait of lncRNA genes and transcripts
2.1. Origin and evolutionary conservation
2.2. Role of lncRNAs in biological diversity
2.3. Coding potential of lncRNA transcripts
2.4. LncRNA transcription and the associated chromatin signature
2.5. Expression pattern of lncRNAs: stability, specificity, and abundance
2.6. Subcellular localization of lncRNAs
3. Classification of lncRNAs
3.1. Classification according to length
3.2. Classification according to genomic location with respect to PCGs
3.3. Classification according to genomic location within specific DNA regulatory elements
3.4. Classification according to lncRNA mechanism of action
3.5. Classification according to associated biological processes
Chapter 2. Long non-coding RNAs as regulators of the epithelial-to-mesenchymal transition
1. EMT as a driver of metastasis, drug resistance and tumor recurrence
1.1. Generalities on cancer
1.2. EMT as a driver of metastasis and tumor progression
1.3. Molecular basis of the EMT
2. LncRNAs associated with the EMT
2.1. Activators of EMT
2.2. Repressors of EMT
2.3. lncRNAs with controversial roles in EMT
MATERIALS AND METHODS
Chapter 3. Methods
1. In vitro cell model to study EMT
2. CRISPR-based transcriptional activation screening
2.1. Generalities on CRISPR-based screens
2.2. CRISPRa library cloning and phenotypic screening
3. General methods
Chapter 4. A role for HOTAIR in the EMT
2. Publication No. 1
“HOTAIR promotes an epithelial-to-mesenchymal transition through relocation of the histone demethylase Lsd1.”
Chapter 5. Functional discovery of novel lncRNAs in the EMT
1. Introduction
2. Publication No. 2
“CRISPRa screen of chromatin-enriched lncRNAs reveals a new regulator of epithelial identity.”
3. Additional data
3.1. CRISPRa-screening of invasion-associated lncRNAs
3.2. MAL-1 knock-down using siRNAs
3.3. Transcriptomic analysis of MAL-1 overexpression
Chapter 6. Discussion
1. Cyto- and Chro-seq as a tool to study the non-coding transcriptome
2. The role of lncRNAs in the epithelial-to-mesenchymal transition
2.1. HOTAIR as a modulator of Lsd1 function
2.2. MAL-1, a novel lncRNA repressor of epithelial identity
2.3. lncRNAs as regulators of epithelial plasticity
SUMMARY IN FRENCH