Bivalent domains and Polycomb regulation in pluripotent cells
In mouse ES cells, H3K27me3 and PRCs are found at large genomic domains covering repressed developmental regulators (Boyer et al. 2006; Mikkelsen et al. 2007; Endoh et al. 2008; Ku et al. 2008). Conversely, the trithorax system is associated with trimethylation of histone H3 lysine 4 (H3K4me3), a mark of active transcription sites (Kingston and Tamkun 2014). However, in mouse ES cells, a subset of CpG islands (CGIs) contains nucleosomes marked by both H3K27me3 and H3K4me3 (Fig. 1.3). The genomic sites carrying this antagonistic combination of histone modifications have been referred to as “bivalent” (Azuara et al. 2006; Bernstein et al. 2006). Promoters showing the simultaneous presence of these two chromatin marks are also characterized by the occupancy of the aforementioned Serine-5 phosphorylated form of RNA-Pol II, which is associated with transcription initiation and whose phosphorylation has been shown to be mediated by ERK (Brookes et al. 2012; Tee et al. 2014). The presence of this poised state of RNA-Pol II has been proposed to contribute to the robust activation of developmental genes during the exit from pluripotency and the initiation of differentiation (Voigt et al. 2012). Consistent with their equivocal chromatin features, bivalent domains typically resolve upon differentiation into monovalent H3K4me3 or H3K27me3 domains at genes that are activated or silenced, respectively, according to the new cell type (Mikkelsen et al. 2007; Voigt et al. 2012; Ferrari et al. 2014) (Fig. 1.3).
Fig. 1.3. Schematic representation of bivalent domains showing H3K27me3, H3K4me3 and poised RNA Pol II enrichment and their resolution towards differentiation (Di Croce and Helin 2013).
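The definition of bivalency and its resolution can be summarised as a simple labelling rule. The following Python sketch is purely illustrative: it assumes that the presence or absence of each mark at a promoter has already been called (for instance from ChIP-seq peaks), and the function names and state labels are hypothetical choices, not an established nomenclature.

```python
def chromatin_state(h3k4me3: bool, h3k27me3: bool) -> str:
    """Label a promoter from the presence/absence of the two marks."""
    if h3k4me3 and h3k27me3:
        return "bivalent"
    if h3k4me3:
        return "H3K4me3-only"    # monovalent, associated with activation
    if h3k27me3:
        return "H3K27me3-only"   # monovalent, associated with silencing
    return "unmarked"


def resolution(before: str, after: str) -> str:
    """Describe how a bivalent promoter resolves upon differentiation."""
    if before != "bivalent":
        return "not bivalent in ES cells"
    outcomes = {
        "H3K4me3-only": "resolved towards activation",
        "H3K27me3-only": "resolved towards silencing",
        "bivalent": "remains bivalent",
    }
    return outcomes.get(after, "lost both marks")
```

For example, a promoter carrying both marks in ES cells and only H3K4me3 after differentiation would be reported as "resolved towards activation".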
Despite an apparently simple model proposing bivalency as a way to poise key developmental factors for rapid activation, the functional evidence supporting that view remains relatively limited. The study of ES cells knocked out (KO) for key PRC2 components, such as Suz12, Eed or Jarid2, indicates that the PRC2 complex has a limited impact on self-renewal. Although many bivalent genes tend to be upregulated upon single or combinatorial depletion of PRC2 members, mouse ES cells do not undergo massive differentiation. However, their differentiation is greatly impaired, with inappropriate silencing of pluripotency factors and abnormal expression of developmental markers, in agreement with the severe phenotypes observed in the developing embryo (O’Carroll et al. 2001; Pasini et al. 2007; Shen et al. 2008; Chamberlain, Yee, and Magnuson 2008; G. Li et al. 2010; Landeira and Fisher 2011; Riising et al. 2014). From these results, a few studies proposed that Polycomb occupancy could merely reflect the absence of transcriptional activity at a given locus (Chamberlain, Yee, and Magnuson 2008; Riising et al. 2014), with relatively few functional consequences. Remarkably, PRC1 inactivation in mouse ES cells has more dramatic effects. Ring1b depletion is quickly followed by cell death, morphological changes and upregulation of PRC1-repressed genes (Leeb and Wutz 2007; van der Stoop et al. 2008), with the double Ring1 KO showing an even more severe phenotype (Stock et al. 2007; Endoh et al. 2008). Therefore, a possible hypothesis is that the loss of PRC2 and H3K27me3 at bivalent domains is compensated by the continuing presence of non-canonical PRC1 and the maintenance of H2AK119ub, which prevent a dramatic upregulation of PRC2-targeted genes.
Interestingly, culture of mouse ES cells in 2i medium leads to a genome-wide reorganisation of H3K27me3, coinciding with a decrease of the mark at bivalent promoters with no apparent transcriptional change of these genes (Marks et al. 2012). However, the loss of H3K27me3 at bivalent CGIs has been shown to come with an increased enrichment in the body of their associated genes (Illingworth et al. 2016). The PRC2 complex was further shown to be neither responsible nor necessary for gene silencing in 2i medium (Galonska et al. 2015), in contrast to the serum/LIF condition, where mouse ES cells appeared to be more unstable upon Eed depletion. One possible explanation for the absence of transcriptional response in 2i medium, despite an acute loss of H3K27me3, is that Erk inhibition impedes Ser5 phosphorylation of RNA-Pol II at bivalent promoters, thereby precluding transcriptional initiation (Tee et al. 2014).
Finally, the mechanism of PRC2 recruitment onto chromatin is still an open question. Polycomb-like proteins (Pcl) have been reported to show high affinity for the H3K36me3 histone mark through their Tudor domain, as well as for methylated CpGs, and have been proposed to be key factors in the recruitment of the PRC2 complex to chromatin. Moreover, their activity in mouse ES cells has been associated with both repression of bivalent genes and repression of pluripotency genes upon early differentiation (Ballaré et al. 2012; Brien et al. 2012; Hunkapiller et al. 2012; Cai et al. 2013). In addition, the PRC2 complex has been shown to bind RNA with high affinity, and many discoveries made in the field of long non-coding RNAs (lncRNAs) suggest that these transcripts may act as scaffolds guiding histone-modifying complexes to their genomic targets, which points to a possible model of RNA-mediated recruitment of PRC2 throughout the genome (J. Zhao et al. 2010; Brockdorff 2013; Kaneko et al. 2013; Kaneko, Son, et al. 2014; Kaneko, Bonasio, et al. 2014a; Brockdorff 2017).
Long non-coding RNAs & mouse ES cells
Description and characteristics
Advances in high-throughput sequencing technology have greatly improved our knowledge of transcriptional activity in coding and non-coding regions of mammalian genomes. For instance, it was estimated that more than 70% of the human genome might actually be transcribed in some conditions, while only 1 to 2% of it is thought to code for proteins (Dinger et al. 2009; K. C. Wang and Chang 2011). A substantial portion of these non-coding transcription units have been shown to fall under the category of lncRNAs (The FANTOM Consortium et al. 2014).
As their name makes explicit, lncRNAs are defined by two common properties: their length must be greater than 200 nt, to differentiate them from shorter non-coding RNAs (miRNAs, piRNAs), and their coding potential must not indicate the possible production of any protein, or at least of any functional protein. The latter criterion is a matter of debate, since many lncRNAs have been shown to interact with ribosomal units, although this is not in itself evidence of effective translation (G.-L. Chew et al. 2013; Ji et al. 2015; Carlevaro-Fita et al. 2016; Zeng, Fukunaga, and Hamada 2018), and since some transcripts, although predicted to be non-coding, were found to be responsible for the synthesis of small functional peptides (Nelson et al. 2016). Indeed, when an open reading frame (ORF) is present within such a transcript, the determination of its non-coding potential is based on the short size of the ORF or on its lack of conservation among a large panel of related species (M. F. Lin, Jungreis, and Kellis 2011; L. Wang et al. 2013). A drawback of this arbitrary (size) and negative (non-coding) definition is that lncRNAs encompass a large diversity of molecules with distinct functional and mechanistic properties, a heterogeneity that is, however, inherent to their recent discovery (Ulitsky and Bartel 2013).
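The two defining criteria can be illustrated with a minimal Python sketch. It is not the method used by the tools cited above: the 300 nt ORF cutoff is an assumption chosen for the example, only complete ORFs (ATG to an in-frame stop codon) on the transcript strand are considered, and the conservation criterion is not modelled at all. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(transcript: str) -> int:
    """Length (nt, stop codon included) of the longest complete ORF
    found in the three forward reading frames of a transcript."""
    seq = transcript.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i                      # first start codon opens an ORF
            elif orf_start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - orf_start)
                orf_start = None                   # look for the next ORF
    return best


def is_candidate_lncrna(transcript: str,
                        min_length: int = 200,
                        max_orf_nt: int = 300) -> bool:
    """Apply the size criterion and a crude ORF-size criterion.
    max_orf_nt is an illustrative cutoff, not a value from the text."""
    return len(transcript) > min_length and longest_orf_nt(transcript) < max_orf_nt
```

Under these assumptions, a 300 nt transcript without any start codon is retained as a candidate, whereas a 366 nt transcript made of a single complete ORF, or any transcript of 200 nt or less, is rejected.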
Numerous sub-categories of lncRNAs have been proposed based on different features, such as their location relative to surrounding coding genes (intergenic, sense/antisense overlapping, divergent) or the chromatin context embedding their transcription start site (TSS) (promoter of a coding gene, enhancer, CTCF binding). These categories will probably need to be reconsidered based on functional or intrinsic properties of the lncRNAs themselves (Mattick and Rinn 2015). While most stable lncRNAs are transcribed by RNA-Pol II, possess a 5’ cap, are polyadenylated and often spliced, others rely on alternative 3’ maturation pathways or are unstable, rapidly degraded transcripts (Wilusz, Freier, and Spector 2008; M. Guttman 2009; Lloret-Llinares et al. 2015; Zong et al. 2016).
The sequence conservation of these RNAs across species is on average much lower than that of their coding counterparts (Ulitsky and Bartel 2013). It is still not clearly understood whether this suggests that the selection pressure applied to lncRNAs is based on different criteria than those applied to mRNAs, or whether it simply reflects the lack of function of a vast majority of them. Indeed, if the structures of lncRNAs are more important than their actual sequences, as might be suggested by the fact that in silico predictions of lncRNA secondary structures yield complex and organised folding (Pegueroles and Gabaldón 2016), it is conceivable that their DNA sequences evolved rapidly under lower constraints.
The average level and tissue specificity of lncRNA expression also differ from those of coding genes. Indeed, the vast majority of lncRNAs are expressed at a lower level than mRNAs, possibly suggesting that they are specifically expressed in some sub-populations of cells within a given tissue or cell culture (Cabili et al. 2011; Pauli et al. 2012; Derrien et al. 2012). In agreement with this idea, the expression of lncRNAs is more strongly restricted to a specific tissue or cell type than that of coding genes. Such features argue for a role of lncRNAs in the maintenance of cell identity or in highly specialised biological tasks (S. J. Liu et al. 2017).
LncRNAs function in mouse ES cells
First of all, it is worth mentioning that new modes of action are constantly being proposed for lncRNAs. This is not surprising given the very high diversity of transcripts grouped together in the lncRNA family. We will therefore focus on a limited subset of transcripts, chosen to be representative of the diverse functions fulfilled by lncRNAs, that have more specifically been shown to play a functional role in pluripotent cells. In particular, the three main classes of regulatory lncRNAs will be covered: those regulating neighbouring gene(s) in cis, those regulating in trans the transcription of distant genes in the nucleus, and those acting on the regulation of gene expression in the cytoplasm.
Despite the very large number of lncRNAs found to be specifically expressed in mouse pluripotent cells, very few of them have been functionally characterized so far (M. Guttman 2009; Mitchell Guttman et al. 2010, 2011; Lv et al. 2015; Bergmann et al. 2015; Bogu et al. 2016).
Gm15055 is a typical cis-regulatory lncRNA (G.-Y. Liu et al. 2016). It is highly expressed in mouse ES cells, where it was shown to be positively regulated by Oct4 through a cis-regulatory element. Gm15055 is located 50 kb upstream of the Hoxa gene cluster, to which it was shown to recruit the PRC2 complex, resulting in the maintenance of H3K27me3 deposition on Hoxa gene promoters and in their transcriptional repression. It was further shown by chromosome conformation capture experiments that the Gm15055 locus directly contacts multiple sites of the Hoxa gene cluster in mouse ES cells, thus facilitating the cis targeting of Gm15055 RNA to the Hoxa genes.
The lncRNA TUNA (for Tcl1 upstream neuron-associated lincRNA) was first identified in an RNAi screen in mouse ES cells (N. Lin et al. 2014). TUNA was shown to be critical for mouse ES cell self-renewal as well as for neural differentiation. It was further reported to have a positive impact on reprogramming efficiency when overexpressed. Remarkably, TUNA was shown to contain a 200-nt long RNA sequence displaying strong evolutionary conservation across vertebrates and allowing its interaction with three RNA-binding proteins: Ptbp1, hnRNP-K, and Ncl. TUNA RNA and its three partners were further demonstrated to colocalize at the promoters of Nanog, Sox2 and Fgf4, although the precise mechanism through which this complex is recruited to these loci was not assessed. In addition, RNAi experiments revealed that the depletion of Sox2 and that of TUNA in mouse ES cells misregulate a largely shared set of genes. Given that both genes are involved in neurogenesis and show highly similar expression patterns during neurodevelopment, the authors suggested a close relationship between the regulatory functions of Sox2 and TUNA.
Panct1, like TUNA, was first identified in an RNAi screen (Chakraborty et al. 2012) looking for lncRNAs whose expression is necessary for preserving mouse ES cell self-renewal. It is a sense-overlapping lncRNA contained within the protein-coding gene Tobf1. Interestingly, the depletion of Tobf1 in mouse ES cells also leads to the loss of pluripotency; more surprisingly, Panct1 and Tobf1 were shown to colocalize in the nuclear space of mouse ES cells (Chakraborty et al. 2017) and to form discrete foci specifically in early G1 phase. The DNA binding of Tobf1, shown to be dependent on Panct1, was mapped genome-wide and found to overlap significantly with Oct4 binding. Strikingly, mutating an octamer-like motif in Panct1 RNA strongly diminishes the localization and recruitment of Tobf1, and to a lesser extent of Oct4, to their common targets, suggesting a regulatory role of the Panct1/Tobf1 complex in the recruitment of Oct4 at specific regions in a cell cycle-dependent manner.
Linc-RoR (regulator of reprogramming), despite acting at a distance from its transcription site, represents a radically different class of lncRNAs compared to Panct1 and TUNA. Linc-RoR is a human-specific cytosolic lncRNA which was first identified as enhancing the reprogramming efficiency of iPSCs when overexpressed (Loewer et al. 2010). Linc-RoR was later shown to act as a microRNA sponge, buffering the effect of miR-145 and repressing its negative impact on OCT4, NANOG and SOX2 levels (Y. Wang et al. 2013). Its expression was shown to be directly controlled by the pluripotency factors and to be necessary for human ES cell self-renewal, thus creating a self-sustaining feedback loop within the pluripotency network.
Finally, lincU was very recently studied on the basis of its regulation by the pluripotency factor Nanog in mouse ES cells (Jiapaer et al. 2018). LincU was shown to be localized in the cytoplasm, where it stabilizes the Dusp9 protein, an ERK-specific phosphatase, by preventing its ubiquitination and degradation. This effect logically results in the repression of ERK1/2 signalling pathway activity. Hence, upon depletion of lincU, mouse ES cell self-renewal is severely impaired, while its overexpression induces the ground state of pluripotency. Remarkably, lincU is evolutionarily conserved in the human genome, and its effect on self-renewal is also conserved in human ES cells.
Therefore, it has been clearly established that lncRNAs can display regulatory functions on self-renewal of pluripotent cells through diverse functions and mechanisms involving all the layers of gene expression regulation (transcriptional, post-transcriptional and at the protein level). Given the extremely low percentage of them that have been characterized so far, we can expect that many more of them will find a place in the complex network regulating pluripotency.
From CRISPR discovery to CRISPR activators
Although genome engineering with the CRISPR/Cas9 system is now a routinely used technique to insert, delete or modify DNA sequences in living organisms, more than 20 years elapsed between its first description (Ishino et al. 1987) and its use as a genome engineering tool in eukaryotic cells. While studying the IAP enzyme in Escherichia coli, Ishino et al. cloned and sequenced a 1.7 kb chromosomal fragment containing the iap gene and noticed “An unusual structure […] in the 3′-end flanking region of iap […]. Five highly homologous sequences of 29 nucleotides were arranged as direct repeats with 32 nucleotides as spacing.” With the increasing number of sequenced prokaryotic genomes in the 1990s, it appeared that a large proportion of bacteria and archaea actually possess the same kind of short repeated sequences. They were frequently organized in clusters but always regularly separated by unique sequences of constant length. They were first called Short Regularly Spaced Repeats (SRSRs) (Mojica et al. 2000) before receiving their current name, CRISPR, for Clustered Regularly Interspaced Short Palindromic Repeats (Jansen et al. 2002). It was further shown that in most species those clusters were flanked on one side by a common “leader” sequence. The repeats and their leader sequences were shown to be conserved within a species, but different between species. Four distinct CRISPR-associated (Cas) genes were found to be consistently adjacent to CRISPR loci, indicating that Cas genes and CRISPR loci might be functionally linked. It was also shown that those CRISPR arrays were transcribed (T.-H. Tang et al. 2002). An in-depth bioinformatics study (Haft et al. 2005) examined uncharacterized genes in the neighbourhood of CRISPR loci and found many additional protein families strictly linked to CRISPR loci across multiple prokaryotic species. The authors showed that the number of Cas genes can be larger than previously expected, with up to 20 different ones, and that they can also be located between two clusters of repeats.
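The repeat/spacer organisation described above (direct repeats of fixed length separated by unique sequences of constant length) can be sketched in Python as follows. This is a deliberately simplified illustration, not the approach used in the cited studies: it only detects exact copies of a repeat, whereas the repeats reported by Ishino et al. were "highly homologous" rather than necessarily identical, and the function names and the minimum number of copies are assumptions made for the example.

```python
def find_repeat_array(seq, repeat_len=29, spacer_len=32, min_copies=3):
    """Return (start, copies) of the first run of identical direct repeats
    of length repeat_len separated by spacer_len nucleotides, or None."""
    period = repeat_len + spacer_len
    for start in range(len(seq) - repeat_len + 1):
        unit = seq[start:start + repeat_len]
        copies = 1
        pos = start + period
        # Extend the run while the same unit recurs one period further on.
        while seq[pos:pos + repeat_len] == unit:
            copies += 1
            pos += period
        if copies >= min_copies:
            return start, copies
    return None


def extract_spacers(seq, start, copies, repeat_len=29, spacer_len=32):
    """Return the sequences lying between consecutive repeats."""
    period = repeat_len + spacer_len
    return [
        seq[start + i * period + repeat_len:start + (i + 1) * period]
        for i in range(copies - 1)
    ]
```

On a toy sequence built from a 5 nt repeat and 4 nt spacers, `find_repeat_array(seq, repeat_len=5, spacer_len=4)` locates the array, and `extract_spacers` returns the unique sequences between the repeats.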
CRISPR loci were later classified on the basis of the Cas proteins they contain and were grouped into three types, with types I and III harbouring many Cas proteins acting as a complex, in contrast to type II, which has few effector Cas proteins. A major step forward was taken when detailed analyses focused on the unique spacer sequences separating the clustered repeats and revealed their extrachromosomal, phage- or plasmid-associated origins (Pourcel, Salvignol, and Vergnaud 2005; Mojica et al. 2005; Bolotin et al. 2005). It was additionally shown that the presence of exogenous sequences from a given virus within the CRISPR loci was positively correlated with the ability of the prokaryote to resist infection by that virus. It was consequently postulated that CRISPR arrays serve as stable immune memory platforms leading to functional defence against pathogens. In 2007, the link between viral infection, insertion of viral spacer sequences in CRISPR arrays, Cas protein effector functions and viral resistance was clearly established (Barrangou et al. 2007). It was later shown that CRISPR transcripts are processed into small RNAs (CRISPR RNAs, crRNAs) containing single spacers, which guide Cas nuclease activity through base pairing (Brouns et al. 2008). A type III CRISPR system was shown to act on DNA rather than RNA, as it prevented plasmid conjugation in bacteria (Marraffini and Sontheimer 2008). The same year, the importance of short protospacer-adjacent motifs (PAMs) in the target sequence for Cas9-mediated cleavage of DNA was demonstrated (Deveau et al. 2008). However, the formal demonstration that Cas9 is the only enzyme within its Cas gene cluster able to cleave DNA came in 2010 (Garneau et al. 2010).
A new component of type II CRISPR systems was characterized the next year: the non-coding tracrRNA (trans-activating crRNA) was shown to be complementary to the repeated sequence of the CRISPR array and to hybridize with the crRNA, allowing its maturation by endogenous RNase III and the Cas9 protein. It was thereby demonstrated that only Cas9 and two short non-coding transcripts were required for targeted DNA cleavage. The way to targeted genome editing was open.
It was first shown that the type II CRISPR system and its interference function were transferable from one bacterial strain to another (Sapranauskas et al. 2011). Soon after, purified Cas9 guided by crRNAs was shown to be able to cleave target DNA in vitro (Gasiunas et al. 2012; Jinek et al. 2012). A chimeric non-coding RNA resulting from a pseudo-fusion of the crRNA and the tracrRNA (called single guide RNA, sgRNA) was then engineered and shown to reproduce the function of the crRNA/tracrRNA duo in vitro. Finally, two studies simultaneously reported, for the first time, targeted genome editing in mammalian cells with type II CRISPR/Cas9 (Cong et al. 2013; Mali, Yang, et al. 2013). Genome modification mediated by non-homologous end joining (NHEJ) or homology-directed repair (HDR) was obtained by heterologous expression of Cas9 and an sgRNA or a crRNA/tracrRNA hybrid, allowing the modification of a single gene as well as of multiple genes at once in different human cell lines. After these pioneering studies, the use of the CRISPR/Cas9 system, as well as of new Cas proteins, expanded rapidly to become a widely used technology in numerous organisms.
Summarized way of action
As type II CRISPR/Cas9 has been the most used and best characterized CRISPR system so far, we will focus only on its particular properties. The full response of the CRISPR/Cas9 system after phage or plasmid invasion is commonly divided into three steps. First, the Cas1/2 complex is involved in cleaving the exogenous DNA into short sequences and inserting them within the CRISPR array, between the crRNA-associated repeats. Those small fragments are not randomly picked from the foreign DNA but selected so that, in the foreign DNA, they are followed by a motif a few nucleotides long (Protospacer Adjacent Motif or PAM) that is necessary for subsequent cleavage by the Cas9 protein. Second, the newly modified CRISPR array is transcribed as a long, non-functional precursor comprising fused crRNAs, which is then cleaved into mature crRNAs by endogenous RNase III. The repeat portion of the crRNA allows for hybridization with the trans-activating CRISPR RNA (tracrRNA) and allows the resulting pair of transcripts to interact with the Cas9 protein. Third, Cas9 recognizes its specific target through base pairing with the spacer sequence of the crRNA and, only if the target is followed by the required PAM sequence, cleaves the invading DNA molecule (Hille and Charpentier 2016; F. Jiang and Doudna 2017).
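The third step, target recognition conditioned on the PAM, can be illustrated with a short Python sketch. It rests on several simplifying assumptions that are not taken from the text above: the PAM is set to NGG (the motif recognized by Streptococcus pyogenes Cas9), only perfect spacer matches are reported whereas real Cas9 tolerates some mismatches, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_matches(site, pam):
    """True if site fits the PAM pattern, where N stands for any base."""
    return len(site) == len(pam) and all(
        p == "N" or p == s for p, s in zip(pam, site)
    )


def find_targets(dna, spacer, pam="NGG"):
    """List (strand, start) of perfect spacer matches immediately followed
    by a PAM. For the '-' strand, start is given in reverse-complement
    coordinates. pam='NGG' is an assumption (S. pyogenes Cas9)."""
    dna = dna.upper()
    spacer = spacer.upper().replace("U", "T")
    hits = []
    for strand, template in (("+", dna), ("-", reverse_complement(dna))):
        start = template.find(spacer)
        while start != -1:
            end = start + len(spacer)
            if pam_matches(template[end:end + len(pam)], pam):
                hits.append((strand, start))
            start = template.find(spacer, start + 1)
    return hits
```

In this sketch, a sequence containing the spacer followed by TGG is reported as a target, whereas the same spacer followed by TTT is ignored, which mirrors the requirement that a matching protospacer without an adjacent PAM is not cleaved.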
Table of contents:
I. Early development & embryonic stem cells
A. Early mouse embryo development
B. The establishment of pluripotency in vitro
C. Origin and properties
D. The spectrum of pluripotent cells
E. The maintenance of the pluripotent state
II. Signalling pathways regulating pluripotency
A. LIF signalling
B. FGF signalling
C. Wnt signalling
III. Transcription Factors-based regulation of pluripotency
A. Pluripotency factors
B. Oct4 (Pou5f1)
E. LIF independent self-renewal
IV. Polycomb regulation and bivalent domains in pluripotency
A. Polycomb complexes and functions
B. Bivalent domains and Polycomb regulation in pluripotent cells
V. Long non-coding RNAs & mouse ES cells
A. Description and characteristics
B. LncRNAs function in mouse ES cells
VI. From CRISPR discovery to CRISPR activators
A. Historical background
B. Summarized way of action
C. Genome engineering
D. Other kind of versatile DNA-binding protein
E. Transcriptional modulation
F. Examples of CRISPRa studies in stem cell biology and cell reprogramming
A. sgRNA cloning
B. Bio-informatics analysis with Seqmonk program (LASER selection)
C. Cell fractionation
D. Single cell sorting
E. Generation of LASER 23 KO ES cells
VII. Adaptation of the CRISPRa SunTag system in mouse ES cells
A. Why developing CRISPR-activation?
B. Construction of the first SunTag cell line generation
C. Construction of the second SunTag ES cell line generation
D. Transcriptional Induction tests
VIII. The molecular logic of Nanog-induced self-renewal
IX. LASER: LncRNAs Associated with SElf-Renewal of mouse ES cells
B. Preliminary selection of lncRNAs candidates
C. Characterization of the 24 LASER
D. LASER gRNAs design and test
E. LASER overexpression upon LIF withdrawal
F. Transcriptomic response upon three LASER overexpression
G. Additional lncRNAs candidates selection
H. LASER 23 (Gm14820) characterization
X. A serendipity-driven approach
B. An unexpected 2 cell-like state induction
C. First hypothesis: a LASER 1-mediated effect
D. Second hypothesis: an off-target effect
E. Identification of candidate genes