Splicing: from molecular mechanisms to personalized therapies


Molecular mechanisms resulting in the expression of transcript isoforms

In this section we describe the molecular mechanisms behind splicing that result in the expression of several transcript isoforms from the same locus. We do not claim that the following explanations would satisfy the curiosity of a molecular biologist, but we hope they can benefit non-specialists by introducing some key concepts. In particular, we do not detail the different proteins known to be involved in the splicing machinery and their mechanisms of action; instead, we give a schematic view of their effects and refer to the literature for more detailed explanations of the molecular mechanisms.

A bit of history: pre-mRNA splicing

The gene expression field made an important step forward in the late 1970s when the split nature of most eukaryotic genes was discovered. In 1977, several groups working with adenoviruses that infect and replicate in mammalian cells obtained surprising results: RNA molecules from infected cells contained sequences from non-contiguous sites in the viral genome (Berget et al., 1977; Chow et al., 1977). What they termed “mosaic RNA” at the time was the result of the excision of what came to be called intragenic sequences (introns) from precursor mRNA. This process of removing or “splicing out” introns is now known as precursor mRNA splicing (pre-mRNA splicing, or splicing for short). However, pre-mRNA is nowadays thought of as a largely virtual entity, because splicing occurs co-transcriptionally (Merkhofer et al., 2014).
Formally, an intron is defined as a gene segment that is present in the primary (or precursor) transcript but absent from the mature RNA as a consequence of splicing. The term intron refers to both the DNA sequence within a gene and the corresponding sequence in the unprocessed RNA transcript. In contrast, an exon denotes a gene segment that is or can be present in mature RNA. Most human genes contain multiple exons, and the average length of exons (50–250 bp) is much shorter than that of introns (frequently thousands of bp). Figure 2.1 illustrates the split nature of eukaryotic genes: figure 2.1(a) shows the exons and introns of a gene as well as the untranslated regions (UTRs), the initiation codon and the termination codon at the 5’ and 3’ ends of the first and last exons. It also depicts a promoter region that helps define the transcription initiation site and a polyadenylation (polyA) addition signal sequence that helps define the polyA addition site. The polyA addition site delineates the transcription termination site. Figure 2.1(b) shows the pre-mRNA that results from transcription, 5’ capping (i.e. the addition of a methylated guanine at the 5’ end of the pre-mRNA) and polyA addition. Finally, figure 2.1(c) corresponds to the mature mRNA resulting from pre-mRNA splicing.

How does splicing happen?

The biochemical mechanism by which splicing occurs is fairly well understood (Clancy, 2008). Introns are removed from primary transcripts by cleavage at conserved sequences called splice sites. These sites are found at the 5’ end (donor site) and 3’ end (acceptor site) of introns. The splice donor site includes an almost invariant sequence GU within a larger and less highly conserved region while the splice acceptor site terminates the intron with an almost invariant AG sequence. These consensus sequences are known to be critical, as changing one of the conserved nucleotides often results in the inhibition of splicing (Cartegni et al., 2002). Another important sequence occurs at what is called the branch point, characterized by an A residue, and located anywhere from 18 to 40 nucleotides upstream from the 3’ end of an intron.
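
The sequence rules described above can be illustrated with a minimal Python sketch. It checks whether a candidate intron starts with GU, ends with AG, and contains at least one A in the window 18 to 40 nucleotides upstream of its 3’ end. The function name, the way distances are counted and the toy sequence are assumptions made for illustration; real splice site recognition also depends on the wider, less conserved sequence context, which is ignored here.

```python
def check_intron_signals(intron: str) -> dict:
    """Check the near-invariant intron signals on a 5'->3' sequence."""
    # Accept either the DNA or the RNA alphabet.
    seq = intron.upper().replace("T", "U")
    n = len(seq)
    # Assumed convention: a nucleotide at index i lies d = n - i nucleotides
    # upstream of the 3' end; keep the positions with 18 <= d <= 40.
    window = seq[max(0, n - 40):max(0, n - 17)]
    return {
        "donor_gu": seq.startswith("GU"),     # 5' splice site (donor)
        "acceptor_ag": seq.endswith("AG"),    # 3' splice site (acceptor)
        "branch_a_in_window": "A" in window,  # candidate branch point
    }


# Toy intron (hypothetical sequence): GU ... A ... AG
toy_intron = "GU" + "C" * 30 + "A" + "C" * 20 + "AG"
print(check_intron_signals(toy_intron))
```

A sequence that passes these three checks is only a candidate intron: the checks are necessary conditions taken from the text above, not a predictor of actual splicing.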

Alternative splicing and alternative transcription

How can roughly 120,000 mRNA molecules have been mapped in human cells when the human genome contains only about 25,000 protein-coding genes? The solution lies in the alternative nature of splicing in eukaryotes.
Alternative splicing is the mechanism through which multiple mature mRNA transcripts (or mRNA isoforms) are expressed from a single gene. The ability of cells to exhibit variations of mature mRNA from the same pre-mRNA adds a layer of complexity to the central dogma of molecular biology (DNA → RNA → protein). It is accomplished by excluding one or more exons (exon skipping), by moving exon/intron boundaries (acceptor or donor splice site shift) or by retention of introns. The main modes of alternative splicing are illustrated in figures 2.3(b) to 2.3(f). This widespread mechanism is estimated to affect ~90% of mammalian protein-coding genes (Wang et al., 2008a) and is now considered a fundamental regulatory process at the crossroads between transcription and translation. Some functional aspects of alternative splicing are discussed in section 2.2.
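
To make two of these modes concrete, here is a small Python sketch that assembles mature transcripts from a toy gene by exon skipping and intron retention. The gene, its segment names and its sequences are invented for illustration, and the helper `splice` is hypothetical; splice site shifts are not modeled.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[str, str, str]  # (kind, name, sequence)

# Hypothetical three-exon gene; all sequences are made up.
TOY_GENE: List[Segment] = [
    ("exon", "E1", "AUGGCC"),
    ("intron", "I1", "GUAAGUCAG"),
    ("exon", "E2", "GAUUAC"),
    ("intron", "I2", "GUGAGUUAG"),
    ("exon", "E3", "UGGUAA"),
]


def splice(gene: List[Segment],
           skipped_exons: Iterable[str] = (),
           retained_introns: Iterable[str] = ()) -> str:
    """Return the mature transcript for one choice of splicing events."""
    skipped, retained = set(skipped_exons), set(retained_introns)
    parts = []
    for kind, name, seq in gene:
        if kind == "exon" and name not in skipped:
            parts.append(seq)   # exons are kept unless skipped
        elif kind == "intron" and name in retained:
            parts.append(seq)   # introns are removed unless retained
    return "".join(parts)


print(splice(TOY_GENE))                           # all exons, no intron
print(splice(TOY_GENE, skipped_exons=["E2"]))     # exon skipping
print(splice(TOY_GENE, retained_introns=["I1"]))  # intron retention
```

Each call corresponds to one isoform expressed from the same locus; the three isoforms differ only in which segments of the primary transcript survive splicing.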
Perhaps the most striking example of alternative splicing comes from Drosophila melanogaster. Its Dscam gene, which codes for a cell surface protein involved in neuronal connectivity, has 24 exons, with 12 alternative versions of exon 4, 48 versions of exon 6, 33 versions of exon 9 and 2 versions of exon 17. Each version of a particular exon is used to the exclusion of all the others. Thus the combinatorial use of alternative exons can potentially generate 38,016 different protein isoforms (Schmucker et al., 2000). The Dscam gene exemplifies both the extreme expansion in coding capacity that alternative splicing provides and the tight regulation of alternative splicing that must be in place to somehow enforce mutual exclusion of the different versions of the exons.
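
The isoform count follows directly from the mutual exclusion rule: one version is chosen independently for each of the four variable exons, so the counts multiply. A short Python check of this arithmetic, using only the numbers quoted above (the variable names are ours):

```python
from math import prod

# Number of mutually exclusive versions of each variable Dscam exon.
dscam_alternatives = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One version is picked per variable exon, so the choices multiply.
n_isoforms = prod(dscam_alternatives.values())
print(n_isoforms)  # 38016
```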
In addition to the alternative splicing mechanisms mentioned above and illustrated in figure 2.3 (exon skipping, alternative acceptor or donor splice sites and intron retention), the exon composition of RNA transcripts can also vary through the differential selection of 5’ end transcription initiation sites and 3’ end termination sites, also known as multiple promoter or multiple polyA usage (Kornblihtt, 2005). Figures 2.3(g) and 2.3(h) illustrate these two distinct mechanisms, which are not splicing events stricto sensu but similarly contribute to creating a variety of RNA transcripts from a single locus.

What makes splicing alternative?

The decision as to which exon is removed and which exon is included involves RNA sequence elements and protein regulators.
First of all, splice sites can be strong or weak depending on how far their sequences diverge from the consensus sequences, which determines their affinity for splicing factors. The relative position and use of weak and strong sites give rise to the different alternative splicing modes described in figure 2.3. Unsurprisingly, it has been shown that alternative exons possess weaker splice sites than constitutive exons (Sorek et al., 2004).
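
As a crude illustration of divergence from the consensus, the Python sketch below scores a 9-nucleotide donor site by the fraction of positions that match the commonly cited mammalian 5’ splice site consensus MAG|GURAGU (M = A or C, R = A or G). This match fraction is an assumption chosen for simplicity and is not the measure used in the studies cited above; practical tools rely on position-specific probabilistic models. The function name and the example sites are hypothetical.

```python
# IUPAC codes needed for the donor consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "M": "AC", "R": "AG"}

# Last 3 exonic nucleotides followed by the first 6 intronic nucleotides.
DONOR_CONSENSUS = "MAGGURAGU"


def donor_match_fraction(site: str) -> float:
    """Fraction of positions in a 9-nt donor site that match the consensus."""
    seq = site.upper().replace("T", "U")  # accept DNA or RNA alphabet
    if len(seq) != len(DONOR_CONSENSUS):
        raise ValueError("expected 3 exonic + 6 intronic nucleotides")
    matches = sum(base in IUPAC[code]
                  for base, code in zip(seq, DONOR_CONSENSUS))
    return matches / len(DONOR_CONSENSUS)


print(donor_match_fraction("CAGGUAAGU"))  # perfect match to the consensus
print(donor_match_fraction("CUGGUCCGU"))  # diverges at three positions
```

Under this toy score, a site closer to 1.0 would be called stronger and a lower-scoring site weaker; the score treats all positions equally, which real splice sites do not.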
Second, the degree to which weak sites are used is regulated by both cis-regulatory sequences and trans-acting factors. Depending on the position and function of the cis-regulatory elements, they are divided into four categories: exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs) and intronic splicing silencers (ISSs). Trans-acting factors include proteins and ribonucleoproteins that bind to the splicing enhancers and silencers. Figure 2.4 shows how these enhancers and silencers act combinatorially to regulate the alternative use of splice sites. Of note, a machine learning algorithm has been developed that is capable of automatically extracting combinations of cis-elements that are accurately predictive of brain, muscle, digestive and embryo versus adult specific alternative splicing patterns (Barash et al., 2010).
Finally, alternative splicing is also believed to be regulated by the secondary structure of the pre-mRNA transcript and by interactions with the transcription and chromatin machineries (Schwartz and Ast, 2010; Luco et al., 2011).
For detailed reviews of alternative splicing mechanisms and regulation, we suggest Matlin et al. (2005), Chen and Manley (2009) and Kornblihtt et al. (2013).
In line with what has been presented above, chapter 6 focuses on detecting splicing defects on transcripts expressed from alleles harboring mutations in their cis-regulatory splicing enhancers or silencers.

Table of contents:

1 Preambule
2 Splicing: from molecular mechanisms to personalized therapies
2.1 Molecular mechanisms resulting in the expression of transcript isoforms
2.1.1 A bit of history: pre-mRNA splicing
2.1.2 Alternative splicing and alternative transcription
2.1.3 What makes splicing alternative?
2.2 Some aspects of the functional importance of alternative transcript expression
2.2.1 A word of evolution
2.2.2 Alternative splicing regulation during development and cell fate decision
2.2.3 Coupling of alternative splicing with nonsense-mediated decay
2.3 Splicing dysregulation in human diseases
2.3.1 Mutated regulatory sequences
2.3.2 Trans-acting factors
2.3.3 A focus on cancer
2.4 Emerging therapies targeting splicing defects
2.4.1 Cancer-specific isoforms as biomarkers
2.4.2 Splice modulating therapies
2.4.3 Antisense oligonucleotides: the example of Duchenne muscular dystrophy
3 Questioning splicing: from data to algorithms
3.1 Measuring splicing with data evolving in time
3.1.1 Heritage of Sanger sequencing
3.1.2 Successes and limitations of microarray splicing profiling
3.1.3 High-throughput sequencing of the RNA as the new gold standard
3.2 Computational challenges associated with RNA-seq reads
3.2.1 Mapping RNA-seq reads
3.2.2 Modeling RNA-seq reads
3.2.3 The isoform deconvolution problem
3.3 Genome-guided transcript estimation
3.3.1 Inferring transcripts with various techniques
3.3.2 ℓ1-norm penalization
3.3.3 Network flow optimization
4 Efficient transcript isoform identification and quantification from RNA-seq data with network flows
4.1 Background and related works
4.2 Proposed approach
4.2.1 Statistical model
4.2.2 Isoform detection by sparse estimation
4.2.3 Isoform detection as a path selection problem
4.2.4 Optimization with network flows
4.2.5 Flow decomposition
4.2.6 Model selection
4.3 Experimental validation
4.3.1 Simulated human RNA-seq data
4.3.2 Real RNA-Seq data
4.4 Conclusion
5 A convex formulation for joint transcript isoform estimation from multiple RNA-seq samples
5.1 Background and related works
5.2 Proposed approach
5.2.1 Multi-dimensional splicing graph
5.2.2 Joint sparse estimation
5.2.3 Candidate isoforms
5.2.4 Model selection
5.3 Experimental validation
5.3.1 Influence of coverage and sample number
5.3.2 Influence of hyper-parameters with realistic simulations
5.3.3 Experiments with real data
5.3.4 Illustrative examples
5.4 Conclusion
6 A time- and cost-effective clinical diagnosis tool to quantify abnormal splicing from targeted single-gene RNA-seq
6.1 Background
6.1.1 Molecular diagnosis context
6.1.2 Targeted single-gene RNA-seq
6.2 Results and discussion
6.2.1 A pipeline to query splicing abnormalities
6.2.2 BRCA1 pilot study
6.2.3 Data normalization
6.2.4 Quantifying splicing events on controls
6.2.5 Detecting abnormal events as deviation from control distributions
6.2.6 Deciphering complex splicing events with full-length transcript prediction
6.3 Conclusion
6.4 Methods
6.4.1 RNA isolation and sequencing
6.4.2 Bioinformatics pre-processing
6.4.3 Data normalization
6.4.4 Transcript prediction
7 Discussion
A Supplementary figures
B Supplementary tables
C Software
Bibliography