Integrative Probing Analysis of Nucleic Acids Empowered by Multiple Accessibility Profiles


Generalities on RNA structure and probing

Classic methods to observe the structure of RNA at high resolution include X-ray crystallography [Golden, 2007] and Nuclear Magnetic Resonance (NMR), which have proven useful to reveal the tertiary structure of viral RNAs and riboswitches [Houck-Loomis et al., 2011]. Despite the effectiveness of these experimental approaches, many RNA structures remain unresolved, due to the prohibitive cost of these methods and to the limited lifespan and stability of RNA molecules. Consequently, wet-lab methods are complemented by computational approaches and, more recently, by dedicated biochemical protocols.
Computational approaches. In silico, the secondary structure can be predicted at thermodynamic equilibrium using an energy model, the Turner model [SantaLucia and Turner, 1997], which assigns to any given structure a numerical value called its free energy. The global free energy of a secondary structure is typically calculated as the sum of the partial free energies of its recognizable structural elements, such as stacked base pairs, hairpin loops, bulges, and internal loops. At thermodynamic equilibrium, the structures of an RNA follow a Boltzmann distribution over their free energies, in which the most probable conformation is the one of lowest free energy. In silico structure prediction therefore classically reports the Minimum Free Energy (MFE) structure, which can be computed using a variety of dynamic programming algorithms [Zuker and Stiegler, 1981].
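The following Python sketch makes the link between free energy and probability concrete. It computes Boltzmann probabilities p(S) = exp(-ΔG(S)/RT) / Z over a small, explicitly enumerated set of structures. The free energies are made-up values for illustration, not Turner-model outputs. Z is restricted to the three listed structures rather than the full structural ensemble.

```python
import math

R = 0.0019872  # gas constant, kcal/(mol*K)
T = 310.15     # 37 degrees Celsius, in Kelvin
RT = R * T     # about 0.616 kcal/mol

def boltzmann_probabilities(free_energies):
    """Return {structure: probability} for a dict of {structure: free energy in kcal/mol}."""
    g_min = min(free_energies.values())
    # Shifting by the minimum avoids overflow; the shift cancels out in the normalization.
    weights = {s: math.exp(-(g - g_min) / RT) for s, g in free_energies.items()}
    z = sum(weights.values())  # partition function of this toy ensemble (up to the shift)
    return {s: w / z for s, w in weights.items()}

# Hypothetical free energies (kcal/mol) for three structures of a 12-nt RNA.
ensemble = {
    "((((....))))": -4.2,
    "(((......)))": -3.1,
    "............": 0.0,
}

probs = boltzmann_probabilities(ensemble)
for structure, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(structure, ensemble[structure], round(p, 3))
```

The MFE structure receives the largest share of the probability mass but not all of it. The remainder goes to alternative conformations, which is the structural diversity an MFE-only prediction discards.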
The most prominent advantage of this approach is its ability to predict, in a matter of seconds on a personal computer, structures for RNA sequences shorter than 700 nt with a sensitivity of about 73% [Mathews, 2004]. However, a major drawback of MFE-based modeling, which predicts only the most stable structure at Boltzmann equilibrium, lies in its inability to capture the structural diversity that some RNAs require to function: in vivo, an RNA may adopt alternative functional conformations. Conserved alternative structures are found in RNAs with switching behaviors. They are increasingly considered in kinetic studies, since transient structures adopted by nascent transcripts can be crucial to channel folding towards the correct energy basin.
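Sensitivity in such benchmarks is usually understood as the fraction of reference base pairs recovered by the prediction (true positives over reference pairs). It is often reported together with the positive predictive value (PPV). Below is a minimal sketch on dot-bracket strings. It assumes exact pair matching; some benchmarks also tolerate pairs shifted by one position, which is not modeled here.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs, 0-based with i < j, encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, c in enumerate(dot_bracket):
        if c == "(":
            stack.append(j)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs

def sensitivity_ppv(reference, predicted):
    """Sensitivity = TP / |reference pairs|, PPV = TP / |predicted pairs| (exact matches only)."""
    ref, pred = base_pairs(reference), base_pairs(predicted)
    tp = len(ref & pred)
    sensitivity = tp / len(ref) if ref else 0.0
    ppv = tp / len(pred) if pred else 0.0
    return sensitivity, ppv

print(sensitivity_ppv("((((....))))", "(((......)))"))  # (0.75, 1.0)
```

Here the prediction misses one of the four reference pairs, which lowers sensitivity. It proposes no wrong pair, so PPV is unaffected.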

Towards increasing the accuracy of predicted RNA structures with the use of probing data

Probing data (chemical or enzymatic) are a non-negligible source of structural information. However, inferring the structure from such data is rather delicate, and sensitive to experimental noise and computational calibration. In the case of enzymatic probing, data-guided predictions only consider structures that satisfy the reactivity constraints. This can lead to wrong predictions when the experiment is flawed. Moreover, reactivity profiles represent an averaged signal and may be affected by the coexistence of several conformations. Finally, the exponential growth of probing data enabled by high-throughput sequencing (HTS) has prompted the development of new approaches. These protocols allow for a better interpretation of reactivity profiles and improve the accuracy of predicted structures. However, they reveal different structural features: some target unpaired positions, while others report on nucleotides involved in double strands. For these reasons, there is a need for integrative modeling of multiple probing data sources.

A first step towards such an integrative method requires automating the processing of probing data. Such analyses are a recurrent area of interest, where the most pressing question is: how to design probing data analysis pipelines that provide a faithful picture of the observed phenomenon? Multiple processing steps are unavoidably required to obtain the reactivities that are then used to guide structure prediction. Generally, the processing starts with the collection of raw signals, that is, the response to a chemical or enzymatic reaction. These raw signals may accumulate noise from several sources, among them the experimental setup, sequencing errors, and the profile recovery method. A small change in the reactivity profile can directly affect the predicted structural ensemble.
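A toy filter over candidate structures shows how hard reactivity constraints behave, and how a single erroneous observation can mislead them. This is a hypothetical sketch built on two assumptions. First, positions cleaved by a single-strand-specific probe must be unpaired. Second, positions cleaved by a double-strand-specific probe must be paired. The short candidate list stands in for the structures that a real constraint-aware folding algorithm would explore.

```python
def unpaired_positions(dot_bracket):
    """0-based indices of unpaired nucleotides ('.') in a dot-bracket string."""
    return {i for i, c in enumerate(dot_bracket) if c == "."}

def satisfies(dot_bracket, must_be_unpaired=(), must_be_paired=()):
    """True if every 'unpaired' constraint is unpaired and no 'paired' constraint is unpaired."""
    free = unpaired_positions(dot_bracket)
    return set(must_be_unpaired) <= free and not (set(must_be_paired) & free)

def filter_candidates(candidates, must_be_unpaired=(), must_be_paired=()):
    """Keep only the candidate structures compatible with all hard constraints."""
    return [s for s in candidates if satisfies(s, must_be_unpaired, must_be_paired)]

# Hypothetical candidate structures for a 12-nt RNA.
candidates = ["((((....))))", "(((......)))", "............"]

# Consistent data: loop positions 4-7 cleaved as single-stranded, position 3 as double-stranded.
print(filter_candidates(candidates, {4, 5, 6, 7}, {3}))  # ['((((....))))']

# One spurious single-strand signal at position 0 is enough to discard both paired structures.
print(filter_candidates(candidates, {0, 4, 5, 6, 7}))    # ['............']
```

Because the constraints are all-or-nothing, one noisy observation leaves only the fully unfolded chain. This is the failure mode that motivates softer, pseudo-energy-based integration of reactivities.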
The sensitivity of predictions to the reactivity profile makes the processing of probing data one of the key points addressed in this thesis. The processing of probing data produced through HTS can be decomposed into three steps. First, sequenced reads, or reverse transcription stops, are mapped onto the reference RNA. Second, reactivities are calculated; they quantify the nucleotide-level response to a specific experimental reaction as a function of the structural context (paired/unpaired). Third, and most difficult to establish, reactivity values are converted into pseudo-energy contributions that drive structure prediction.
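The second and third steps can be sketched in a few lines. This is illustrative only: it is not necessarily the pipeline developed in this thesis, and each of its choices is an assumption.

- Raw reactivities are taken as treated-minus-control stop counts clipped at 0. This assumes libraries of comparable depth.
- Normalization uses the "2-8%" heuristic common for SHAPE profiles: discard the top 2% as outliers, then divide by the mean of the next 8%.
- The pseudo-energy takes the log-linear form slope·ln(r + 1) + intercept. Its defaults of 2.6 and -0.8 kcal/mol are the values commonly used for SHAPE data since Deigan et al. (2009).

```python
import math

def raw_reactivities(treated_stops, control_stops):
    """Per-position treated-minus-control stop counts, clipped at 0.

    Assumes both libraries were sequenced to comparable depth (no depth correction here).
    """
    return [max(t - c, 0) for t, c in zip(treated_stops, control_stops)]

def normalize_2_8(values):
    """Scale values by the mean of the next 8% after discarding the top 2% as outliers."""
    if not values:
        return []
    ranked = sorted(values, reverse=True)
    n_outliers = round(0.02 * len(ranked))
    n_window = max(1, round(0.08 * len(ranked)))
    window = ranked[n_outliers:n_outliers + n_window]
    scale = sum(window) / len(window)
    return [v / scale for v in values] if scale > 0 else [float(v) for v in values]

def pseudo_energy(reactivity, slope=2.6, intercept=-0.8):
    """Log-linear pseudo-energy (kcal/mol): slope * ln(reactivity + 1) + intercept."""
    return slope * math.log(reactivity + 1.0) + intercept

# Hypothetical stop counts for a 6-nt window (treated vs. untreated control).
treated = [12, 40, 9, 15, 70, 11]
control = [10, 8, 9, 10, 12, 10]
for i, r in enumerate(normalize_2_8(raw_reactivities(treated, control))):
    print(i, round(r, 2), round(pseudo_energy(r), 2))
```

Reactive positions receive a positive (penalizing) pseudo-energy and unreactive ones a small bonus. In SHAPE-directed folding, this term is typically added for nucleotides in stacked base pairs; that integration into the folding recursion is not shown here. On a profile this short, the 2% cut removes nothing, so the heuristic only becomes meaningful on longer transcripts.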
The points addressed in this thesis, covering both the issues faced and the contributions, can be summarized as follows:
1. Automation of NGS probing data processing, from the mapping of reads to the construction of reactivity profiles;
2. A new mapping algorithm based on mutational profiles, for the simultaneous sequencing of RNA mutants with a modified SHAPE-MaP protocol;
3. A new integrative approach that exploits the consistency between different probing data sources while taking into account the multiple conformations adopted by RNAs;
4. An extension of this integrative approach to study the agreement between reactivity profiles of RNA mutants, under the assumption that the functional structure is conserved.
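The mutational-profile mapping algorithm is not detailed at this point. The sketch below is a hypothetical illustration of the problem it faces, not that algorithm. It assigns an already aligned, ungapped read to the wild type or mutant whose sequence it matches with the fewest mismatches, and flags ties as ambiguous. In SHAPE-MaP, the probe adducts themselves are read out as mutations, so a real method must separate these from the mutations that define the variants. The sketch does not attempt this, and the sequences and names are made up.

```python
def mismatches(read, reference):
    """Number of positions where an ungapped read and a reference segment differ."""
    return sum(a != b for a, b in zip(read, reference))

def assign_read(read, variants, offset=0):
    """Return the name of the unique closest variant, or None when the read is ambiguous.

    `offset` is the 0-based position of the read's first base in the shared reference frame.
    """
    scores = {name: mismatches(read, seq[offset:offset + len(read)])
              for name, seq in variants.items()}
    best = min(scores.values())
    closest = [name for name, score in scores.items() if score == best]
    return closest[0] if len(closest) == 1 else None

# Hypothetical wild type and two point mutants of the same 12-nt RNA.
variants = {
    "WT": "GGACUUCGGUCC",
    "M1": "GGACUACGGUCC",  # U6A (1-based)
    "M2": "GGAGUUCGGUCC",  # C4G (1-based)
}

print(assign_read("GGACUACGGUCA", variants))      # M1, despite an extra mismatch at the 3' end
print(assign_read("CGGUCC", variants, offset=6))  # None: the read covers no mutation site
```

Reads that do not overlap any distinguishing position cannot be assigned from sequence identity alone. This is one reason why profiles of mutants sequenced together are hard to disentangle.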

Computational methods for 2D structure prediction

The questions around which computational approaches were developed concern the prediction of one or several conformations from an RNA sequence, potentially supplemented with additional experimental data. For the sake of simplicity, we will illustrate the principles underlying the main prediction paradigms on a simple base-pair-based model akin to the one used by Nussinov et al. [1978]. We use the unambiguous decomposition scheme of Waterman and Smith [1978], which allows the partition function (and derived quantities) to be computed.
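Below is a minimal Python sketch of this paradigm under our own simplifying assumptions:

- Every canonical base pair (Watson-Crick or G-U) contributes the same hypothetical energy, `pair_energy`.
- A hairpin must enclose at least `min_loop` unpaired bases.
- `rt` defaults to about 0.616 kcal/mol (37 °C).

The recursion follows an unambiguous decomposition. The last base j of [i, j] is either unpaired or paired with exactly one k, so each structure is counted once.

```python
import math
from functools import lru_cache

# Canonical Watson-Crick and G-U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def partition_function(seq, pair_energy=-1.0, min_loop=3, rt=0.6163):
    """Partition function of a base-pair model where every pair contributes `pair_energy` (kcal/mol)."""
    boltz = math.exp(-pair_energy / rt)  # Boltzmann weight of a single base pair

    @lru_cache(maxsize=None)
    def z(i, j):
        # Subsequences too short to hold a pair (or empty, j < i) admit only the open chain.
        if j - i <= min_loop:
            return 1.0
        # Case 1: the last position j is unpaired.
        total = z(i, j - 1)
        # Case 2: j pairs with k, leaving [i, k-1] and [k+1, j-1] as independent subproblems.
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in CANONICAL:
                total += z(i, k - 1) * boltz * z(k + 1, j - 1)
        return total

    return z(0, len(seq) - 1)

print(partition_function("GGAAACC", pair_energy=0.0))  # 6.0
print(partition_function("GAAAC"))                     # 1 + exp(1.0 / 0.6163)
```

Setting `pair_energy` to 0 gives every structure weight 1, so Z simply counts the secondary structures, which makes a convenient sanity check. Replacing sums by minima and products by sums of energies yields the corresponding minimum-energy recursion. The memoized recursion is meant for short toy sequences; long inputs would call for an iterative table.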

Table of contents

Part I
Chapter 1: Introduction
1.1 Generalities on RNA structure and probing
1.2 Towards increasing the accuracy of predicted RNA structures with the use of probing data
Chapter 2: Bioinformatics concepts and tools
2.1 RNA 2D Bioinformatics
2.2 Wet-lab experiment for structure modeling
2.3 Accuracy assessment tools
Chapter 3: Probing data integrative modeling
3.1 Modeling challenges
3.2 The evolution of probing data integrative methods
3.3 Probing data and evolutionary covariation
Part II
Chapter 4: Probing data analysis
4.1 Capillary electrophoresis data
4.2 High-throughput data
4.3 Conclusion
Chapter 5: Integrative Probing Analysis of Nucleic Acids Empowered by Multiple Accessibility Profiles
5.1 Towards a multi-probing integrative approach
5.2 Sampling and Clustering
Chapter 6: EM algorithm for Differential-SHAPE assignment
6.1 The assignment problem statement
6.2 The EM parameters estimation
6.3 The EM assignment algorithm
Part III
Chapter 7: Validating the predictive capacity of IPANEMAP
7.1 Validating IPANEMAP
7.2 Benchmark on simulated probing data
7.3 Comparison of IPANEMAP with other tools
Chapter 8: Applications
8.1 HIV-1 Gag-IRES
8.2 GIR1 Lariat-capping ribozyme
8.3 Ebola UTRs
Chapter 9: Discussion and perspectives
9.1 Contributions
9.2 Discussion
9.3 Conclusion