EM algorithm for Differential-SHAPE assignment


Bioinformatics concepts and tools

RNA 2D Bioinformatics

RNA secondary structure

An RNA can be abstracted as a succession of building blocks called nucleotides. In vivo or in vitro, an RNA folds in a complex way, adopting a specific tertiary structure that is responsible for a specific activity of the RNA molecule. This tertiary structure is mainly mediated by hydrogen bonds, denoted as base pairs, between compatible nucleotides: pairs involving Adenine (A) and Uracil (U), or Cytosine (C) and Guanine (G), are known as Watson-Crick pairs, while a pair of G and U is known as a Wobble pair. Within most computational approaches, an RNA molecule is characterized by a linear structure (the sequence) and one or several secondary structure(s). In the absence of a conventional definition, a secondary structure for a given RNA molecule is considered as a planar projection of the tertiary structure, subject to further restrictions described below.
Let S be a sequence of bases of length n with S = b1, b2, …, bn, where the ith base is denoted bi and bi ∈ {A, U, C, G}. A secondary structure is defined as a list of base pairs (i, j), denoting the pairing of positions i and j, formed by complementary bases and verifying i < j. Positions that are not involved in any base pair are considered unpaired. For computational reasons [Lyngsø and Pedersen, 2000; Sheikh et al., 2012], existing computational approaches further restrict the secondary structure by enforcing the following constraints:
• Exclusivity condition: A nucleotide can form a base pair with at most one other nucleotide. Thus, if (i1, j1) and (i2, j2) are two distinct pairs, one has i1 ≠ i2 and j1 ≠ j2;
• Non-crossing condition: Structures that contain two base pairs (i1, j1) and (i2, j2) with i1 < i2 < j1 < j2, as illustrated in Figure 2.2, are called pseudo-knotted and are not considered by the classic prediction approaches.
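The two constraints above can be checked mechanically on a candidate list of base pairs. The following is a minimal sketch, not taken from the original work: the function name `is_valid_secondary_structure` and the `CANONICAL` set are illustrative assumptions, and positions are 1-based to match the notation of the text.

```python
from typing import List, Tuple

# Assumed set of compatible bases: Watson-Crick (A-U, C-G) plus the G-U Wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"),
             ("G", "C"), ("G", "U"), ("U", "G")}


def is_valid_secondary_structure(seq: str, pairs: List[Tuple[int, int]]) -> bool:
    """Check a list of 1-based pairs (i, j) against the constraints above."""
    n = len(seq)
    used = set()
    for i, j in pairs:
        # Each pair must satisfy 1 <= i < j <= n and join compatible bases.
        if not (1 <= i < j <= n):
            return False
        if (seq[i - 1], seq[j - 1]) not in CANONICAL:
            return False
        # Exclusivity: a position takes part in at most one pair.
        if i in used or j in used:
            return False
        used.update((i, j))
    # Non-crossing: reject any i1 < i2 < j1 < j2 (pseudo-knot).
    ordered = sorted(pairs)
    for a, (i1, j1) in enumerate(ordered):
        for i2, j2 in ordered[a + 1:]:
            if i1 < i2 < j1 < j2:
                return False
    return True
```

For instance, on the toy sequence GGCC the pair list [(1, 4), (2, 3)] is nested and accepted, whereas [(1, 3), (2, 4)] crosses and is rejected.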
RNA secondary structure is predicted without accounting for tertiary base pairs or for pseudo-knots; these elements are usually addressed later, when modeling the tertiary structure. In the absence of crossing base pairs, each pair (i, j) subdivides the structure into two separate parts: the interior region [i + 1, j − 1], enclosed by the pair, and the exterior region, comprising [1, i − 1] and [j + 1, n]. Therefore, the two substructures can be treated separately.
This consideration was behind the development of a recursive decomposition scheme that formed the basis of all Dynamic Programming (DP) approaches to RNA secondary structure prediction [Waterman and Smith, 1978]. DP is a powerful technique that finds the optimal solution to a given problem by combining the solutions of its sub-problems. It applies when the problem can be expressed as a recursion over overlapping sub-problems. The recursive decomposition scheme was first used to count the number of structures compatible with a given sequence. Let Ni,j denote the number of possible structures in the sequence range [i, j].
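As an illustration of such a counting recursion, the sketch below assumes the standard unambiguous decomposition in which the first position of an interval is either left unpaired or paired with some compatible position k, splitting the interval into an interior and an exterior part. The recurrence, the function name `count_structures`, and the optional `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) are assumptions made for this example and are not quoted from the source. Indices are 0-based in the code, and the memoized recursion is intended for short sequences only.

```python
from functools import lru_cache

# Assumed set of compatible bases: Watson-Crick (A-U, C-G) plus the G-U Wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"),
             ("G", "C"), ("G", "U"), ("U", "G")}


def count_structures(seq: str, min_loop: int = 0) -> int:
    """Count non-crossing structures compatible with seq (0-based indices)."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Empty or single-base interval: only the fully unpaired structure.
        if i >= j:
            return 1
        # Case 1: position i is left unpaired.
        total = count(i + 1, j)
        # Case 2: position i pairs with k; interior and exterior are independent.
        for k in range(i + 1 + min_loop, j + 1):
            if (seq[i], seq[k]) in CANONICAL:
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, n - 1) if n else 1
```

Because each structure is generated by exactly one sequence of choices (unpaired, or paired with a given k), no structure is counted twice, which is what makes the decomposition unambiguous.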

Computational methods for 2D structure prediction

The questions around which computational approaches were developed concern the prediction of one or several conformations from an RNA sequence, potentially supplemented with additional experimental data. For the sake of simplicity, we will illustrate the principles underlying the main prediction paradigms on a simple base pair based model akin to the one used in the work of Nussinov et al. [1978], using the unambiguous decomposition scheme of Waterman and Smith [1978] to allow for a computation of the partition function (and derived quantities).
Energy minimization. A first category of approaches considers the minimal free-energy (MFE) structure, the most stable conformation a given RNA sequence may adopt with respect to a given energy model. Nussinov et al. [1978] developed the first algorithm dedicated to predicting the MFE structure: a DP algorithm that defines an optimal structure as one with the maximal number of base pairs, and recovers it through a backtracking procedure.
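The base-pair maximization principle can be sketched as follows. This is a simplified illustration written for this text rather than the original algorithm as published: it fills a DP table using the decomposition in which the first position of an interval is either unpaired or paired with a position k, then backtracks to recover one optimal structure. The function name `max_pairs_structure` and the `min_loop` parameter are assumptions, and returned positions are 1-based.

```python
from typing import List, Tuple

# Assumed set of compatible bases: Watson-Crick (A-U, C-G) plus the G-U Wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"),
             ("G", "C"), ("G", "U"), ("U", "G")}


def max_pairs_structure(seq: str, min_loop: int = 0) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (max number of pairs, one optimal list of 1-based pairs)."""
    n = len(seq)
    if n == 0:
        return 0, []
    # best[i][j]: maximal number of pairs within seq[i..j] (0-based, inclusive).
    best = [[0] * n for _ in range(n)]

    def get(a: int, b: int) -> int:
        # Empty intervals (a > b) contribute no pairs.
        return best[a][b] if a <= b else 0

    # Fill the table by increasing interval length.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # position i unpaired
            for k in range(i + 1 + min_loop, j + 1):
                if (seq[i], seq[k]) in CANONICAL:
                    score = max(score, 1 + get(i + 1, k - 1) + get(k + 1, j))
            best[i][j] = score

    # Backtracking: replay the choices that led to the optimal score.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + 1 + min_loop, j + 1):
            if (seq[i], seq[k]) in CANONICAL and \
                    best[i][j] == 1 + get(i + 1, k - 1) + get(k + 1, j):
                pairs.append((i + 1, k + 1))  # convert to 1-based positions
                stack.append((i + 1, k - 1))
                stack.append((k + 1, j))
                break
    return best[0][n - 1], sorted(pairs)
```

When several structures reach the maximal number of pairs, the backtracking above returns only one of them; which one depends on the order in which the cases are tried.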

Table of contents :

Part I
Chapter 1: Introduction
1.1 Generalities on RNA structure and probing
1.2 Towards increasing the accuracy of predicted RNA structures with the use of probing data
Chapter 2: Bioinformatics concepts and tools
2.1 RNA 2D Bioinformatics
2.2 Wet-lab experiment for structure modeling
2.3 Accuracy assessment tools
Chapter 3: Probing data integrative modeling
3.1 Modeling challenges
3.2 The evolution of probing data integrative methods
3.3 Probing data and evolutionary covariation
Part II
Chapter 4: Probing data analysis
4.1 Capillary electrophoresis data
4.2 High-throughput data
4.3 Conclusion
Chapter 5: Integrative Probing Analysis of Nucleic Acids Empowered by Multiple Accessibility Profiles
5.1 Towards a multi-probing integrative approach
5.2 Sampling and Clustering
Chapter 6: EM algorithm for Differential-SHAPE assignment
6.1 The assignment problem statement
6.2 The EM parameters estimation
6.3 The EM assignment algorithm
Part III
Chapter 7: Validating the predictive capacity of IPANEMAP
7.1 Validating IPANEMAP
7.2 Benchmark on simulated probing data
7.3 Comparison of IPANEMAP with other tools
Chapter 8: Applications
8.1 HIV-1 Gag-IRES
8.2 GIR1 Lariat-capping ribozyme
8.3 Ebola UTRs
Chapter 9: Discussion and perspectives
9.1 Contributions
9.2 Discussion
9.3 Conclusion