Basic Introduction to Molecular Biology and Gene Expression


Agent-based Modelling and Simulation of Biomolecular Interactions

In the second part of this manuscript, we propose a multi-agent simulator developed to study the molecular interactions that characterise metabolic pathways and to analyse their global properties starting from local interactions [13, 69]. We are able to simulate complete enzymatic reactions by modelling the molecules involved (enzymes, metabolites and complexes) as autonomous, interacting agents.
We explore the capability of the simulator to handle the long-distance electrodynamic interactions that shape the behaviour of bimolecular systems, and analyse their effect on the evolution of a metabolic pathway, such as yeast glycolysis. This investigation was carried out in collaboration with the Centre de Physique Théorique of Aix-Marseille University.
In vitro studies have shown that a charge oscillating at high frequency (of the order of 10^10 to 10^11 Hz) is not subject to Debye screening by the ions of the medium, and that a biological macromolecule behaves like a dipole; long-range forces can be activated between two resonant dipolar systems [31, 62].
Our goal is to provide an in silico validation of these experiments. Each molecule is represented by an agent able to perceive the environment and the cognate partners with which it can interact. A similar approach could also be adopted by defining a molecular dynamics model; however, this kind of method places the analysis at an atomistic level and the associated simulations have a high computational cost. The compositionality of agent-based models, by contrast, allows the study to be carried out at a macromolecular level, without losing precision and with lightweight simulations.
However, understanding and representing as a whole the dynamics of the agents that characterise a metabolic reaction carried out by a large number of molecules remains a major challenge.
For this reason, we also define a new visualisation paradigm based on the concept of interaction-as-perception: whenever a molecule perceives another with which it can interact, a potential link between the two is established. In this way, we can derive the graph of perceptions at a given step; on these graphs, we apply topological data analysis to capture 3-body interactions by interpreting 2-simplices, which are convex hulls of three points, as observable structures. We use 2-simplex formation as a valid semantics for representing the global dynamics of the system.
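The idea of deriving a perception graph and reading 2-simplices from it can be illustrated with a minimal sketch. It is not the simulator described in this manuscript: it assumes, purely for illustration, that "perceiving" means lying within a fixed distance (`radius`), that perception is symmetric, and that a 2-simplex is recorded whenever three agents are pairwise linked. The names `perception_graph` and `two_simplices` are hypothetical.

```python
from itertools import combinations
from math import dist


def perception_graph(positions, radius):
    """Return the undirected links (i, j), i < j, between agents within `radius`.

    Assumption: perception is modelled as a simple distance threshold.
    """
    edges = set()
    for i, j in combinations(range(len(positions)), 2):
        if dist(positions[i], positions[j]) <= radius:
            edges.add((i, j))
    return edges


def two_simplices(n_agents, edges):
    """Return every triple of agents that are pairwise linked (a filled triangle)."""
    return [
        (i, j, k)
        for i, j, k in combinations(range(n_agents), 3)
        if (i, j) in edges and (i, k) in edges and (j, k) in edges
    ]


# Four toy agents in the plane: three close together, one far away.
positions = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.8), (5.0, 5.0)]
edges = perception_graph(positions, radius=1.2)
triangles = two_simplices(len(positions), edges)
print(edges)      # links among agents 0, 1 and 2 only
print(triangles)  # the single triple (0, 1, 2)
```

Recomputing `triangles` at each simulation step would give a time series of 3-body structures, which is the kind of observable the paragraph above refers to.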
Nevertheless, biological processes are complex systems whose global behaviour cannot always be predicted, because the observed data are incomplete. To incorporate this property into an agent-based model of a biological system, the interactions of the agents must have a random character or the simulation environment must be unpredictable (which implies that every run of the simulation is affected by statistical uncertainty). Further steps are needed to provide an effective specification of the environment, ideally by referring to interactive computation modelling [55].


Organisation of the Manuscript

Each part of this manuscript opens with an introductory chapter (Chapters 2 and 6) that describes the basic biological concepts needed to understand our studies and the modelling approaches we adopted to obtain the results presented.
The first part devoted to our results consists of Chapters 3, 4 and 5. More precisely:
– In Chapter 3, we provide the algebraic models of RNA and protein folding and prove how it is possible to formally define a level of abstraction at which such processes show a behavioural equivalence (congruence level). Its definition allowed us to formulate hypotheses on some of the reasons that led the evolution of life towards the formation of proteins and their adoption as the main catalysts of biological processes.
– Chapter 4 analyses a class of pathologies that affect folding processes, in order to study how the differences between the structural components of proteins and RNAs cause a different response to an alteration of the correct folding pathway.
– In Chapter 5, we explore the expressiveness of process algebras in modelling the functions that represent the behaviour of non-coding RNA molecules, following the characterisation of the congruence level defined in Chapter 3. On the basis of these results, we propose a methodology suited to generating an algebraic specification of a multi-agent simulation.
The second part of the body of this manuscript comprises Chapters 7 and 8:
– In Chapter 7, we describe a simulation environment dedicated to the study of long-distance molecular interactions in metabolic reactions; we propose a many-body approach, implemented as a multi-agent system (MAS).
– Chapter 8 builds on the previous chapter by using the MAS simulation to generate the dynamics of the complex biological system, in order to visualise and understand the global behaviour of that system; this is made possible by the introduction of the interaction-as-perception paradigm.
Chapter 9 concludes the manuscript and offers our reflections on the results obtained, the limitations encountered during the studies and the possible improvements to consider in future work.
This chapter provides the reader with the basic biological and theoretical concepts needed to understand the models described in Part I of this manuscript.
The first section gives an overview of the processes underlying protein folding and gene expression; we also introduce the RNA World hypothesis, addressed in Chapter 3. Finally, we briefly describe haemoglobin, a protein that we analyse in Chapter 4 to model the behaviour of sickle-cell anaemia.
In the second section we provide the basic formalism underlying our modelling approaches; in particular, we define the CCS process algebra, Labelled Transition Systems and Hennessy-Milner logic. We also introduce the concept of agent, partly exploited in Chapter 5, although agent-based modelling and simulation are treated in more depth in the second part of this manuscript.
This chapter does not introduce any original content, except for Section 2.2.4, where we propose an overview of our modelling approach.

Basic Introduction to Molecular Biology and Gene Expression

A molecule of deoxyribonucleic acid (DNA) consists of two strands of nucleotides, that is, compounds made of a sugar-phosphate group covalently linked to a nucleobase (or just base).
Only the base differs in each nucleotide and can be one of four possible types: Adenine (A), Guanine (G), Cytosine (C) or Thymine (T). Adenine and Guanine are two-ring bases (purines), while Cytosine and Thymine are single-ring bases (pyrimidines).
The two nucleotide strands of a DNA molecule are held together by hydrogen bonds, connecting the bases of one strand to those of the other. An Adenine always pairs with a Thymine, and a Guanine always pairs with a Cytosine (that is, a purine always pairs with a pyrimidine). As a consequence of this complementary base-pairing, each strand of a DNA molecule contains a sequence of nucleotides that is exactly complementary to the sequence of the other strand. DNA strands run antiparallel to each other (i.e. are oriented in opposite polarities), twisted into a double helix.
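As a small illustration of complementary, antiparallel pairing, the sketch below derives the partner of a strand from the A-T and G-C rules alone. The function name `complementary_strand` and the example sequence are hypothetical; both strands are written 5'→3', which is why the paired bases are read in reverse order.

```python
# Pairing rules stated in the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5to3: str) -> str:
    """Return the partner strand, also written 5'->3'.

    Each base is replaced by its partner; the order is reversed because
    the two strands of the double helix run antiparallel.
    """
    return "".join(PAIR[base] for base in reversed(strand_5to3))


strand = "ATGCGT"
partner = complementary_strand(strand)
print(partner)                        # ACGCAT
print(complementary_strand(partner))  # ATGCGT, the original strand again
```

Applying the function twice returns the original sequence, which reflects the fact that each strand fully determines the other.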
The possibility of base-pairing nucleotides also allows the DNA strands to be used as templates for generating a completely new DNA molecule in a process called DNA replication. This, like many other processes in cells, is performed by an enzyme, a molecule – in this case a protein – that acts as a catalyst and helps complex reactions to occur. The replication process is carried out by the DNA polymerase enzyme and starts from defined sequences of nucleotides, the replication origins.
While the replication process proceeds, the DNA polymerase monitors and corrects possible errors in the base pairing between the original and the new strand (proofreading). However, some errors can be left uncorrected, causing a so-called mismatch, that is, a mispaired nucleotide. For this reason, a specific complex of proteins has the function of mismatch repair. If a replication mistake escapes this additional control, the new DNA strand will present a mutation, a permanent change of its sequence that can alter gene expression.
Genes are specific sequences of nucleotides that contain the instructions for producing functional molecules, which can be either proteins or functional RNAs. The process that converts the information encoded in the nucleotide sequence of a gene into the related functional product is defined as gene expression.
In this context, RNA molecules play the role of both intermediate and final product.
The function of a protein is determined by its 3D structure, which is in turn determined by the sequence of its component molecules, the amino acids.
RNA is a linear molecule very similar to DNA, but it presents some differences. For the purposes of understanding the following chapters, it is important to consider that:
• RNA is composed of the bases Adenine (A), Guanine (G), and Cytosine (C), like DNA, but it contains Uracil (U) instead of Thymine (T). However, a Uracil molecule behaves similarly to Thymine and can base-pair with an Adenine.
• An RNA molecule is single-stranded, meaning that it can fold on itself and form three-dimensional structures. As we will see in more detail in the following sections, this property allows some types of RNA molecules to carry out complex functions in cells.
All of the RNA in a cell is made by transcription, a process carried out by enzymes called RNA polymerases. During transcription one of the two strands of the DNA double helix acts as a template for the synthesis of RNA, so that the nucleotide sequence of the RNA chain is built according to the base-pairing with that template. The RNA chain produced by transcription is called the transcript and, because of complementarity, its sequence is equivalent to the sequence of the strand of DNA that does not act as template (with U in place of T).
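The relationship between template strand, non-template strand and transcript can be checked with a short sketch. The sequences and the name `transcribe` are illustrative assumptions; the template is written 3'→5' so that the transcript comes out 5'→3' without an explicit reversal.

```python
# DNA template base -> RNA base added opposite it (U replaces T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Build the transcript (5'->3') by pairing against the template (3'->5')."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


# A toy double-stranded region: the two strands are complementary.
non_template_5to3 = "ATGGCC"
template_3to5 = "TACCGG"

transcript = transcribe(template_3to5)
print(transcript)  # AUGGCC

# The transcript matches the non-template strand with T replaced by U.
print(transcript == non_template_5to3.replace("T", "U"))  # True
```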
The vast majority of genes carried in a cell’s DNA specify the amino acid sequence of proteins, and the RNA molecules that are copied from these genes (and that ultimately direct the synthesis of proteins) are collectively called messenger RNA (mRNA). In eukaryotes, each mRNA typically carries information transcribed from just one gene, coding for a single type of protein.
The final product of other genes, however, is the RNA itself. Important examples are:
• ribosomal RNA (rRNA), which forms the core of the ribosomes, on which mRNA is translated into protein;
• transfer RNA (tRNA), which forms the adaptors that select amino acids and hold them in place on a ribosome for their incorporation into protein;
• microRNAs (miRNAs), which serve as key regulators of eukaryotic gene expression.


Start and stop signals

When an RNA polymerase collides randomly with a piece of DNA, it sticks weakly to the double helix and then slides rapidly along. The enzyme latches on tightly only after it has encountered a region called a promoter, which contains a specific sequence of nucleotides indicating the starting point for RNA synthesis. Chain elongation then continues until the enzyme encounters a second signal in the DNA, the terminator (or stop site), where the polymerase halts and releases both the DNA template and the newly made RNA chain. The promoter is asymmetrical and binds the polymerase in only one orientation; thus, once properly positioned on a promoter, the RNA polymerase has no option but to transcribe the appropriate DNA strand (see Figure 2.1). Because tight binding is required for RNA polymerase to begin transcription, a segment of DNA will be transcribed only if it is preceded by a promoter sequence. This ensures that only those parts of a DNA molecule that contain a gene will be transcribed into RNA.
Because, in eukaryotic cells, DNA is enclosed within the nucleus, transcription takes place in the nucleus itself, but protein synthesis takes place on ribosomes in the cytoplasm. So, before a eukaryotic mRNA can be translated, it must be transported out of the nucleus through small pores in the nuclear envelope. Before a eukaryotic RNA exits the nucleus, however, it must go through several different RNA processing steps.
Two processing steps that occur only on transcripts destined to become mRNA molecules are capping and polyadenylation; for the purposes of this discussion, we focus on a third step common to all kinds of RNA, a process called RNA splicing.
Most eukaryotic genes have their coding sequences, called exons (or expressed sequences), interrupted by noncoding intervening sequences, called introns. In RNA splicing, intron sequences are removed from the newly synthesized RNA and the exons are stitched together. Each intron contains a few short nucleotide sequences that act as cues for its removal. Guided by these sequences, an elaborate splicing machine called the spliceosome (mainly composed of small nuclear RNAs, or snRNAs) cuts out the intron sequence.
Many proteins are composed of a set of smaller protein domains. Some proteins are built from multiple copies of the same domain linked together in series. In eukaryotes, each protein domain is usually encoded by a separate exon.
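The effect of splicing on a sequence can be sketched as a plain string operation. This is only a toy model: it assumes the intron boundaries are already known and passed in as half-open index intervals, and it does not model how the spliceosome recognises the cue sequences. The function name `splice` and the example sequence are hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the half-open [start, end) intron intervals and join the exons."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[cursor:start])  # exon preceding this intron
        cursor = end                          # skip over the intron
    exons.append(pre_mrna[cursor:])           # exon after the last intron
    return "".join(exons)


# Toy transcript: exon 1 + a 12-nucleotide intron + exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2

mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
print(mature)  # AUGGCCUUUUAA
```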

RNA Translation

After the transcription of a nucleotide sequence of DNA into an mRNA molecule, the latter undergoes the translation process, which synthesises a new protein.
Proteins are polymers, that is, they are molecules containing many copies of a smaller building block, covalently linked. The building blocks of proteins are amino acids, of which there are 20 that occur regularly in the proteins of living organisms and that are specified by the genetic code (for further details on the structure of proteins, see Section 2.1.2).
Because there are only 4 different types of nucleotides in mRNA but 20 different types of amino acids in a protein, this translation cannot be performed by a direct one-to-one correspondence between a nucleotide in RNA and an amino acid in protein. The rules by which the nucleotide sequence of a gene, through the medium of mRNA, is translated into the amino acid sequence of a protein are known as the genetic code. The sequence of nucleotides in the mRNA molecule is read consecutively in groups of three. Because RNA is a linear polymer made of four different types of nucleotides, there are 4 × 4 × 4 = 64 possible combinations of three nucleotides: AAA, AUA, AUG, and so on. However, only 20 different amino acids are commonly found in proteins, so the code is redundant and some amino acids are specified by more than one triplet. Each group of three consecutive nucleotides in RNA is called a codon, and each specifies one amino acid (or, for a few codons, a signal to stop translation).
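The counting argument can be verified directly. The sketch below enumerates all words of length two and three over the RNA alphabet; the variable names are illustrative. Pairs of nucleotides would give only 16 combinations, fewer than the 20 amino acids, whereas triplets give 64.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

# All ordered pairs and triplets of nucleotides.
doublets = ["".join(word) for word in product(NUCLEOTIDES, repeat=2)]
codons = ["".join(word) for word in product(NUCLEOTIDES, repeat=3)]

print(len(doublets))  # 16: too few to encode 20 amino acids
print(len(codons))    # 64: more than enough, hence a redundant code
print(codons[:4])     # ['AAA', 'AAC', 'AAG', 'AAU']
```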
An RNA sequence can, in principle, be translated in any one of three different reading frames, depending on where the decoding process begins. However, only one of the three possible reading frames specifies the correct protein.
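To make the notion of reading frame concrete, the following sketch splits the same toy sequence into codons starting at offsets 0, 1 and 2. The function name `reading_frames` and the sequence are hypothetical; trailing nucleotides that do not fill a complete triplet are dropped.

```python
def reading_frames(rna: str) -> list[list[str]]:
    """Return the codons read from offsets 0, 1 and 2 (incomplete triplets dropped)."""
    frames = []
    for offset in range(3):
        codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
        frames.append(codons)
    return frames


for offset, codons in enumerate(reading_frames("AUGGCCUAA")):
    print(offset, codons)
# 0 ['AUG', 'GCC', 'UAA']
# 1 ['UGG', 'CCU']
# 2 ['GGC', 'CUA']
```

The three frames yield entirely different codon lists from the same nucleotides, which is why the starting point of decoding matters.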
The codons in an mRNA molecule do not directly recognize and bind the amino acids they specify. Rather, the translation of mRNA into protein depends on adaptor molecules, called transfer RNAs (tRNAs), that can recognize and bind to a codon at one site on their surface (anticodon) and to an amino acid that matches the codon at another site. The anticodon is a set of three consecutive nucleotides that through base-pairing bind the complementary codon in an mRNA molecule.
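Codon-anticodon recognition follows the same pairing logic as the rest of the chapter, with U in place of T. The sketch below assumes strict complementary pairing at all three positions; the more relaxed pairing that real tRNAs can show at the third codon position is deliberately ignored. The name `anticodon_for` is hypothetical, and both sequences are written 5'→3'.

```python
# RNA-RNA pairing: A pairs with U, G pairs with C.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon_5to3: str) -> str:
    """Return the strictly complementary anticodon, written 5'->3'.

    The order is reversed because codon and anticodon pair antiparallel.
    """
    return "".join(RNA_PAIR[base] for base in reversed(codon_5to3))


print(anticodon_for("AUG"))  # CAU
print(anticodon_for("GCC"))  # GGC
```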
The recognition of a codon by the anticodon on a tRNA molecule depends on the same type of complementary base-pairing used in DNA transcription. However, accurate and rapid translation of mRNA into protein requires a large molecular machine that moves along the mRNA, captures complementary tRNA molecules, holds them in position, and covalently links the amino acids that they carry so as to form a protein chain. This protein-manufacturing machine is the ribosome, which is a large complex made from more than 50 different proteins (the ribosomal proteins) and several RNA molecules called ribosomal RNAs (rRNAs).

Ribosomes

Ribosomes are composed of one large and one small subunit. The small subunit matches the tRNAs to the codons of the mRNA, while the large subunit catalyzes the formation of the peptide bonds that covalently link the amino acids together into a polypeptide chain. To begin the synthesis of a protein, the two subunits come together on an mRNA molecule, usually near its beginning (5′ end). The mRNA is then pulled through the ribosome like a piece of tape (see Figure 2.3). As the mRNA moves through it, the ribosome translates the nucleotide sequence into an amino acid sequence one codon at a time, using the tRNAs as adaptors. The translation of an mRNA begins with the codon AUG. The end of the protein-coding message is signaled by the presence of one of several codons called stop codons. These special codons (UAA, UAG, and UGA) are not recognized by a tRNA and do not specify an amino acid, but instead signal to the ribosome to stop translation.
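The start and stop rules described above can be sketched as a simple loop over codons. This is a simplified illustration rather than a model of the ribosome: it assumes translation starts at the first AUG found in the sequence, uses the standard genetic code with one-letter amino acid symbols, and marks stop codons with `*`. The names `CODON_TABLE` and `translate` and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order
# for each codon position; '*' marks the stop codons UAA, UAG and UGA.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon: nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: release the chain
        protein.append(amino_acid)
    return "".join(protein)


peptide = translate("GGAUGGCCUUUUAAGG")
print(peptide)  # MAF (codons AUG, GCC, UUU, then the stop codon UAA)
```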

Protein Structure and Folding

Proteins are chains of amino acids, with each amino acid joined to its neighbour by a specific type of covalent bond, called a peptide bond. All 20 of the common amino acids have a carboxyl group and an amino group bonded to the same carbon atom (the alpha-carbon). They differ from each other in their side chains, or R groups, which vary in structure, size, and electric charge, and which influence the solubility of the amino acids in water.
The specific characteristics of an amino acid are determined by the properties of its R group. The polarity of the group, which correlates with its solubility in water, is one critical property. The polarity of the R groups varies widely, from non-polar and hydrophobic (water-insoluble) to highly polar and hydrophilic (water-soluble). Therefore, the R groups of the 20 genetically encoded amino acids are clustered into the following categories: neutral (i.e., uncharged) and nonpolar, neutral and polar, charged.

Table of contents:

1 Introduction 
1.1 Algebraic Modelling of RNA and Proteins
1.2 Agent-based Modelling and Simulation
1.3 Organisation of the Manuscript
1.4 Modélisation algébrique de l’ARN et des protéines
1.5 Modélisation et simulation basées sur des agents
1.6 Organisation du manuscrit
I Algebraic Models
2 Background and Methods for the Part I
2.1 Basic Introduction to Molecular Biology and Gene Expression
2.1.1 RNA Translation
2.1.2 Protein Structure and Folding
2.1.3 Functional RNA
2.1.4 RNA World
2.1.5 Haemoglobin and Anaemias
2.2 Algebraic Modelling of Biological Systems
2.2.1 Calculus of Communicating Systems
2.2.2 Labelled Transition Systems
2.2.3 Hennessy-Milner Logic
2.2.4 From Algebraic to Agent-based Models
3 RNAs and Proteins equivalence 
3.1 Introduction
3.2 Results
3.2.1 Bisimilarity equivalence
3.2.2 Higher abstraction level model
3.3 Discussion
3.4 Conclusions
4 Algebraic Study of Protein Misfolding
4.1 Introduction
4.1.1 DNA replication and gene expression models
4.1.2 Formal description of HBB gene replication and expression
4.2 Results
4.3 Discussion
4.4 Conclusions
5 Algebraic Characterisation of Non-coding RNA 
5.1 Introduction
5.2 Results
5.2.1 Ligand Binding Function
5.2.2 Enzymatic Function
5.2.3 Model checking
5.3 Conclusions
II Agent-based Simulation of Metabolic pathways
6 Background and Methods for the Part II
6.1 Introduction to Yeasts’ Glycolysis
6.2 Agent-based approach
6.2.1 Agent-based Simulator for Metabolic Pathways
6.2.2 From a Kinetic Model to a Multiagent Simulation
6.2.3 Choosing a Reference Kinetic Model
6.2.4 Defining the Input for the Simulation
6.2.5 Simulation Output and Visualisation
7 Testing in Silico the Bimolecular Interactions 
7.1 Introduction
7.2 Integrative Methods for this Chapter
7.2.1 Long-distance Electrodynamic Interactions
7.2.2 Modelling the Whole Glycolytic Pathway
7.2.3 Simulating a Large Number of Molecules
7.3 Results
7.4 Discussion
7.5 Conclusions
8 Interaction-as-perception in Metabolic Reactions
8.1 Introduction
8.2 Integrative Methods for this Chapter
8.2.1 Multi-agent Modelling and Simulation
8.2.2 Simplicial Data Analysis
8.2.3 Interaction-as-perception Paradigm
8.3 Results
8.4 Discussion
8.5 Conclusions
9 Conclusions
