Central Dogma: fifty years of molecular biology

Gene expression: main mechanisms

The rest of this chapter introduces the non-biologist reader to gene expression and its modeling, through a tour of the main biological mechanisms involved and of the state of the art in experiments and models. It is not intended to be exhaustive, and the biological mechanisms are deliberately simplified, since they are introduced with a view to the mathematical modeling of the next chapters.
In 1953 James D. Watson and Francis Crick published an article in the journal Nature [72] presenting their model of the structure of DNA, which featured an anti-parallel double helix held together by hydrogen bonds between paired nucleotides. In those turbulent years, theoretical biologists proposed models from the partial information at their disposal, years before the first experimental results of molecular genetics became available. All the proposed models and mechanisms were based on information extracted from indirect observations. This was also the case for Watson and Crick, who used X-ray diffraction images to build their model.
The Watson-Crick model also provided key insights into how genetic information is transferred from one generation to the next and how this information may be propagated within the cell. In the Nature article [72] the authors write: “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.”

Central Dogma: fifty years of molecular biology

The central dogma of molecular biology, first stated by Crick in 1958 [11], was restated by the same author in 1970 [12] as follows:
“The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred from protein to either protein or nucleic acid.”
The two main concepts that emerged in the late 1950s were those of sequential information and of defined alphabets. It was already known at the time that proteins have a specific three-dimensional configuration, which affects the activity of the protein itself. Researchers decoupled the problem by supposing that the amino acid chain folds itself up, which reduced the question to a one-dimensional one and made it possible to focus on the assembly of the polypeptide chain alone. It was well established that the alphabet of proteins is composed of twenty amino acids, but the mechanism by which they are encoded was unknown.
It was also well known that DNA, RNA and proteins play a leading role in gene expression. The central dogma is a possible solution to the problem of formulating general rules for the transfer of information from one polymer with a defined alphabet to another.
Crick represents the flow of detailed sequence information from one chain to another using arrows, in a schema such as figure 1.1a, where all possible transfers are drawn. Following the general opinion of the late fifties, the transfers can be divided into three groups: those that seemed to exist because of direct or indirect evidence, those with neither experimental evidence nor a strong theoretical need, and those that were unlikely to exist. Crick progressively simplifies this scheme, first excluding the processes in the last class and then validating those more likely to happen.
Figure 1.1: All possible flows are shown in figure (a). Figure (b) shows the picture resulting from Crick’s paper [12]. The solid arrows are “general transfers” (first class), the dotted arrows are “special transfers” (second class) and the absent arrows are the undetected transfers.
Using the classification made by Crick in 1970 [12], we can draw the schema shown in figure 1.1b. Here the solid arrows represent the “general transfers” (first class), while the dotted arrows are the “special transfers” (second class). The absent arrows are the undetected transfers.
The central dogma has to be read as a negative statement: there is no transfer of information out of protein. It also singles out the most likely transfers (solid lines) and the merely probable ones (dotted lines). However, the central dogma says nothing about the machinery involved or about the control mechanisms. It was an attempt to give theoretical insight into the main principles that lead to the expression of a gene, using the partial information available at the time, and it represents the very foundation of molecular biology.
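The classification can be written down as a small lookup table. The sketch below is illustrative only: the assignment of each of the nine transfers to a class is taken from Crick's 1970 paper [12] rather than from the text above, and the names POLYMERS, GENERAL, SPECIAL and classify are hypothetical.

```python
from itertools import product

POLYMERS = ("DNA", "RNA", "protein")

# Assumed classification, following Crick's 1970 restatement [12].
GENERAL = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}
SPECIAL = {("RNA", "RNA"), ("RNA", "DNA"), ("DNA", "protein")}


def classify(source, target):
    """Return 'general', 'special' or 'undetected' for one transfer."""
    if (source, target) in GENERAL:
        return "general"
    if (source, target) in SPECIAL:
        return "special"
    return "undetected"


# All nine conceivable transfers between the three polymers (figure 1.1a).
TRANSFERS = {pair: classify(*pair) for pair in product(POLYMERS, repeat=2)}

for (source, target), kind in TRANSFERS.items():
    print(f"{source:>7} -> {target:<7} {kind}")
```

Read this way, the negative statement of the dogma is simply that every pair whose source is "protein" falls in the undetected class.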
Experiments have confirmed the main principles stated by Crick, and new technologies have considerably increased knowledge of the subject, giving a detailed description of the underlying biochemical reactions. This descriptive approach seems to have no end: finer mechanisms emerge whenever more accurate techniques become available, and take their place in the already complex picture of gene expression.
Despite extensive research in the field and the large body of knowledge acquired, little is known about the fundamental mechanisms and strategies underlying protein production, because of the extreme complexity of the whole process and the stochastic nature of the elementary biochemical reactions. For these reasons, mathematical models are a valuable tool of investigation, allowing one to isolate mechanisms and test hypotheses based on the acquired knowledge.

Gene expression: main biological mechanisms

This section gives a short general introduction to the main steps of gene expression and to the main biological mechanisms that intervene in this complex process. It is not intended to be exhaustive, but to introduce the basic terminology used in the following chapters. Specific biological mechanisms will be introduced throughout the manuscript when needed.
Although the Central Dogma gives the fundamental principles of information transfer in gene expression for any cell type, a description of the process via mathematical modeling should take into account the specificity of each cell type. In particular, models need to distinguish between prokaryotic and eukaryotic cells, at least for specific mechanisms and for their different geometric organization. This PhD work focuses on prokaryotes, and the modeling that follows is shaped accordingly. We will, however, make clear when a modeling choice is strictly tied to prokaryotes; all other choices should be understood as common to both cell types.

Gene activation

Gene activation is the process which allows a gene to be expressed at a specific time. The way this activation may occur varies a lot from gene to gene and from organism to organism. The main mechanisms causing gene activation are the dissociation of a repressor and the association of an activator.
A repressor is a DNA-binding protein that regulates the expression of a specific gene by binding to the operator, a segment of DNA to which a regulator binds. The bound repressor blocks the attachment of RNA polymerase to the promoter and prevents the transcription of the genes. If an inducer molecule is present, it can interact with the repressor and inhibit its action, either by detaching it from the operator or by preventing it from binding.
An activator is a DNA-binding protein that regulates one or more genes by increasing their transcription rate. RNA polymerase binds to the promoter region of the gene, forming a complex that sometimes proceeds to gene transcription. An activator recruits the RNA polymerase to its promoter region.
While the two previous mechanisms are shared by prokaryotic and eukaryotic cells, chromatin remodeling is specific to eukaryotes. Chromatin is the complex formed by DNA and the histone proteins with which it associates. Histones are highly alkaline proteins found in eukaryotic cell nuclei that package and order the DNA into structural units called nucleosomes. Chromatin serves both as a way to condense DNA within the cell nucleus and as a means of controlling gene expression. Raser and O’Shea [62] hypothesize that chromatin remodeling is the key regulation mechanism for certain eukaryotic promoters.
Gene activation is a complex process resulting from different mechanisms, and it is gene and organism specific. Although a gene may exhibit several states, to a first approximation activation can be described as a two-state process, i.e. the gene has only two possible states, active or inactive.
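A minimal sketch of the two-state approximation, assuming exponentially distributed sojourn times in each state (an assumption of this example, not a claim of the text) and hypothetical switching rates k_on and k_off:

```python
import random


def simulate_gene_state(k_on, k_off, t_end, seed=0):
    """Simulate a gene switching between inactive (0) and active (1).

    Sojourn times are drawn from exponential laws (assumption of this
    sketch). Returns the fraction of [0, t_end] spent in the active state.
    """
    rng = random.Random(seed)
    t, state, time_active = 0.0, 0, 0.0
    while t < t_end:
        # Leave the active state at rate k_off, the inactive one at k_on.
        rate = k_off if state == 1 else k_on
        dwell = min(rng.expovariate(rate), t_end - t)
        if state == 1:
            time_active += dwell
        t += dwell
        state = 1 - state
    return time_active / t_end


fraction = simulate_gene_state(k_on=1.0, k_off=3.0, t_end=10_000.0)
print(f"simulated active fraction: {fraction:.3f}")
print(f"k_on / (k_on + k_off):     {1.0 / (1.0 + 3.0):.3f}")
```

Under these assumptions the long-run fraction of time spent in the active state should approach k_on / (k_on + k_off), which gives a simple check on the simulation.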
The number of copies of a gene within a bacterium is a fundamental factor and should be considered in any model describing the expression of a specific gene. Growing bacteria duplicate their DNA, which leads to at least two copies per gene, since the genetic information has to be split between the daughter cells.
Remark. Bacteria often need to hold more than two copies of their DNA, since the duration of replication (∼ 40 minutes) is sometimes longer than the cell cycle time, which is ∼ 20 minutes in Escherichia coli in fast growth conditions. For this reason, in “normal” growth conditions we observe DNA regions with one, two or four copies of genes, while in the “regeneration” regime, where the cell division cycle takes about 20 minutes, we have up to eight copies of the genes located closer to the origin of replication.
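The arithmetic behind the remark can be illustrated with the Cooper-Helmstetter formula for the average copy number of a gene, which is not part of the text above and is used here only as a sketch. The replication time C = 40 minutes and the doubling time tau = 20 minutes come from the remark; the delay D = 20 minutes between the end of replication and division is an assumed textbook value, and the function name is hypothetical.

```python
def mean_copy_number(x, C=40.0, D=20.0, tau=20.0):
    """Average copies per cell of a gene at relative position x.

    x = 0 is the origin of replication, x = 1 the terminus.
    C: replication time (min), taken from the remark.
    D: delay between the end of replication and division (min), assumed.
    tau: doubling time (min), taken from the remark.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 (origin) and 1 (terminus)")
    return 2.0 ** ((C * (1.0 - x) + D) / tau)


for x in (0.0, 0.5, 1.0):
    print(f"x = {x:.1f}: {mean_copy_number(x):.1f} copies on average")
```

With these values a gene at the origin is present in eight copies on average and a gene at the terminus in two, consistent with the statement that genes closer to the origin of replication reach the highest copy numbers.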

Transcription

The transcription process can be described through the following fundamental steps:
1. initiation: the polymerase binds to one of the specificity factors σ to form a “holoenzyme”, which can attach to a specific promoter on the DNA. The more similar a sequence is to a “consensus sequence”, the stronger the binding to the DNA. After the first bond has been synthesized, the RNA polymerase must clear the promoter (this phase is called promoter clearance). During this phase a truncated transcript may be released, an event called abortive initiation;
2. elongation: after promoter clearance, the polymerase assembles the mRNA chain in a controlled fashion;
3. termination: either ρ-independent or ρ-dependent transcription termination. The former involves terminator sequences within the RNA that signal the RNA polymerase to stop. The latter uses the ρ termination factor to stop RNA synthesis.
For further details we refer to Appendix B.1.2.
Figure 1.3: Transcription. The polymerase binds to the active gene and transcription initiation takes place. Once the initiation step is completed, the polymerase starts copying one DNA strand and elongates the mRNA, which is eventually released into the cytoplasm.
Transcription regulation controls the frequency of transcription and the number of messengers produced. Gene transcription is subject to many control mechanisms, and we recall only the most common. Specificity factors, such as the sigma factors in prokaryotic transcription, alter the specificity of RNA polymerase for a given promoter or set of promoters, making it more or less likely to bind to them. Other regulations act at the gene level and have been listed in the previous section. In the post-transcriptional phase, the regulatory machinery controls the number of mRNAs that are translated into proteins. The stability and distribution of the different transcripts are regulated (post-transcriptional regulation) by RNA-binding proteins (RBPs), which control the various steps and rates of the transcripts.
Prokaryotic and eukaryotic transcription each have distinctive characteristics. Since there is no precise spatial organization in prokaryotes, translation can start while the polymerase is still building the messenger. This is not possible in eukaryotes: transcription occurs in the nucleus, so the messenger must first be exported out of the nucleus before translation can take place.

Translation

Schematically, prokaryotic translation consists of the following steps (see Figure 1.4 for a schematic representation):
1. initiation: involves the assembly of components such as the ribosomal subunits (50S and 30S), the mRNA, the first aminoacyl tRNA, GTP (energy) and the initiation factors (IF1, IF2, IF3). The tRNA (transfer RNA) serves as the physical link between the nucleotide sequence of the mRNA and the amino acid sequence of the protein. In particular, the aminoacyl tRNA (or charged tRNA) carries an amino acid to the ribosome as directed by the three-nucleotide sequence (codon) read by the ribosome. The ribosome has three sites: A, P and E. The A site is the entry point for aminoacyl tRNAs, except for the first one, which binds directly to the P site. The P site holds the peptidyl tRNA, i.e. a tRNA bound to the peptide being synthesized, and at the E site the uncharged tRNA detaches from the ribosome;
2. elongation: a controlled process in which the polypeptide chain is elongated by the addition of amino acids to the carboxyl end of the growing chain. Elongation involves several elongation factors, a conformational change, bond formation, etc. The aminoacyl tRNA attaches at the A site, then moves to the P site, where the amino acid it carries joins the growing chain, and the uncharged tRNA is moved to the E site, where it exits the complex;
3. termination: occurs when one of the three stop codons moves into the A site. These codons are not recognized by any tRNA but by the so-called release factors, which trigger hydrolysis of the ester bond and release the newly produced protein into the cytoplasm. The ribosome recycling step is responsible for disassembling the ribosome, so that it is ready to start translating other messengers.
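The codon-by-codon reading described in these steps can be sketched as a table lookup. This is a deliberately crude illustration: it uses the standard genetic code, starts at the first AUG it finds (real prokaryotic initiation also depends on the ribosome binding site), and ignores all kinetics. The names CODON_TABLE and translate are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks a stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # no tRNA matches: release factors take over
            break
        peptide.append(residue)
    return "".join(peptide)


print(translate("GGAGGUAUGUUUGGCAAAUAAGC"))  # prints MFGK
```

The three codons mapped to '*' (UAA, UAG and UGA) play the role of the terminating codons of step 3.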
Translation of a given mRNA is carried out by more than one ribosome simultaneously. Because of the relatively large size of ribosomes, they can only attach to sites on the mRNA that are at least 35 nucleotides apart. The complex formed by one mRNA and the ribosomes attached to it is called a polysome.
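The spacing constraint gives a simple upper bound on the size of a polysome. The sketch below assumes that attachment sites are positions 0 to L - 1 along an mRNA of L nucleotides and that the first ribosome sits at position 0; the function name and the 1000-nucleotide example are hypothetical.

```python
def max_ribosomes(mrna_length, min_spacing=35):
    """Upper bound on the ribosomes simultaneously bound to one mRNA.

    Attachment sites are modeled as positions 0 .. mrna_length - 1 that
    must be at least min_spacing nucleotides apart (assumption of this
    sketch), with the first ribosome at position 0.
    """
    if mrna_length <= 0:
        return 0
    return (mrna_length - 1) // min_spacing + 1


print(max_ribosomes(1000))  # prints 29
```

This is a purely geometric bound; the number of ribosomes actually observed on a messenger also depends on initiation and elongation rates.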

Table of contents :

1 Introduction
1.1 Gene expression
1.1.1 Central Dogma: fifty years of molecular biology
1.1.2 Gene expression: main biological mechanisms
1.1.3 Translation
1.2 Stochasticity: experiments
1.3 Intrinsic and extrinsic noise
1.4 Stochasticity: models
1.4.1 Limits of classic models
2 MPPP description of gene expression
2.1 Biology and mathematical assumptions
2.1.1 Biological context
2.1.2 Mathematical model of gene expression: three-stage model
2.1.3 Limits of classic models: the exponential assumption
2.2 MPPP Description of Gene Expression
2.3 General results
2.3.1 Gene state
2.3.2 Messengers
2.3.3 Proteins
2.4 Results: explicit formulas and numerical analysis
2.A Appendix: classic models
2.A.1 The Rigney’s model
2.A.2 Paulsson’s model survey
2.A.3 Swain’s model
3 Realistic model of gene expression
3.1 Four-Stage Model
3.1.1 Model and general results
3.1.2 Realistic assumptions
3.1.3 Explicit formulas under realistic assumptions
3.2 Qualitative and quantitative analysis
3.2.1 Biological data and model parameters
3.2.2 Estimation of fluctuations: deterministic elongation
3.2.3 Four-Stage Model: a counter-intuitive result
3.2.4 Proteolysis vs. dilution
3.2.5 Impact of different steps on protein fluctuations
4 Multi-protein model
4.1 Stochastic model
4.2 Asymptotic Behavior
4.3 Analysis of fixed point equation
4.3.1 The underloaded case
4.3.2 The Case of Overloaded mRNAs
A Mathematical tools
A.0.3 Marked Poisson Point Processes
B Biology
B.1 Biological Mechanisms
B.1.1 Gene activation
B.1.2 Transcription
B.1.3 Translation
B.1.4 mRNA degradation
B.1.5 Protein degradation
B.2 Biological glossary
B.2.1 16S ribosomal RNA
B.2.2 β-galactosidase
B.2.3 DNA
B.2.4 Gene
B.2.5 Inducer
B.2.6 Operon
B.2.7 Promoter
B.2.8 Ribosomal Binding Site (RBS)
B.2.9 Ribosome
B.2.10 RNA
B.2.11 Messenger RNA (mRNA)
B.2.12 Ribosomal RNA (rRNA)
B.2.13 Transfer RNA (tRNA)
B.2.14 Shine-Dalgarno sequence