Notes on Molecular Cell Biology
In this chapter we briefly introduce the most important concepts of molecular cell biology used throughout the thesis. We focus mainly on gene expression, its regulation, and some of the techniques used to measure gene products. For more details on the molecular biology of the cell, we refer the reader to [12, 13].
The Cell
All living organisms are made of cells. Cells are small units (mostly 1 to 100 μm), enclosed by a membrane and filled with a concentrated aqueous solution of chemicals. Each cell possesses the same genetic information as the parent organism, and this information, stored in DNA, is passed on to the daughter cells during cell division.
Organisms may consist of just one cell, in which case they are called unicellular, or they may be multicellular. Multicellular organisms are typically organized into tissues, which are groups of similar cells arranged so as to perform a specific function in addition to the housekeeping processes common to all cells.
In this thesis we will not address cell differentiation, i.e. the formation of cell types in a multicellular organism; we will only discuss the general (housekeeping) aspects of cell components and functions.
Prokaryotes and Eukaryotes
Cells are divided into two categories depending on the way the genetic material (DNA) is organized within them.
The first category is composed of prokaryotes, which by definition are organisms whose cells have neither a nucleus nor other well-defined compartments (see Figure 2.1). Most prokaryotes are single-celled organisms, although some join together to form chains, clusters or other multicellular structures. In prokaryotic cells, DNA is stored in the cytoplasm in an area called the nucleoid, but it is not enclosed within a separate nuclear envelope.
Figure 2.1: Schematic of a prokaryotic cell. In a prokaryotic cell, all intracellular components (proteins, DNA and metabolites) are located together in the same volume enclosed by the cell membrane. Many prokaryotes (bacteria) are able to move in a fluid-like environment using flagella, which are also used as sensors to detect concentration gradients and other signals. (Picture taken from ).
Eukaryotes belong to the second category and can be defined as organisms whose cells have a nucleus. Eukaryotic cells are, in general, bigger and more elaborate than prokaryotic cells. They range from unicellular yeast to plants and animals, which are very complex multicellular organisms with billions of cells. In addition to a nucleus, eukaryotes have other organelles, sub-cellular structures that carry out specialized functions (see Figure 2.2). For example, mitochondria are responsible for energy production through metabolism and contain a very small amount of DNA; chloroplasts (in plants) carry out photosynthesis; ribosomes, themselves made up of proteins and RNAs, serve as the machinery for protein synthesis. Other organelles include the endoplasmic reticulum. The cytoskeleton, made up of microtubules and filaments, controls cell shape, drives and guides cell movements and plays a role in the transport of substances within the cell.
Since this thesis mainly focuses on bacteria, in what follows we introduce the bacterium E. coli, which biologists consider the model organism for prokaryotic cells, and we concentrate mainly on prokaryotic cell functions.
Figure 2.2: Eukaryotic (animal) cell. The nucleus is the most prominent organelle in the cell and contains chromosomes (the storage sites of DNA). Mitochondria produce chemical energy (ATP) for the cell. Centrioles are involved in nuclear division during cell division. Ribosomes, the endoplasmic reticulum and the Golgi apparatus work together in the synthesis of proteins. (Picture taken from ).
E. coli as model organism
It is thought that all cells descended from a common ancestor. Hence, the knowledge gained from the study of one organism allows us to better understand others, even ourselves. But some organisms are more convenient than others to study in the laboratory: some are easier to manipulate genetically and reproduce faster; others are multicellular but transparent, so biologists can easily watch the development of their tissues and organs.
Molecular biologists have focused on Escherichia coli (E. coli for short) as a model organism for prokaryotic cells. E. coli is a small, rod-shaped bacterium that normally lives in the gut of humans and other vertebrates, but it can be grown easily in a simple nutrient broth in a culture bottle. E. coli is able to grow under variable chemical conditions and it reproduces rapidly (approximately one generation every 20 minutes). It was one of the first organisms to have its complete genome sequenced. Its genetic information is stored in a single, circular double-stranded molecule of DNA, approximately 4.6 million nucleotide pairs long, and it makes about 4300 different proteins.
The molecular functioning of E. coli is better understood than that of any other organism, and most of our knowledge of the fundamental mechanisms of life (how cells replicate their DNA, how they decode these genetic instructions to make proteins, etc.) has come from studies on it. In fact, although human cells are eukaryotic, subsequent research has confirmed that basic molecular processes occur in the same way in human and in E. coli cells.
Gene expression: from DNA to Protein
The central dogma of molecular biology says: "DNA makes RNA, RNA makes protein, and proteins make the cell". This key paradigm of molecular biology states that the flow of information in gene expression is from genes encoded by DNA to mRNA by transcription, and from mRNA to protein by translation (see Figure 2.4). At any given time, and in any given cell of an organism, thousands of genes and their products (RNA, proteins) actively participate in an orchestrated fashion to generate the macromolecular machinery for life.
Transcription: from gene to RNA
The genome, i.e. the genetic information of an individual, describes all the proteins that are potentially present in every cell of a given organism. This information is encoded in the DNA molecule, which is a double-stranded helix made of alternating sugars (deoxyribose) and phosphate groups (related to phosphoric acid), with the nucleobases guanine (G), adenine (A), thymine (T), and cytosine (C) attached to the sugars (see Figure 2.5).
The sugar-phosphate backbones of the two DNA strands form a uniform helix, with the two strands running in opposite directions. The strands are held together by hydrogen bonds between opposing bases according to the base pair rule: A is always paired with T and G is always paired with C.
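As an illustration of the base pair rule, the following minimal Python sketch computes the partner strand of a given DNA strand. The function name `complementary_strand` and the example sequence are ours and purely illustrative; we assume an unambiguous sequence over the four letters A, T, G, C.

```python
# Watson-Crick pairing rule from the text: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, read in its own forward direction.

    The two strands run in opposite directions, so the partner is the
    base-wise complement of the input read in reverse order.
    """
    return "".join(PAIRING[base] for base in reversed(strand.upper()))


print(complementary_strand("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which reflects the fact that each strand fully determines the other.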
Within cells, DNA is organized into long structures called chromosomes. During the cell cycle these chromosomes are duplicated in the process of DNA replication, providing each cell with its own complete set of chromosomes.
The first step in the synthesis of a protein is transcription, which consists in copying the nucleotide sequence of a gene into RNA (ribonucleic acid). Like DNA, RNA is a polymer made of four different nucleotides. It differs from DNA in three respects:
1. whereas DNA is always a double-stranded helix, RNA is single stranded;
2. the nucleotides in RNA are ribonucleotides, i.e. they contain the sugar ribose rather than deoxyribose;
3. although, like DNA, RNA contains the bases adenine (A), guanine (G), and cytosine (C), it contains uracil (U) instead of the thymine (T) found in DNA.
All of the RNA in a cell is made by transcription. The enzyme that carries out transcription is called RNA polymerase (RNAP). To begin transcription, RNAP must be able to recognize the start of a gene, called the promoter, and bind steadily to the DNA at this site. Then, RNAP moves stepwise along the DNA, unwinding the DNA double helix to expose the bases on each DNA strand. As RNAP progresses, it adds nucleotides one by one to the RNA chain using an exposed DNA strand as a template. Chain elongation continues until RNAP meets a stop site in the DNA, the terminator, where the enzyme halts and releases both the DNA template and the newly made RNA chain. The resulting RNA transcript is thus single-stranded and complementary to one of the two DNA strands.
Figure 2.6: The process of transcription is carried out by RNA polymerase (RNAP), which uses DNA (black) as a template and produces RNA (blue). (Picture taken from ).
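The template-copying step of transcription can be sketched in a few lines of Python. This is only a toy illustration: promoter recognition and termination are ignored, and the function name `transcribe` and the example template are assumptions of ours, not part of any model used later in the thesis.

```python
# Pairing used by RNAP when copying a DNA template into RNA:
# the RNA carries uracil (U) where DNA would carry thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build the RNA complementary to a DNA template strand.

    Nucleotides are added one by one, in the order in which the
    polymerase reads the template.
    """
    rna = []
    for base in template.upper():
        rna.append(DNA_TO_RNA[base])
    return "".join(rna)


print(transcribe("TACGGT"))  # AUGCCA
```

The result is single-stranded and complementary to the template, with U in every position where the template carries A.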
Several types of RNA are produced in cells. The majority of genes specify the amino acid sequence of proteins, and the RNA molecules that are transcribed from these genes are called messenger RNA (mRNA). There are also non-messenger RNAs: ribosomal RNA (rRNA), which forms the core of ribosomes, on which the mRNA is translated into protein, and transfer RNA (tRNA), which selects and carries amino acids to the ribosome for protein synthesis.
Translation: from RNA to protein
The next step in gene expression is called translation, because it converts the information stored in mRNA into protein. Since there are only 4 different nucleotides in mRNA but 20 different amino acids in a protein, this translation cannot be a direct one-to-one correspondence between a nucleotide in RNA and an amino acid in a protein. The rules by which the nucleotide sequence of a gene, by means of mRNA, is translated into the amino acid sequence of a protein are known as the genetic code. Notably, an mRNA sequence is decoded in sets of three nucleotides, called codons, which allows 4^3 = 64 possible combinations of three nucleotides, even though only 20 amino acids are commonly found in proteins.
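The counting argument above can be checked directly. The short Python sketch below enumerates all nucleotide words of length 1, 2 and 3 and reports whether there are enough of them to label 20 amino acids; the helper name `nucleotide_words` is ours.

```python
from itertools import product

NUCLEOTIDES = "ACGU"      # the four RNA nucleotides
N_AMINO_ACIDS = 20        # amino acids commonly found in proteins


def nucleotide_words(length: int) -> list[str]:
    """All possible nucleotide sequences of the given length."""
    return ["".join(word) for word in product(NUCLEOTIDES, repeat=length)]


for length in (1, 2, 3):
    count = len(nucleotide_words(length))   # equals 4 ** length
    print(length, count, count >= N_AMINO_ACIDS)
```

Words of length 1 and 2 give only 4 and 16 combinations, fewer than 20, whereas triplets give 64. Three is therefore the shortest word length that can encode all amino acids, at the price of several codons specifying the same amino acid.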
The translation of mRNA into protein relies on adaptor molecules that recognize and bind, through base-pairing, to a codon at one site on their surface (called the anticodon) and to an amino acid at another site. These adaptors are small RNA molecules (about 80 nucleotides in length) known as transfer RNAs (tRNAs). Transfer RNAs are captured and held in position on the mRNA strand by a large molecular machine that moves along the mRNA, allowing accurate and rapid translation of the genetic code. This complex molecular machine is the ribosome, which is made up of more than 50 different proteins (the ribosomal proteins) and several RNA molecules called ribosomal RNAs (rRNAs).
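To make the decoding rule concrete, the following Python sketch reads an mRNA codon by codon and looks up each codon in the standard genetic code, returning amino acids in one-letter notation. It is a deliberately simplified picture: we assume that decoding starts at the first AUG found and stops at the first stop codon, and we ignore the ribosome, the tRNAs and ribosome-binding sites altogether. The names `CODON_TABLE` and `translate` and the example sequence are ours.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order
# for the first, second and third codon position; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Decode an mRNA into a protein sequence (one-letter amino acid codes).

    Assumption of this sketch: reading starts at the first AUG and
    proceeds in non-overlapping triplets until a stop codon is met.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCUUAAGC"))  # MA
```

In the example, the codons read are AUG (methionine, M), GCU (alanine, A) and UAA (stop), so the resulting peptide is "MA".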
Regulation of Gene expression
The regulation of gene expression is the process by which an individual cell specifies which of its many thousands of genes are to be expressed. This mechanism is paramount, especially for multicellular organisms such as animals, which have to differentiate their cells in order to produce, for instance, muscle, nerve and blood cells and, eventually, all the variety of cell types seen in the adult. Thus, cell differentiation arises because cells produce and accumulate different RNA and protein molecules. But regulation of gene expression is also widely adopted by prokaryotic, unicellular organisms such as bacteria. In fact, bacterial cells can change the expression of their genes in response to external signals, for example according to the food sources that are available in the environment [58, 133, 134].
Gene expression can be regulated at many steps in the pathway from DNA to RNA to protein. Moreover, the stability of the final gene product, whether it is RNA or protein, also contributes to the expression level of the gene: an unstable product (one that is degraded faster) results in a lower expression level than a stable one, which is degraded more slowly.
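The effect of product stability on the expression level can be illustrated with the simplest possible balance between synthesis and degradation. The sketch below assumes, for illustration only, a constant synthesis rate k and first-order degradation with rate constant gamma, i.e. dx/dt = k - gamma * x with x(0) = 0. The parameter values and function names are hypothetical.

```python
import math


def expression_level(k: float, gamma: float, t: float) -> float:
    """Solution of dx/dt = k - gamma * x with x(0) = 0, evaluated at time t."""
    return (k / gamma) * (1.0 - math.exp(-gamma * t))


def steady_state(k: float, gamma: float) -> float:
    """Level at which synthesis balances degradation: x* = k / gamma."""
    return k / gamma


K_SYNTHESIS = 10.0  # hypothetical synthesis rate (molecules per minute)

# Same synthesis rate, two hypothetical degradation rate constants (per minute).
for label, gamma in (("stable product", 0.1), ("unstable product", 1.0)):
    level = steady_state(K_SYNTHESIS, gamma)
    print(f"{label}: steady state = {level:.1f}")
```

With identical synthesis rates, the product that is degraded ten times faster settles at a level ten times lower, which is the qualitative statement made above.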
Transcriptional control
Control of transcription is mostly exerted at the initiation step. In Subsection 2.2.1 we saw that RNAP binds to the promoter of a gene to make an RNA copy of the gene. In addition to the promoter, almost all genes have regulatory DNA sequences that are used to activate (resp. inhibit) transcription of the gene by facilitating (resp. preventing) RNAP binding to the promoter. However, to have any effect, these regulatory DNA sequences have to be recognized by proteins called transcription factors, which bind to DNA. Each transcription factor is able to recognize a different DNA sequence and so regulates only particular genes. Notably, a transcription factor is a repressor protein if, in its active form, it blocks the binding of RNAP to the promoter, thus switching genes off. Other transcription factors, called activators, do the opposite: they switch genes on by binding near the promoter and helping RNAP to initiate transcription.
Post-transcriptional control
Post-transcriptional controls operate after RNAP has bound to the promoter of a gene to synthesize RNA. One of the most common ways to regulate gene expression at the post-transcriptional level is to control translational initiation, so as to modulate protein synthesis.
Bacterial mRNAs, for example, have a ribosome-binding site (RBS) where translation begins. The RBS has to be recognized by a ribosome, which binds to it and starts peptide synthesis. Hence, by blocking or exposing the RBS, the bacterium can either inhibit or facilitate the translation of an mRNA.
Measurement Techniques
In this section we briefly present some techniques used in molecular biology to measure gene expression. Gene expression measurement and analysis have become essential tools for medical investigations and for characterizing complex biological conditions.
Here, without going into details, which is beyond the scope of this thesis, we list some techniques used to quantify mRNA and protein abundance.
mRNA quantification
Several techniques are available to quantify levels of mRNA in a cell; a common family of them is generally referred to as DNA microarrays [18, 111]. A DNA microarray is a tool that allows the RNA of thousands of genes to be monitored at the same time, so that biologists can see which genes are switched on (or off) as cells grow, divide, or respond to hormones, toxins, or infections. The information contained in a DNA microarray says whether the expression of each gene has increased or decreased relative to a reference condition. It is therefore an essentially qualitative measurement.
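The relative nature of microarray data can be illustrated by the way such measurements are commonly summarized, namely as the logarithm of the ratio between the signal in the condition of interest and the signal in the reference condition. The sketch below uses made-up gene names and intensities, and the two-fold threshold (one unit on the log2 scale) is an arbitrary choice of ours, not a value taken from the cited references.

```python
import math


def log2_fold_change(sample: dict[str, float],
                     reference: dict[str, float]) -> dict[str, float]:
    """Log2 ratio of sample to reference intensity for each gene."""
    return {gene: math.log2(sample[gene] / reference[gene]) for gene in reference}


def classify(change: float, threshold: float = 1.0) -> str:
    """Label a gene as up, down or unchanged (threshold is an assumption)."""
    if change >= threshold:
        return "up"
    if change <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical spot intensities (arbitrary units) for three hypothetical genes.
reference = {"geneA": 100.0, "geneB": 200.0, "geneC": 50.0}
sample = {"geneA": 400.0, "geneB": 50.0, "geneC": 55.0}

changes = log2_fold_change(sample, reference)
calls = {gene: classify(value) for gene, value in changes.items()}
print(calls)  # {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
```

Only the ratio to the reference enters the result: multiplying all intensities of both conditions by the same factor leaves the calls unchanged, which is why such data say little about absolute mRNA abundance.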
Protein quantification
The most common method to detect the presence of a specific protein, or of a small number of them, in a sample taken from an experiment is the Western blot technique. The protein is extracted from the sample and, together with a small number of antibodies, which recognize only specific proteins, is transferred to a nitrocellulose membrane. Different methods, for instance radioactive labelling or staining, are then used to produce bands indicating the location of the protein. Finally, the intensity of the band is proportional to the amount of protein.
Table of contents :
2 Notes on Molecular Cell Biology
2.1 The Cell
2.1.1 Prokaryotes and Eukaryotes
2.1.2 E. coli as model organism
2.2 Gene expression: from DNA to Protein
2.2.1 Transcription: from gene to RNA
2.2.2 Translation: from RNA to protein
2.3 Regulation of Gene expression
2.3.1 Transcriptional control
2.3.2 Post-transcriptional control
2.4 Measurement Techniques
2.4.1 mRNA quantification
2.4.2 Protein quantification
2.4.3 Measurement limitations
3 Modelling Genetic Regulatory Network Systems
3.1 Boolean Models
3.1.1 Synchronous and Asynchronous networks
3.1.2 Graph theoretical representation
3.1.3 Example: Boolean bistable switch
3.2 Ordinary Differential Equation (ODE) Models
3.2.1 Quasi-steady-state assumption of mRNA concentration
3.2.2 Example: ODE bistable switch
3.3 Piecewise Linear (PL) models
3.3.1 Dynamical study of PL systems
3.3.2 Solutions and Stability in Regular Domains
3.3.3 Solutions and Stability in Switching Domains
3.3.4 Example: PL bistable switch
3.4 Stochastic Models
3.4.1 The Chemical Master Equation (CME)
3.4.1.1 Stochastic simulation algorithm (SSA)
3.4.2 The chemical Langevin equation (CLE)
3.4.3 Example: CME and CLE bistable switch
3.5 Final comments
3.5.1 Deterministic Vs stochastic models
3.5.2 Quantitative Vs qualitative models
4 A Simple Model to Control Growth Rate of Synthetic E. coli during the Exponential Phase: Model Analysis and Parameter Estimation
4.2 The Open-loop Model
4.2.1 Growth rate
4.2.2 cAMP-CRP activation
4.2.3 CRP synthesis
4.2.4 CGEM synthesis
4.2.5 Proteins removal
4.3 Qualitative Analysis of the Open-loop Model
4.3.1 Open-loop model in glucose growth
4.4 Growth rate expression for exponential phase
4.5 In silico Identifiability Analysis of Growth Rate
4.5.1 Problem Statement
4.5.2 Generation of Simulated Data Sets
4.5.3 Model Parametrization and Global Optimization
4.5.4 In Silico Practical Identifiability Analysis
5 Controlling bacterial growth: in silico feedback law design to re-wire the genetic network
5.2 Piecewise linear models with dilution
5.2.1 Solutions in Regular Domains
5.2.2 Solutions in Switching Domains
5.2.3 Equilibria and Stability in Regular Domains
5.2.4 Equilibria and Stability in Switching Domains
5.3 Introduction to the control problem
5.4 Open-loop model
5.4.1 Growth rate
5.4.2 cAMP-CRP activation
5.4.3 CRP synthesis
5.4.4 RNAP synthesis
5.4.5 CRP and RNAP removal
5.5 Qualitative analysis of the open-loop system
5.5.1 Open-loop system in glucose growth
5.5.2 Open-loop system under an alternative carbon source
5.6 Closed-loop model
5.7 Qualitative analysis of the closed-loop system
5.7.1 Closed-loop system in glucose growth
5.7.2 Closed-loop system in maltose growth
5.8 Inverse Diauxie
6 Switched piecewise quadratic models of biological networks: application to control of bacterial growth
6.2 Piecewise Linear systems overview
6.3 The growth rate model
6.4 The Switched Piecewise Quadratic (SPQ) system
6.5 The PQ subsystem: dynamical study
6.5.1 Solutions and Stability in Regular Domains
6.5.2 Solutions and Stability in Threshold Domains
6.6 Stability Analysis of the SPQ system
6.7 Open loop control of the RNAP-ribosomes system
6.7.1 SPQ model of the open-loop control system
7 Attractor computation using interconnected Boolean networks: testing growth models in E. coli
7.2.1 From discrete to Boolean models
7.2.2 Dynamics of Boolean models
7.2.3 Interconnection of Boolean models
7.2.4 Attractors of an interconnection
7.3 Application: a model for E. coli growth mechanism
7.3.1 E. coli nutritional stress response module
7.3.2 The cellular growth module
7.3.3 System interconnection
7.4.1 General properties
7.4.2 Growth Rate limited by ribosomes or RNA polymerase
7.4.3 Growth Rate limited by bulk proteins
7.4.4 Model discrimination
7.4.5 Dynamical behaviour
8 A coarse-grained dynamical model of E. coli gene expression machinery at varying growth rates
8.2 E. coli GEM network: biological description
8.2.1 Ribosomes synthesis and function
8.2.2 RNAP synthesis and function
8.2.3 Proteins synthesis and function
8.3 Mathematical background
Translation of nascent mRNA
Translation of completed mRNA
Comments on ribosome engaged in translation
8.3.3 Final conclusions
8.4 E. coli GEM dynamical model
8.4.1 rnn gene expression model
8.4.2 rpoBC gene expression model
8.4.3 bulk gene expression model
8.4.4 Complete dynamical model of E. coli GEM
8.5 Model calibration
8.5.1 Experimental data
8.5.2 Parameters taken from literature
8.5.3 Calculated growth-rate-dependent parameters
8.5.3.1 Promoter concentration of rnn operon
8.5.3.2 Promoter concentration of rpoBC genes
8.5.3.3 Promoter concentration of bulk genes
8.5.3.4 Promoter concentration of non-specific binding sites
8.5.4 Estimated parameters
8.6 Free RNAP and Free ribosomes
8.7 Model reduction
9 State estimation for gene networks with intrinsic and extrinsic noise: A case study on E. coli arabinose uptake dynamics
9.2 Stochastic modelling of genetic networks
9.3 Case study: E. coli arabinose uptake dynamics
9.4 Gene network state estimation
9.4.1 The Square-Root Unscented Kalman Filter
9.5 State estimation: Simulation results for the E. coli arabinose uptake system
9.5.1 Comparison of SRUKF and PF
9.5.2 Performance of the SRUKF in presence of extrinsic noise
10 Conclusions and Perspectives
10.1 Qualitative models
10.2 Qualitative control strategies
10.3 Quantitative models
10.4 Parameter estimation
10.5 Stochastic models and state estimation
10.6.1 Qualitative control: application to real data
10.6.2 Identifiability, sensitivity analysis and validation of GEM model
10.6.3 Combining qualitative and quantitative formalisms for control purposes
10.6.4 Further investigation of dynamical growth rate models
10.6.5 Filtering applications of GRN models