Several polypeptide chains can fold together into a single protein. Because one gene codes for one chain, a protein with several chains is built from more than one subunit. A monomer is a molecular unit that, through polymerization, can group with other molecules to form a larger compound. While it is part of the compound, the monomer is called a subunit. The quaternary structure of a protein is formed by all the subunits in the compound. Monomers, dimers, and oligomers are proteins composed of one, two, and several subunits, respectively.
Two amino acids in an oligomer can therefore be part of an intermolecular interaction, i.e., an interaction between two different polypeptide chains. Otherwise, the interaction is called intramolecular. Zones where intermolecular interactions occur are called interfaces; in other words, interfaces are made of interactions that exist only at the level of the quaternary structure. Residues lying on protein interfaces are referred to as “hot spots”.
The stability of interfaces is of crucial importance to the overall stability of the protein structure. There is therefore an interest in studying the structure of protein interfaces on their own. In chapter one, we propose a model of amino acid network, called the hotspot network, used to study structural variations on the protein interfaces belonging to the cholera toxin. The hotspot network of an oligomer is a subnetwork of its amino acid network (Definition 4). A network or graph $G = (V, E)$ contains the network $G' = (V', E')$ if and only if $V' \subseteq V$ and $E' \subseteq E$, in which case $G'$ is called a subnetwork or subgraph of $G$.
Definition 4. The hotspot network (HSN) of a structure $S$ given a cutoff $c$, denoted $H(S, c) = (V_H, E_H)$, is a subgraph of the amino acid network $G(S, c) = (V, E)$, where the set $V_H \subseteq V$ is equal to the set of hotspots in $S$, and any two hotspots $u$ and $v$ share an edge $uv$ if and only if $u$ and $v$ lie on different chains and $uv \in E$ (Figure 1.6).
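Definition 4 can be illustrated with a minimal Python sketch on a toy structure in the plane. The coordinates, the residue labels, and the function names `amino_acid_network` and `hotspot_network` are hypothetical. The sketch also assumes that two residues are linked in the amino acid network when at least one pair of their atoms lies within the cutoff, and that a hotspot is a residue with at least one such link to a residue on another chain.

```python
from itertools import combinations
from math import dist

# Hypothetical toy structure in the plane:
# (chain name, residue number) -> list of atom coordinates.
structure = {
    ("A", 1): [(0.0, 0.0), (1.0, 0.0)],
    ("A", 2): [(2.0, 0.0), (3.0, 0.5)],
    ("B", 1): [(3.0, 1.5), (4.0, 1.0)],
    ("B", 2): [(8.0, 8.0), (9.0, 8.0)],
}

def amino_acid_network(structure, cutoff):
    """Return (V, E): residues, and residue pairs with an atom pair within the cutoff.

    Assumed definition of G(S, c); it is not spelled out in this section.
    """
    edges = set()
    for u, v in combinations(sorted(structure), 2):
        if any(dist(a, b) <= cutoff for a in structure[u] for b in structure[v]):
            edges.add((u, v))
    return set(structure), edges

def hotspot_network(structure, cutoff):
    """Return (V_H, E_H): keep only edges of E joining residues on different chains."""
    _, edges = amino_acid_network(structure, cutoff)
    hs_edges = {(u, v) for u, v in edges if u[0] != v[0]}
    hs_nodes = {residue for edge in hs_edges for residue in edge}
    return hs_nodes, hs_edges

nodes_h, edges_h = hotspot_network(structure, cutoff=1.5)
print(sorted(nodes_h), sorted(edges_h))
```

With this toy input, the edge between the two residues of chain A belongs to the amino acid network but not to the hotspot network, because it is intramolecular.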
The majority of the protein structure networks used in this work are subnetworks of the amino acid network as well and will be defined in the following chapters.
Several structural properties of an amino acid are reflected in the amino acid network. The degree of an amino acid in the network, that is, the number of links connected to it, depends on the distribution of other amino acids around it in the protein. Similarly, the weight of an amino acid is the number of atoms belonging to other amino acids that are close to it. We will see this in more detail in Chapters 2, 3 and 4.
Figure 1.6: An example of a hotspot network of a structure in $\mathbb{R}^2$. (a) The protein structure is composed of two chains, each having two residues. The atoms of each residue are shown in a different color. Atom pairs at interaction distance are connected by dotted lines. (b) The hotspot network. Only residues in different chains can be interacting. Note that the label of each node is prefixed by the name of its chain.
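The degree and weight described above can be computed as in the following sketch. It uses a hypothetical toy structure in the plane and a hypothetical function name, and it takes one possible reading of "close": an atom of another residue counts if it lies within the cutoff of at least one atom of the residue considered. Later chapters may use a different convention, for example counting atom pairs.

```python
from math import dist

# Hypothetical toy structure: (chain name, residue number) -> atom coordinates.
structure = {
    ("A", 1): [(0.0, 0.0), (1.0, 0.0)],
    ("A", 2): [(2.0, 0.0), (3.0, 0.5)],
    ("B", 1): [(3.0, 1.5), (4.0, 1.0)],
    ("B", 2): [(8.0, 8.0), (9.0, 8.0)],
}

def degree_and_weight(structure, cutoff):
    """Degree: number of neighboring residues. Weight: number of close atoms.

    An atom of another residue is counted as close when it lies within the
    cutoff of at least one atom of the residue (assumed reading of the text).
    """
    degree = dict.fromkeys(structure, 0)
    weight = dict.fromkeys(structure, 0)
    for u, atoms_u in structure.items():
        for v, atoms_v in structure.items():
            if u == v:
                continue
            close_atoms = [b for b in atoms_v
                           if any(dist(a, b) <= cutoff for a in atoms_u)]
            if close_atoms:
                degree[u] += 1
                weight[u] += len(close_atoms)
    return degree, weight

degree, weight = degree_and_weight(structure, cutoff=1.5)
for residue in sorted(structure):
    print(residue, degree[residue], weight[residue])
```

In this toy input, residue 2 of chain A has two neighbors but three close atoms, which shows that the weight carries information the degree alone does not.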
Other parameters can also be calculated, such as the betweenness centrality of a node, which is the number of shortest paths in the network passing through that node. The closeness centrality is based on the lengths of the shortest paths between the node and all other nodes in the network. These parameters can be used to study the relevance of the node in the network in terms of interaction paths.
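Both centralities can be sketched in a few lines of Python for an unweighted, undirected network. The adjacency list and the node labels are hypothetical. Betweenness is computed with Brandes' algorithm, and closeness is normalized as the number of reachable nodes divided by the sum of shortest-path lengths; this normalization is a common convention assumed here, not one stated in the text.

```python
from collections import deque

def closeness(adj, source):
    """(number of reachable nodes) / (sum of shortest-path lengths from source)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in distance:
                distance[w] = distance[u] + 1
                queue.append(w)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0

def betweenness(adj):
    """Brandes' algorithm: shortest paths through each node, weighted by path fraction."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        n_paths = dict.fromkeys(adj, 0)     # number of shortest paths from source
        n_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical path-shaped network: A1 - A2 - B1 - B2
adj = {"A1": ["A2"], "A2": ["A1", "B1"], "B1": ["A2", "B2"], "B2": ["B1"]}
print(betweenness(adj))
print({v: closeness(adj, v) for v in adj})
```

In this path-shaped example the two inner nodes carry all the shortest paths between the other pairs, so they receive the highest betweenness and closeness.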
In the protein structure, amino acids of the same type (and size) can differ in volume, as we will see in Chapter 5. We will also see that the empty space around amino acids, or void, varies considerably between close residues. This is a consequence of the combinatorial variety of amino acid neighborhoods in proteins, as shown in Chapter 5.
There is one subject that is present in different forms throughout this work: mutations. A mutation is an evolutionary phenomenon happening continuously in proteins.
During the first part of this thesis, we study the effects of mutations on protein amino acid networks. We consider mutations to be (only) variations of the atomic coordinates of proteins in three-dimensional space. However, the variation in the structure is only a consequence of a mutation. In reality, a mutation is a change in the genetic sequence coding for the protein, happening during DNA replication. Subsequently, the protein is built from a segment of DNA, or gene, into an amino acid sequence by the process called protein synthesis.
The process of protein gene expression starts with a segment of DNA (deoxyribonucleic acid) called a gene and ends with the synthesis of the protein amino acid sequence. In this section we briefly explain how this process is carried out, first in the (eukaryotic) cell nucleus and then in the cytoplasm. Protein gene expression can be divided into two subprocesses: transcription and translation.
Transcription and translation
Transcription of a gene refers to the process of copying the information in the gene, stored in the nucleus of the cell, into another molecule called RNA (ribonucleic acid). Transcription can be divided into three steps: initiation, elongation, and termination. Initiation starts when the enzyme RNA polymerase, while bound to the DNA, encounters a promoter site signaling the start of a gene. The RNA polymerase then unwinds the DNA double helix and starts copying, in a complementary fashion, the nucleotides from one of the DNA strands. The strand used is the template strand, to which the RNA will be complementary. The RNA polymerase copies one nucleotide at a time by covalently linking each new complementary nucleotide of the RNA to the one previously added, forming the RNA backbone. This is the elongation phase. Termination occurs when the RNA polymerase reaches a terminator or stop site: it halts the synthesis of the RNA and is released from the template DNA strand. The double helix closes again and the RNA molecule is ready to be used by the cell. The resulting RNA can serve other purposes in the cell besides the creation of a protein molecule; here we focus only on the messenger RNA (mRNA), the molecule in charge of passing on the genetic message (Figure 1.7). The mRNA exits the cell’s nucleus through its pores and enters the cytoplasm, where the subprocess called translation takes place.
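The complementary copying performed during elongation can be sketched as follows. The function name and the example sequence are hypothetical, and the sketch ignores promoters, terminators, and any later processing of the RNA; it only pairs each template base with its RNA complement.

```python
# RNA base paired with each DNA base of the template strand.
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the mRNA (written 5'->3') for a DNA template strand written 3'->5'."""
    return "".join(RNA_COMPLEMENT[base] for base in template_3_to_5)

print(transcribe("TACGGCATT"))  # prints AUGCCGUAA
```

Writing the template in the 3'->5' direction lets the mRNA be produced left to right without reversing the string.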
The translation of the mRNA into a new sequence of amino acids is carried out by a molecular complex called the ribosome. The ribosome translates the information encoded in the mRNA into amino acids. The sequence of the mRNA is divided into sets of three consecutive nucleotides called codons. Each codon translates to one of twenty amino acids, with the exception of the stop codons, which terminate the translation. All 64 codons ($4 \times 4 \times 4$), together with their translations, compose the genetic code. In this code, every amino acid is coded by more than one codon except for methionine and tryptophan, and three codons are used to signal the termination of the translation.
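The genetic code can be written down compactly and queried, as in the sketch below. The table is the standard genetic code, with amino acids given as one-letter symbols and `*` marking a stop codon. The names `GENETIC_CODE`, `translate`, and `codons_per_symbol` are hypothetical, and `translate` simply reads from the first nucleotide rather than searching for a start codon.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG.
SYMBOLS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
           "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
GENETIC_CODE = {"".join(codon): symbol
                for codon, symbol in zip(product(BASES, repeat=3), SYMBOLS)}

def translate(mrna):
    """Translate an mRNA codon by codon until a stop codon or the end is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        symbol = GENETIC_CODE[mrna[i:i + 3]]
        if symbol == "*":
            break
        protein.append(symbol)
    return "".join(protein)

# Number of codons coding for each amino acid (and for the stop signal).
codons_per_symbol = Counter(GENETIC_CODE.values())

print(len(GENETIC_CODE))            # prints 64
print(translate("AUGCCGUAA"))       # prints MP
print(codons_per_symbol["*"])       # prints 3
```

Counting the entries of `codons_per_symbol` equal to one gives the amino acids that have a single codon.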
Translation can be divided into four steps: activation, initiation, elongation, and termination. The activation phase consists of the amino acids binding to the transfer RNA (tRNA) molecules, which will transport the amino acids to the ribosome. The initiation phase is when the small subunit of the ribosome binds the end of the mRNA (first codon). The elongation phase consists of the charged tRNA (tRNA with its corresponding amino acid) matching the codon and binding to the ribosome. Finally, the termination phase occurs when the ribosome encounters a stop codon (also called a nonsense codon) and detaches from the amino acid sequence (Figure 1.7).
As mentioned before, a mutation, which is explained in the next subsection, can alter the structure of a protein. This alteration will be measured with amino acid networks, by comparing the network of the mutated structure with that of the wild type.
A point mutation is a variation in the nucleotide sequence of the DNA (or RNA). Point mutations occur in nature mainly during DNA replication, but can also happen during transcription and translation.
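One simple way to compare the network of a mutated structure with that of the wild type is to compare their edge sets. The sketch below is only an illustration under that assumption and is not necessarily the comparison used in the following chapters; the function name and the residue labels are hypothetical.

```python
def compare_networks(wild_type_edges, mutant_edges):
    """Split edges into those lost, gained, and kept after a mutation.

    Edges are undirected, so each one is stored as a frozenset of two residues.
    """
    wild_type = {frozenset(edge) for edge in wild_type_edges}
    mutant = {frozenset(edge) for edge in mutant_edges}
    return {
        "lost": wild_type - mutant,
        "gained": mutant - wild_type,
        "kept": wild_type & mutant,
    }

# Hypothetical edge lists; labels are chain name followed by residue number.
wild_type_edges = [("A10", "A11"), ("A11", "B7"), ("A12", "B7")]
mutant_edges = [("A11", "A10"), ("A11", "B7"), ("A11", "B8")]

changes = compare_networks(wild_type_edges, mutant_edges)
print({kind: len(edges) for kind, edges in changes.items()})
```

Storing each edge as a `frozenset` makes the comparison independent of the order in which the two residues are listed.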
However, a mutation can produce a change to the structure of a protein and therefore to its function. In order to understand the role of mutations in the protein function and structure, we first need to mention the relation between the amino acid sequence and the structure and function of a protein.
The function of a protein depends on the underlying protein structure. Allosteric shifts or intrinsically disordered regions in the protein structure can yield several functions in the same protein. The main purpose of the protein structure is indeed to accomplish one or several functions. Therefore, the protein must fold into the conformation that allows it to function properly. This interdependency between function and structure is one of the motivations to study the structure of proteins. This is supported by the fact that there are extensive databases of crystallized protein structures available online.
Amino acid networks
Protein structure and function
An additional variation in the definition of amino acid networks lies in the distance used to consider two atoms to be interacting. When, for example, only alpha carbons are considered, the distance cutoff is considerably larger than when all atoms are taken into account. Finally, some authors consider the side chain atoms only, neglecting the backbone.
Table of contents:
1.1 Protein structure
1.1.1 Amino acids and primary structure
1.1.2 Secondary and tertiary structures
1.1.3 Quaternary structure
1.2 Protein synthesis
1.2.1 Transcription and translation
1.3 Amino acid networks
1.3.1 Protein structure and function
2 Protein structural robustness
2.2.1 Aminoacidrank (aar) algorithm
2.3 Results and discussion
2.4 Survey of the structural changes
2.4.1 Specific examples
2.5 Structural Robustness, Fragility and Adaptation
2.7 Supplementary Information
2.7.1 Supplementary methods
2.7.2 Amino Acid Rank (Pseudocode)
3 Protein structure plasticity
3.2 Results and Discussion
3.2.1 Amino acid diversity in terms of amino acid neighbors—Degree statistics
3.2.2 Amino acid diversity in terms of number of atomic interactions—Weight statistics
3.2.3 Average Pairwise Weights ($\langle w_{i,j} \rangle$)
3.2.4 Pairwise network compensation
3.4.2 Amino Acid Network
3.4.3 Pairwise theoretical average number of atomic interactions
3.4.4 Degree statistics
3.4.6 Accessibility Surface Area (ASA)
3.4.7 Degree and weight Envelopes
3.4.8 Jaccard measure
3.4.9 Mutated networks
3.4.10 Experimental methods
3.5 Supplementary material
3.5.1 Supplementary Tables
3.5.2 Supplementary Figures
4 Perturbation of amino acid networks
4.2.1 Amino Acid Network
4.2.2 Perturbation network P
4.2.3 Sphere of influence
4.2.4 The matrix M
4.2.6 The boolean matrix R
4.3 Results and Discussion
4.3.1 Sphere of Influence
4.3.2 Number of perturbed amino acids and functional change
5 Void around amino acids
5.2 The protein as a discrete mathematical object
5.3 Convex Hull Method
5.3.1 Envelope set
5.3.2 Basic idea
5.3.4 Barycentric coordinates
5.4 Delaunay Method
5.4.1 Basic idea
5.5 Empty tetrahedra method
5.5.1 Overlap between a sphere and a tetrahedron
5.5.2 Bounded empty tetrahedra
5.6.1 Large voids in hLTB5
5.6.2 Delaunay Method cutoff
5.6.3 Gap in atomic distances
5.6.4 Distribution of void
5.6.5 Void and Accessible Surface Area
5.6.7 Supplementary table
6.2 Local structure
6.2.1 Local structure of amino acids
6.2.2 Local structure of functional positions
6.3 Local void
6.4 Future work
7 Introduction (French)
7.1 The framework
7.2 Local structure
7.2.1 Local structure of amino acids
7.2.2 Local structure of functional positions
7.3 Local void
7.4 Future work