Molecular dynamics – Project topics materials

Get Complete Project Material File(s) Now! »

Ubiquitin

The model protein chosen in this study is ubiquitin, which is a small reg-ulatory protein frequently used in MD simulations. It is suitable to use ubiquitin in this project since it is a well-known protein previously used in many experimental studies. Its small size also makes it suitable for MD simulations, reducing the complexity of the simulations. The protein struc-ture in a solvated state was mapped using X-ray di raction and its structure data can be downloaded from the RCSB Protein Data Bank website rcsb.org with pdb-ID: 1UBQ. The structure contains 660 atoms distributed over 76 residues[7].
Figure 3: 3D model of the protein ubiquitin with high-lighted secondary structures[7]. The -sheets are repre-sented by the yellow arrow-shapes and the -helix by the purple helix.
The secondary structures as seen in the cartoon-representation of the protein in Figure 3 must be preserved in the orientation process.

Electric eld composition

The main idea behind dipole orientation of a protein in gas phase is to expose the protein to an external electric eld in order to align its dipole axis along the electric eld lines. The experimental technique of electrospray ionisation (ESI) is used to release free protein structures from a solution[8], which are then transferred into an apparatus similar to a parallel plate capacitor inside of which they are subject to a constant electric eld. Ideally, between the plates of the capacitor the proteins would orient and then be exposed to X-rays for imaging. In order to simulate this procedure using MD the electric eld must be implemented from the protein point of view: the eld has a constant amplitude when the protein is inside the capacitor, however this is not the case when the protein is approaching it (see Figure 4a).
Figure 4: a) 2D representation of the electric vector eld experienced by a protein injected into a paralell plate capacitor. Field magnitude increases as the protein ap-proaches the space between the plates. b) Electric eld to be implemented in the simulations shown as a function of time. This eld is a model of the time dependence of the electric eld strength experienced by a protein injected into a paralell plate capacitor.
We will consider the case when the protein is approaching along the sym-metry axis of the eld sketch in Figure 4a, when the electric eld in the protein reference frame changes only in magnitude and not direction as a function of time. In order to mimic this shape, the electric eld implemented in the MD simulation should be of a constant direction with an increasing amplitude as time increases, and stabilizing to a constant amplitude as the protein has fully entered the apparatus (see Figure 4b).
The value of t0 (a measure of how rapidly the eld increases) depends on how fast the injected protein is traveling; faster protein experiences a larger rate of eld magnitude change. It may also depend on the dimensions of the apparatus, mainly the separation distance of the plates of the capacitor.

Dipole orientation

The method of dipole orientation utilizes an external electric eld to orient a protein along its dipole axis. A dipole with dipole moment p~ in the presence of an electric eld of magnitude E will be subject to a torque of magnitude = jE p~ j = Ep sin( ) where is the angle between p~ and E. Using = I , where = and I dt2 is the moment of inertia of the dipole, we obtain the 1D equation of motion d2 Ep = sin( ) (15) dt I which for small angles reduces to the equation of a harmonic oscillator. In the context of this project, the resulting oscillations will be damped since a protein is not a rigid perfect dipole. Thus, the dipole axis of the protein eventually aligns with the direction of the external electric eld.

Structural stability of proteins

For a successful orientation process, the nal structure of the protein should be similar to the original structure. We use two di erent quantities to inves-tigate the structural changes of the protein during the orientation.

Root Mean Square Deviation, RMSD

The non-rigid structure of the protein has a di erent e ect besides the damp-ing. A strong electric eld would cause unfolding of the protein structure, and can therefore not be chosen arbitrarily. This must be taken into con-sideration when using this method to orient the protein. A common way to detect unfolding or damage to the molecular structure is by investigating potential e ects on the Root Mean Square Deviation (RMSD) of the protein complex during the orientation. After a least-square t of the structure with respect to the original state, the RMSD is de ned as RM SD(t) = v 1 N mi ~ri(t) ~ri(t0) k 2 (16) M i=1 where M is the total mass of the structure, ~ri is the position of atom i, and N is the total number of atoms. The RMSD is a measure of how much the structure of the system changes between times t and t0[9].

Root Mean Square Fluctuation, RMSF

The Root Mean Square Fluctuation RMSF is a measure of how much an atom within the protein structure uctuates around its mean position. It is de ned as the standard deviation of the position of the atom and is calculated over a given time interval[9].
RM SF = s n=1(ri r)2 ri is the position at the i:th time step in the time interval, n is the number of time steps in the interval and r is the average position during the entire time interval.
Thus the RMSD tells us how much on average an atomic position in the nal state deviates from its original position. The RMSF instead is a mea-sure of how much the atomic position uctuates around its mean position evaluated for a particular atom over a given time interval. Comparing the RMSF of all atoms in the structure would then show if certain parts of the protein is a ected more than others, which could not be revealed by observing the RMSD.

Simulations

The simulations in this project was performed using the MD simulation pack-age GROMACS Version 4.6.7 available for download at www.gromacs.org/Downloads. To calculate the trajectory of a system, GRO-MACS need a starting structure of a protein. This is provided in a .pdb le that contains a description of positions of the constituent atoms, which can be downloaded from the RCSB Protein Data Bank website[10]. Using the information in the .pdb le, a .top le containing information about the topology of the protein is created. This contains a complete description of which atoms interact with each other, in what way, and how strongly. A .gro le to be used as a starting structure of the simulations is also generated from the .pdb le. At this point the choice of force eld discussed in section 2.1.1 is speci ed. In this project the simulations are made in vacuum but the .pdb les from the Protein Data Bank website describe the composition of a solvated protein. Therefore the charge distribution of the structure and topology les had to be modi ed. In particular the protonation state of the amino acids were modi ed according to Marklund et. al, 2009[11].
Simulations in GROMACS normally follow a standard procedure which we will now take a closer look at. There are three di erent runs in each MD simulation. Before each run input .tpr les need to be created with the com-mand grompp. This is a binary le containing information about the starting structure, topology and simulation parameters of the protein. The simula-tion strategy we used (identical for all simulations) is given by the work ow in Figure 5.
The rst MD run is a so called energy minimization run. This step is necessary to relax the system before the simulation. The total force of the system is calculated, and the atoms are gradually displaced so that the total energy of the system is minimized. E ectively the nal state of this process simulates how the protein would behave at T = 0K. The information about which algorithms should be used in the energy minimization is provided in a input .mdp le.
After the energy minimization, a pre-run is made, to insure that the sys-tem is in equilibrium before the simulation. This is done using a thermostat at a given temperature, which in principle means that the system is put in thermal equilibrium with a heat bath of that temperature.
Finally, the simulation is run, generating a .gro le containing the nal struc-ture and a .trr le containing the time development of the system. Again, a .mdp le needs to be provided as input, containing the information about how the simulation should be conducted. The values of the maximum elec-tric eld strength E0 and the time t0 needed to reach it (see gure 4b) are speci ed at this stage. these values were varied for the di erent simulations according to section 3.3. The electric eld was set parallel to the x-axis in all simulations.

READ WORK STRESS IN THE CONTEXT OF ODL ACADEMICS

Method

Electric eld implementation

The electric eld in GROMACS can be implemented as a Gaussian envelope function (t t0)2 E(t) =E0 exp 2 2 cos (!(t t0)) (18) where parameters E0, t0, ! and can be varied. See Figure 6. In order for the implemented eld to have the desired form (see Figure 4b) we will choose ! = 1, which results in the eld presented in Figure 6b where t0 = 2 .
To obtain the appropriate eld image, the simulations was divided into two steps: the rst simulation from t = 0 to t = t0 (see Figure 4b), using the electric eld in Figure 6b with nal eld magnitude E0, and the second with a constant eld magnitude E0 starting at t = t0. A constant amplitude is implemented by letting lim!!0 and lim !1 . If we also let t = 0 occur when E = 0, we get the following eld
(E0 (t t0)2 ]; != 1 ; t0 =
E(t) = 2 2 t > t0 2! (19)
E0 exp cos(!(t t0)) t 2 [0; t0
which is exactly the eld shown in Figure 4b.

Degree of orientation D

The polar angle between the external electric eld direction and the dipole moment of the protein can be used to describe how oriented the protein is, approaching zero at total orientation. We de ne 1 cos( ) := D as the degree of orientation of the protein. D = 0 means total orientation, whereas D = 2 means the protein is anti-parallel to the electric eld direction. The general behaviour of and D as functions of time are given in Figure 7.
Figure 7: a) Angle of de ection of damped oscillator as a function of time. b) Degree of orientation 1 cos( ) = D of harmonic oscillator as a function of time. We use D as a measure of how well-oriented the protein is, such that D = 0 means completely oriented along x-direction and D = 2 means anti-parallel to the x-direction.

Simulation setup

The two independent parameters E0 and t0 (see Figure 4b) will be varied in order to investigate their e ect on the behaviour of the protein. Testing for di erent values of E0 will determine how strong the electric eld could be without damaging the structure of the protein. Testing for di erent values of t0 is also necessary since the injected proteins will enter the apparatus with di erent velocities and thus experience the increase of the eld magnitude di erently. The simulation setup was similar to that of Marklund et. al, 2017[3].
We will conduct the simulations using the OPLS-AA force eld, and the Berendsen thermostat at 300K will be used in the pre-run[9]. The di erent values of E0 will be chosen as E0 2 f0:1; 0:2; 0:3; 0:4; 0:6; 0:8; 1:0; 1:5; 2:0; 3:0gnmV which are similar to those used by Abrikossov in 2011[4], and t0 according to namely t0 = 5 ns, t0 = 2 ns and t0 = 0 ns which results in a constant eld. The t0 values for the increasing eld was chosen somewhat arbitrarily since there is no speci c documentation of the dimensions of the orientation device. The initial orientation of the total dipole moment of the ubiquitin will be set parallel to the the y-axis, in other words 90 o the electric eld direction.
We will perform simulations for all possible parameter combinations. Since the initial velocities of the individual atoms are taken randomly from a Boltz-mann distribution, we will make ten independent simulations for every pa-rameter combination for statistical reasons. Both the RMSD and RMSF will be calculated for only the -carbons of the protein.

Results

Alignment of protein

As time progresses we expect the dipole axis of the protein (initially oriented along the y-axis) to align parallel to the x-axis since this is the direction of the implemented electric eld. An example of this can be seen in Figure 9.
Figure 9: Snapshots of the orienting process of ubiquitin for t0 = 5 ns and E0 = 1:0 nmV . Electric eld is im-plemented parallel to the x-axis. As time increases the dipole moment of the protein aligns with the x-axis.
Depending on the values of E0 and t0 used in the simulation, the nal state of the protein after 10 ns was variously well oriented. This can be visualized by plotting the time development of the dipole moment components Mi like presented in Figure 10.
a) E0 = 0:1 nmV , b) E0 = 0:2 nmV , c) E0 = 0:4 nmV ,
d) E0 = 1:0 nmV : Convergence of Mx with Mtot implies that the protein is well-oriented.
For increasing E0 we see a better convergence of Mx with Mtot implying an orientation of the dipole moment in the x-direction since M is then almost parallel with the electric eld. The corresponding plots for the degree of orientation D (see section 3.2) are E0 = 0:2 nmV , c) E0 = 0:4 nmV , d) E0 = 1:0 nmV : Larger E0 give better alignment as time progresses.
which also suggests both a faster and better nal orientation for increasing E0. We now consider only the nal part of the simulations, and compare the nal value of D for di erent E0 and t0.

Average nal degree of orientation, Df

We want to compare the value of D towards the end of the simulation since that is when the imaging would be conducted on a real protein. As a measure of this we use the average D over the last 2 ns of the simulations as a nal degree of orientation Df which would be zero for a completely oriented pro-tein. We de ne the average of this quantity over all the repeated simulations as Df . The value of Df for di erent choices of parameters is presented in Figure 12.
Figure 12: Average nal degree of orientation Df evalu-ated for all values of E0 and t0. This is the average of Df which is the mean degree of orientation D of the last two ns of a simulation. The di erent colors represent the di erent kinds of implemented elds seen in Figure 8.
For elds larger than 0.6 nmV , the nal degree of orientation is rather the same for all three di erent elds, and does not improve notably for higher eld strengths. For lower eld strengths however, t0 = 5 ns on average results in a less oriented protein after 10 ns of simulaion, whereas there is no notable di erence between the constant and the t0 = 2 ns elds.

Time of orientation

Since the protein exhibits damped oscillatory motion, we expect the magni-tude of the oscillations to follow a exponentially decaying function as time progresses. The same goes for the de ection angle and thereby also also for D (see section 3.2). Through an exponential t of the peaks in the sim-ulation data of D, we can obtain an estimated decay constant k, and use it as a measure of the rate of orientation. A smoothing algorithm was used to reduce the in uence of noise in the data.
Figure 13: An example of an exponential t to the peaks of the degree of orientation D, for E0 = 1:0 nmV and t0 = 5 ns. The peaks of the data represent turning points in the protein oscillations, so that the decaying nature of the data re ect the decreasing amplitude of oscillations.
Repeating the tting for all 10 simulations for every parameter combination we can calculate a mean value of the decay constant k. Comparing ekt for di erent E0 values, we get the results presented in Figure 14.

Table of contents :

1 Introduction
2 Background
2.1 Molecular dynamics
2.1.1 Force eld
2.1.2 Leap-frog algorithm
2.2 Ubiquitin
2.3 Electric eld composition
2.4 Dipole orientation
2.5 Structural stability of proteins
2.5.1 Root Mean Square Deviation, RMSD
2.5.2 Root Mean Square Fluctuation, RMSF
2.6 Simulations
3 Method
3.1 Electric eld implementation
3.2 Degree of orientation D
3.3 Simulation setup
4 Results
4.1 Alignment of protein
4.1.1 Average nal degree of orientation, Df
4.1.2 Time of orientation
4.2 Unfolding and structural damage
4.2.1 RMSD
4.2.2 RMSF
4.2.3 Temperature
5 Discussion
6 Recommendations
7 Conclusions