
## Exploration of the structure-sequence space

Most CPD programs generate a limited number of protein variants, solving problems of reduced complexity in an acceptable computing time. Typically, one is interested in the lowest-energy state, the Global Minimum Energy Configuration (GMEC), or in a group of sequences close to it. Several approaches can be found in the literature: some are heuristic, others are stochastic.

#### Heuristic methods

Heuristic methods are useful for finding a group of low-energy configurations starting from one or more random states. A popular method was introduced by Wernisch et al. [2000]. They proposed a simple sampling scheme: starting from a random protein sequence, a single position was picked and its rotamer optimized. The procedure was repeated for many iterations until a local minimum was reached. The method successfully reproduced side-chain conformations of surface residues, as well as stability changes for mutations in the protein core or at its surface. However, it generally produces only a few variants. For this reason, it is not well suited to high-throughput design, where one wants to generate a distribution of many protein variants in a single simulation.

#### Monte Carlo methods

Monte Carlo (MC) sampling can be used to obtain a set of protein sequences generated by a stochastic process. A popular MC method for protein design is based on the Metropolis algorithm. With MC, it is possible to generate protein sequences distributed according to a chosen probability density function. The system energy is used to accept or reject new configurations, so that the visited states populate the desired distribution. MC convergence is guaranteed only for very long simulations, and the sampling can get stuck in local minima. However, several advanced sampling techniques can be used to shorten simulations and to cross free energy barriers. More technical details are given below, together with a detailed description of the implementation in our software, Proteus.
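The sketch below illustrates the heuristic scheme of Wernisch et al. [2000] described at the start of this section. It applies single-position optimization to a toy pairwise-additive energy model. It assumes that each position takes one of a few discrete states, standing in for amino acid/rotamer combinations, and that the energy is a sum of self and pair terms. The random energy tables and all function names (`random_model`, `local_energy`, `greedy_minimize`) are hypothetical and are not the original authors' implementation.

```python
import random

def random_model(n_pos, n_states, rng):
    """Build toy self and pair energy tables (hypothetical, illustration only).

    e1[i][a]       : self energy of state a at position i
    e2[i][j][a][b] : pair energy, stored so that e2[i][j][a][b] == e2[j][i][b][a]
    """
    e1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_states)] for _ in range(n_pos)]
    e2 = [[[[0.0] * n_states for _ in range(n_states)] for _ in range(n_pos)]
          for _ in range(n_pos)]
    for i in range(n_pos):
        for j in range(i + 1, n_pos):
            for a in range(n_states):
                for b in range(n_states):
                    v = rng.uniform(-0.5, 0.5)
                    e2[i][j][a][b] = v
                    e2[j][i][b][a] = v
    return e1, e2

def total_energy(conf, e1, e2):
    """E = sum_i E_i(s_i) + sum_{i<j} E_ij(s_i, s_j)."""
    n = len(conf)
    e = sum(e1[i][conf[i]] for i in range(n))
    e += sum(e2[i][j][conf[i]][conf[j]] for i in range(n) for j in range(i + 1, n))
    return e

def local_energy(conf, pos, s, e1, e2):
    """All energy terms involving position `pos` when it is in state `s`."""
    return e1[pos][s] + sum(e2[pos][j][s][conf[j]] for j in range(len(conf)) if j != pos)

def greedy_minimize(e1, e2, rng, tol=1e-12):
    """Optimize one position at a time until no single change lowers the energy."""
    n = len(e1)
    conf = [rng.randrange(len(e1[i])) for i in range(n)]  # random starting state
    improved = True
    while improved:
        improved = False
        for pos in rng.sample(range(n), n):  # visit positions in random order
            current = local_energy(conf, pos, conf[pos], e1, e2)
            best = min(range(len(e1[pos])),
                       key=lambda s: local_energy(conf, pos, s, e1, e2))
            # Accept only strict improvements, so the total energy decreases
            # at every change and the loop is guaranteed to terminate.
            if local_energy(conf, pos, best, e1, e2) < current - tol:
                conf[pos] = best
                improved = True
    return conf

rng = random.Random(0)
e1, e2 = random_model(n_pos=6, n_states=4, rng=rng)
conf = greedy_minimize(e1, e2, rng)
print("local minimum:", conf, "energy:", round(total_energy(conf, e1, e2), 3))
```

Different random starting states generally end in different local minima. Each run therefore yields a single variant, which is why the method produces only a few variants and is poorly suited to high-throughput design.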

The main advantage of Monte Carlo is that it is usually simple to implement and can easily be adapted to many different problems. By sampling the Boltzmann distribution, it is also possible to extract statistical-mechanical properties (for example, free energy differences), which can be directly related to experimental data or to the results of molecular dynamics simulations.
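The sketch below illustrates Boltzmann sampling with the Metropolis algorithm and the extraction of a free energy difference from sampled populations. It is generic Metropolis MC, not the Proteus implementation. The two-position energy function `toy_energy`, the single-position move set and the macrostates compared (position 0 in state 0, "A", versus state 1, "B") are assumptions chosen for illustration. Because the proposal is symmetric, accepting a move with probability min(1, exp(-dE/kT)) produces Boltzmann-distributed samples. The free energy difference between two macrostates then follows from their populations as dG(A -> B) = -kT ln(N_B / N_A).

```python
import math
import random
from collections import Counter

def metropolis(energy, n_pos, n_states, kT, n_steps, rng):
    """Sample configurations with probability proportional to exp(-E/kT).

    Proposal: move one random position to a different, uniformly chosen state.
    The proposal is symmetric, so the Metropolis acceptance rule satisfies
    detailed balance with respect to the Boltzmann distribution.
    """
    conf = [rng.randrange(n_states) for _ in range(n_pos)]
    e = energy(conf)
    counts = Counter()
    for _ in range(n_steps):
        pos = rng.randrange(n_pos)
        new = list(conf)
        # Shift by 1..n_states-1 so the proposed state always differs from the current one.
        new[pos] = (conf[pos] + rng.randrange(1, n_states)) % n_states
        e_new = energy(new)
        d_e = e_new - e
        if d_e <= 0 or rng.random() < math.exp(-d_e / kT):
            conf, e = new, e_new
        counts[tuple(conf)] += 1  # count the current state whether or not the move was accepted
    return counts

def toy_energy(conf):
    """Hypothetical two-position model: self terms plus one pair term."""
    self_terms = [[0.0, 0.5, 1.0], [0.2, 0.0, 0.8]]
    pair = [[0.0, -0.3, 0.1], [0.2, 0.0, -0.4], [0.1, 0.3, 0.0]]
    return self_terms[0][conf[0]] + self_terms[1][conf[1]] + pair[conf[0]][conf[1]]

kT = 0.6  # roughly kT in kcal/mol near room temperature
rng = random.Random(42)
counts = metropolis(toy_energy, n_pos=2, n_states=3, kT=kT, n_steps=200_000, rng=rng)
n_a = sum(c for conf, c in counts.items() if conf[0] == 0)  # macrostate A
n_b = sum(c for conf, c in counts.items() if conf[0] == 1)  # macrostate B
dG = -kT * math.log(n_b / n_a)
print(f"Estimated dG(A -> B): {dG:.3f}")
```

This system has only nine configurations, so the exact answer can be obtained by enumeration, which makes it a convenient check of the sampler. For realistic sequence spaces enumeration is impossible, and only the sampled populations are available.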

### CPD software

Before introducing our in-house software Proteus, we briefly describe two CPD programs used in the lab during the last few years.

Rosetta is a collection of programs for molecular modelling developed by a large community of researchers (RosettaCommons, the organization behind it, counts 150 developers around the world). RosettaDesign is the tool used for CPD, for example to improve protein stability. Other tools, such as RosettaDock and RosettaAbInitio, are used respectively to predict the structures of protein-protein complexes and to predict protein structures de novo. Several Rosetta energy functions are inspired by molecular mechanics but make extensive use of statistical terms fitted to experimental data. The first versions were based on a fixed backbone and a backbone-dependent rotamer library; models with limited backbone flexibility were introduced more recently. Despite Rosetta's remarkable success in protein design, its energies are usually expressed in non-physical Rosetta units. Although some effort has been made to convert them to kcal/mol (Alford et al. [2017]), the results remain difficult to interpret, especially when compared with experiments or with state-of-the-art molecular simulations. Moreover, for the design procedure the developers favoured fast algorithms (for example, Monte Carlo with simulated annealing) that generate a few variants in limited computer time, rather than many protein variants in a single run. A user interested in an ensemble of sequences therefore often needs to run several independent Rosetta simulations and then discard repeated variants.
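The multiple-run workflow described above can be illustrated with a generic simulated annealing sketch. This is not Rosetta's protocol or API. The geometric cooling schedule, the step count and the scoring function `toy_score` are assumptions for illustration only; `toy_score` is the Hamming distance to a hypothetical target sequence `TARGET`. Each annealing run returns one low-scoring sequence. Independent runs frequently converge on the same variant, so duplicates have to be removed afterwards.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (one-letter codes)
TARGET = "MKTAYIAK"                # hypothetical optimum of the toy score

def toy_score(seq):
    """Hypothetical score (lower is better): number of mismatches to TARGET."""
    return sum(a != b for a, b in zip(seq, TARGET))

def anneal(score, length, rng, t_start=5.0, t_end=0.05, n_steps=5000):
    """One simulated annealing run using single-residue mutations."""
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    e = score(seq)
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)
        e_new = score(seq)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            e = e_new             # accept the mutation
        else:
            seq[pos] = old        # reject: restore the previous residue
    return "".join(seq), e

def unique_designs(score, length, n_runs, seed=0):
    """Run independent annealing trajectories and keep each distinct sequence once."""
    rng = random.Random(seed)
    designs = {}
    for _ in range(n_runs):
        seq, e = anneal(score, length, rng)
        designs[seq] = e          # repeated variants collapse onto the same key
    return sorted(designs.items(), key=lambda item: item[1])

designs = unique_designs(toy_score, len(TARGET), n_runs=10)
print(f"{len(designs)} distinct variant(s) from 10 runs:", designs)
```

With a score that has a single clear optimum, most of the ten runs return the same sequence. This illustrates why generating a diverse ensemble this way can waste much of the computing time.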

#### Constant-activity and constant-pH Monte Carlo

The main goal of CPD is to produce one or more protein variants, each with an associated sequence score, in order to optimize a desired property. For example, one can use a scoring function based on the protein unfolding free energy to study protein stability. In general, one is not directly interested in a quantitative estimate of free energy differences, but in producing a relatively small set of protein variants that will eventually be studied with more sophisticated methods. For this reason, many scoring functions do not directly represent a physical quantity and are expressed in arbitrary units. However, the CPD model, together with the Monte Carlo sampling described above, can be used to estimate equilibrium thermodynamic quantities such as binding free energy differences or protonation probabilities at constant pH. Proteus is particularly suitable for this kind of calculation: because it is based on well-established physical models, it makes it possible to define sampling methods that target different equilibrium properties.
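The constant-pH idea can be illustrated with a minimal Metropolis sampler over protonation states. This is an assumption-based sketch, not the Proteus implementation. In it, protonating site i at a given pH costs kT ln(10)(pH - pKa_i), so the pH acts as a fixed proton chemical potential. A hypothetical coupling matrix `w` penalizes pairs of sites that are protonated at the same time. The pKa values and couplings are illustrative inputs; in a physics-based CPD model they would come from the energy function. For a single isolated site, the sampled protonated fraction should reproduce the Henderson-Hasselbalch result 1/(1 + 10^(pH - pKa)).

```python
import math
import random

LN10 = math.log(10.0)

def protonation_energy(x, ph, pka, w, kT):
    """Free energy of protonation pattern x (1 = protonated, 0 = deprotonated).

    Protonating site i costs kT*ln(10)*(pH - pKa_i); w[i][j] (hypothetical,
    same units as kT) is added when sites i and j are both protonated.
    """
    n = len(x)
    g = sum(kT * LN10 * (ph - pka[i]) for i in range(n) if x[i])
    g += sum(w[i][j] for i in range(n) for j in range(i + 1, n) if x[i] and x[j])
    return g

def constant_ph_mc(ph, pka, w, kT=0.596, n_steps=100_000, seed=0):
    """Metropolis MC over protonation states at fixed pH.

    kT defaults to ~0.596 kcal/mol (about 300 K). Returns, for each site,
    the fraction of steps in which it is protonated.
    """
    rng = random.Random(seed)
    n = len(pka)
    x = [0] * n                          # start fully deprotonated
    g = protonation_energy(x, ph, pka, w, kT)
    counts = [0] * n
    for _ in range(n_steps):
        i = rng.randrange(n)
        x[i] ^= 1                        # propose toggling one site
        g_new = protonation_energy(x, ph, pka, w, kT)
        if g_new <= g or rng.random() < math.exp(-(g_new - g) / kT):
            g = g_new                    # accept
        else:
            x[i] ^= 1                    # reject: undo the toggle
        for k in range(n):
            counts[k] += x[k]
    return [c / n_steps for c in counts]

# Isolated site: compare the sampled fraction with Henderson-Hasselbalch.
pka_single, ph_single = 4.0, 4.5
frac_single = constant_ph_mc(ph_single, [pka_single], [[0.0]])[0]
print(frac_single, 1.0 / (1.0 + 10 ** (ph_single - pka_single)))
```

Scanning `ph` over a range of values gives a titration curve for each site. When the sites interact (nonzero `w`), the curves deviate from the isolated-site Henderson-Hasselbalch form. This is the kind of effect a sampler built on a physical energy model can capture.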

**Table of contents:**

List of figures

List of tables

**1 Computational design of PDZ-peptide binding**

1 Energy functions for CPD

1.1 Bonded and nonbonded interactions

1.2 Implicit solvent models

1.3 Structural models for CPD

2 Exploration of the structure-sequence space

3 CPD software

4 The Proteus CPD framework

4.1 Folded and unfolded states

4.2 Proteus Energy Function

4.3 Pairwise residue GB interaction

4.4 Monte Carlo exploration

5 Constant-activity and constant-pH Monte Carlo

**2 High throughput design of protein-ligand binding**

1 PB/LIE analysis

1.1 Semi-empirical Free Energy Function

1.2 Structural models and simulations setup

1.3 PB calculations

1.4 Results

2 Bias convergence

3 Perspectives: advanced sampling

**3 Polarizable free energy simulations**

1 Introduction

2 Induced dipole model

2.1 Fields and dipoles

2.2 Electrostatic energy

2.3 Induced dipole force fields

3 Point charge models

3.1 Fluctuating charge model

3.2 Drude pseudo-particle model

3.3 Drude polarizable water models

3.4 Simulating the Drude force field

3.5 MD implementation

4 Standard and relative binding free energies

5 Free energy perturbation

5.1 Thermodynamic integration

5.2 Bennett acceptance ratio

5.3 BAR and TI with Drude

5.4 Alchemical transformation

6 Artefacts in charging free energy calculations

6.1 PME with tinfoil boundary conditions: neutralizing jellium

6.2 Solvent polarization artefacts in a periodic system

**4 PDZ-peptide binding specificity with polarizable free energy simulations**

1 Introduction

2 Methods

2.1 Structural models and simulation setup

2.2 Alchemical MD simulations

2.3 Drude dual topology

2.4 Spurious interactions with Drude and dual topology

2.5 PME correction

3 Results

4 Conclusions

**5 Classical Drude Model for methyl phosphate and phosphotyrosine**

1 Introduction

1.1 Phosphotyrosine in Tiam1 binding

1.2 Phosphates in biology

1.3 Earlier additive results, need for polarizability

1.4 Methyl phosphate as a model

2 Methods: overview

2.1 QM quantities and software

2.2 The GAAMP and DGENFF tools

2.3 Parametrization strategy

2.4 Phosphate:Mg2+ binding model

3 Methods: QM calculations

3.1 QM electrostatic potential maps

3.2 QM polarizability and dipole moment

3.3 QM solute–water and solute–ion interaction energy scans

3.4 QM dihedral scans

3.5 QM vibration frequencies

4 Methods: fitting the QM quantities

4.1 Fitting QM potential maps

4.2 Fitting the molecular polarizability

4.3 Fitting solute–water interaction energies and the molecular dipole

5 Methods: phosphate:Mg2+ binding free energies

5.1 Free energy perturbation protocol

5.2 Simulations setup

5.3 Alchemical free energy simulations

6 Results

6.1 Initial MP model and atom types

6.2 Fitting the MP− potential maps

6.3 Fitting the MP− polarizability

6.4 Fitting the MP−–water radial interaction energy scans

6.5 Electrostatic parameters optimization for MP2− and P2−

6.6 Fitting the dihedral energy scans

6.7 Conformational energies of MP− and MP2−

6.8 Fitting QM interactions with Mg and Na ions

6.9 Mg2+:phosphate binding free energies

6.10 Final MP and P2−i topology and parameters

7 Classical Drude model for dianionic phosphotyrosine

7.1 QM quantities

7.2 Electrostatic parameters optimization

7.3 Dihedral parameters and fit

7.4 Final pCRES and pTyr−2 topology and parameters

7.5 Simulating Drude phosphotyrosine in the Tiam1:pSdc1 complex

8 Conclusion

**Bibliography**