Section Contents

  1. Learnt potentials
  2. Learnt dynamics and simulations

Learnt potentials

Figure 7: Samples of the learnt force-field in comparison to true statistics and the force-field learnt in Greener & Jones (2021) Greener, J. G. & Jones, D. T. (2021),
‘Differentiable molecular simulation can learn all the parameters in a coarse-grained force field for proteins’,
bioRxiv
doi:10.1101/2021.02.05.429941
. (A) Distance potential of the N to Cα covalent bond in a single alanine residue. (B) Angle potential of the N-Cα-C bond angle in a single alanine residue. (C+D) φ and ψ torsion angle potentials in valine. (E) Proline Cα to sidechain centroid distance potentials. (F) Isoleucine sidechain to leucine sidechain distance potential.

After training on a dataset of 2,004 small proteins, many of the potentials resemble those learnt in Greener & Jones (2021) and agree with established chemical knowledge, using only 308,671 refined parameters in the MLPs (down from 4 million in Greener & Jones (2021)). In both force-fields, the energy values, along with the temperature and time step, cannot be given standard units due to the coarse-grained representations of the proteins (Greener & Jones, 2021). Our model converged after just one epoch and took 12 hours of training, a substantial improvement over the 45 epochs and 2 months of training on the same hardware reported in Greener & Jones (2021). Slower learning rates, leading to multiple epochs of training before convergence, were considered, but access to the relevant GPUs was limited.

Sample potentials from our learnt dynamics model, dθ, and those from Greener & Jones (2021) are provided in Figure 7. Potentials are scaled as appropriate and should not be taken as actual energy values: because of the differences between the two methods, only the locations of the minima and the profiles of the curves matter. Generally, the locations of the minima in the covalent distance potentials (Figure 7A & E) and the bond angles (Figure 7B) were well learnt. The slopes of the potentials appear broader and less well defined than in Greener & Jones (2021).
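Because only the positions of the minima and the curve profiles are comparable, one way to put two tabulated potentials on a common footing is to min-max rescale each curve and then locate its local minima. The following is a minimal sketch of that idea, not the procedure used to produce Figure 7; the function names `rescale` and `local_minima`, the grid, and the two toy curves are all hypothetical.

```python
def rescale(values):
    """Min-max rescale a tabulated potential to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def local_minima(grid, values):
    """Return grid points whose value is strictly below both neighbours."""
    return [
        grid[i]
        for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    ]


# Hypothetical distance grid (arbitrary units) and two toy potentials that
# share a minimum at 1.5 but differ in scale, offset and shape.
grid = [i / 10 for i in range(10, 21)]
potential_a = [(x - 1.5) ** 2 for x in grid]
potential_b = [7.0 * abs(x - 1.5) + 2.0 for x in grid]

minima_a = local_minima(grid, rescale(potential_a))
minima_b = local_minima(grid, rescale(potential_b))
print(minima_a, minima_b)
```

Rescaling leaves the locations of the minima unchanged, so the comparison does not depend on the arbitrary energy units of either force-field.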

Figure 7A is representative of all covalent bond potentials learnt in the backbone (minima in the correct place but with a broad slope), and Figure 7B is similarly representative of the bond angles. The minima for roughly half of the 20 possible Cα-sidechain interactions were correctly learnt (see Appendix A), but all had broad slopes and some had multiple minima (not where we would expect alternative rotamer conformations to be). The dihedral angle potentials in Figure 7C and D are considerably smoother than in Greener & Jones (2021) but exhibit less well defined slopes. The minima are also correctly positioned in some, but not all, regions of Ramachandran space.


Learnt dynamics and simulations

Figure 8: Identical simulations of the same small protein (1CRN) using our learnt model. Left: cartoon, right: stick.


Sample trajectories generated with the learnt model are shown in both cartoon and stick representations (Figure 8). The animations show that the model has learnt native backbone flexibility, but the tertiary structure eventually begins to unfold. Qualitatively, certain regions of the backbone have very loose dihedral angle constraints. Folding of small proteins was attempted in the same way as in Greener & Jones (2021), but with little success.

Figure 9: RMSF of different residues in two small proteins.

As a final means of evaluating our learnt dynamics, we compare the flexibility seen in our simulations with that of the learnt force-field from Greener & Jones (2021) and with NMR (Figure 9). To do this, we compute the root mean squared fluctuation (RMSF) of every Cα atom in two small proteins, Trp-cage (2JOF A) and a novel peptide assuming a Beta-Beta-Alpha (BBA) fold (1FME A). In general, our model preserves native stability (compared to NMR) less well than Greener & Jones (2021). As expected, the terminal residues of the proteins tend to be more flexible than the rest of the sequence, although many more residues take part in this flexible motion than is seen in NMR. However, our model has generally captured the flexibility of terminal residues better than Greener & Jones (2021). For example, our method predicts that the C-terminal residue in BBA has an RMSF of 2.7 Å, whilst the value from the NMR ensemble is 3.4 Å. Greener & Jones (2021), on the other hand, models the whole of the terminal alpha-helix as having an RMSF of less than 0.8 Å.
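For reference, the RMSF of an atom is the square root of its mean squared deviation from its own time-averaged position. The sketch below illustrates this definition in plain Python. It assumes the frames have already been superimposed on a common reference, and the function name `rmsf` and the toy trajectory are hypothetical rather than taken from the analysis behind Figure 9.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF from a list of pre-aligned frames.

    trajectory: list of frames, each a list of (x, y, z) tuples with one
    entry per C-alpha atom. Alignment is assumed to have been done already.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [
            sum(frame[i][k] for frame in trajectory) / n_frames
            for k in range(3)
        ]
        # Mean squared deviation of atom i from its average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy two-frame trajectory: atom 0 is fixed, atom 1 moves along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
values = rmsf(toy_trajectory)
print(values)
```

With real simulations the same calculation would be applied to the Cα coordinates of each saved frame, so that a flexible terminal residue yields a larger value than a residue held in a stable helix.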