Theoretical and Computational Chemistry

A Machine Learning Based Intramolecular Potential for a Flexible Organic Molecule

Daniel Cole, Newcastle University


The accuracy of computational predictions of protein–ligand binding free energies is limited in part by the fixed functional form of the intramolecular component of molecular mechanics force fields. Here, we employ kernel regression, a machine learning technique, to construct an analytical potential, using the Gaussian Approximation Potential software and framework, that reproduces the quantum mechanical potential energy surface of a small, flexible, drug-like molecule, 3-(benzyloxy)pyridin-2-amine. Challenges arising from the high dimensionality of the molecule's configurational space are overcome by developing an iterative training protocol and employing a representation that separates short- and long-range interactions. The analytical model is interfaced with the MCPRO simulation software, which allows us to perform Monte Carlo simulations of the small molecule bound to two proteins, p38 MAP kinase and leukotriene A4 hydrolase, as well as in water. We demonstrate that the accuracy of our machine-learning-based intramolecular model is retained in the condensed phase, and that corrections to absolute protein–ligand binding free energies of up to 2 kcal/mol are obtained.
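
As a rough illustration of the kernel regression idea mentioned in the abstract, the sketch below fits a toy one-dimensional torsional energy profile with a Gaussian kernel using NumPy. It is not the Gaussian Approximation Potential software and does not use the descriptors from the manuscript: the energy function, the (cos, sin) descriptor, the length scale, the regularisation value, and all function names are assumptions chosen only for this example.

```python
import numpy as np


def gaussian_kernel(a, b, length_scale):
    """Gaussian (squared-exponential) kernel between the rows of a and b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def fit(x_train, y_train, length_scale=0.5, regularisation=1e-6):
    """Solve (K + lambda * I) w = y for the regression weights."""
    k = gaussian_kernel(x_train, x_train, length_scale)
    return np.linalg.solve(k + regularisation * np.eye(len(x_train)), y_train)


def predict(x_new, x_train, weights, length_scale=0.5):
    """Predicted energy: a weighted sum of kernels centred on training points."""
    return gaussian_kernel(x_new, x_train, length_scale) @ weights


def toy_energy(phi):
    """Hypothetical torsional profile (arbitrary units), standing in for QM data."""
    return 1.0 * (1.0 + np.cos(3.0 * phi)) + 0.5 * (1.0 - np.cos(phi))


def descriptor(phi):
    """Periodic descriptor of the dihedral angle (an assumption of this sketch)."""
    return np.column_stack([np.cos(phi), np.sin(phi)])


# Train on 25 evenly spaced dihedral angles, then test on a finer grid.
phi_train = np.linspace(-np.pi, np.pi, 25, endpoint=False)
weights = fit(descriptor(phi_train), toy_energy(phi_train))

phi_test = np.linspace(-np.pi, np.pi, 200)
predicted = predict(descriptor(phi_test), descriptor(phi_train), weights)
error = np.max(np.abs(predicted - toy_energy(phi_test)))
print(f"max abs error on test grid: {error:.2e}")
```

In this toy setting the training energies come from a closed-form function; in the work described above they would come from quantum mechanical calculations on sampled molecular configurations, and the descriptor would have to cover many more degrees of freedom.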


manuscript-cole.pdf (1.00 MB)

Supplementary material

manuscript-cole-SI.pdf (0.43 MB)