SAMPL6 Challenge Results from pKa Predictions Based on a General Gaussian Process Model

Caitlin C. Bannan; David Mobley; A. Geoff Skillman

doi:10.26434/chemrxiv.6406505.v2

Theoretical and Computational Chemistry

26 September 2018, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

A variety of fields would benefit from accurate pK_a predictions, especially drug design, because a change in ionization state can strongly affect a molecule's physicochemical properties.

Participants in the recent SAMPL6 blind challenge were asked to submit predictions for microscopic and macroscopic pK_as of 24 drug-like small molecules.

We recently built a general model for predicting pK_as using Gaussian process regression trained on physical and chemical features of each ionizable group.

Our pipeline takes a molecular graph and uses the OpenEye Toolkits to calculate features describing the removal of a proton.

These features are fed into a Scikit-learn Gaussian process to predict microscopic pK_as, which are then used to analytically determine macroscopic pK_as.
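As an illustration of how microscopic pK_as can be combined analytically into macroscopic ones, the sketch below uses the textbook relations for competing sites: equilibrium constants add when several sites can lose the first proton, and reciprocal constants add when several microstates converge on one fully deprotonated state. The function names are hypothetical, and this is not necessarily the exact procedure used in the pipeline.

```python
import math


def macro_pka_first(micro_pkas):
    """Macroscopic pKa for losing the first proton when several sites compete.

    K_macro = sum(K_micro), so the macroscopic pKa is lower than
    (or equal to) the lowest microscopic pKa.
    """
    return -math.log10(sum(10.0 ** (-p) for p in micro_pkas))


def macro_pka_last(micro_pkas):
    """Macroscopic pKa for losing the last proton from several microstates.

    1 / K_macro = sum(1 / K_micro), so the macroscopic pKa is higher than
    (or equal to) the highest microscopic pKa.
    """
    return math.log10(sum(10.0 ** p for p in micro_pkas))


# Hypothetical two-site acid. The thermodynamic cycle requires
# pk_site1 + pk_site2_after_1 == pk_site2 + pk_site1_after_2 (here 4 + 9 == 5 + 8).
pka1 = macro_pka_first([4.0, 5.0])
pka2 = macro_pka_last([9.0, 8.0])
print(f"macroscopic pKa1 = {pka1:.2f}, pKa2 = {pka2:.2f}")
```

For a consistent thermodynamic cycle, the two macroscopic values sum to the same total as either microscopic path, which is a quick sanity check on any set of microscopic predictions.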

Our Gaussian process is trained on a set of 2,700 macroscopic pK_as from monoprotic and select diprotic molecules.
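A minimal sketch of this kind of regression with Scikit-learn follows. The two feature columns and the target values are synthetic stand-ins, not the OpenEye-derived descriptors or the 2,700-value training set, and the kernel choice is an assumption made only for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

# Synthetic stand-in data: 60 "ionizable groups" with 2 made-up features each.
rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(60, 2))
y_train = (
    7.0
    + 2.0 * np.sin(X_train[:, 0])
    + 0.5 * X_train[:, 1]
    + rng.normal(0.0, 0.1, size=60)
)

# Assumed kernel: scaled RBF plus a white-noise term for experimental error.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X_train, y_train)

# One query inside the training range and one far outside it.
X_query = np.array([[0.0, 0.0], [30.0, 30.0]])
mean, std = gp.predict(X_query, return_std=True)
for m, s in zip(mean, std):
    print(f"predicted pKa = {m:.2f} +/- {s:.2f}")
```

The second query lies far from every training point, so its predicted standard deviation is larger than that of the first. This is the behavior that lets a Gaussian process flag molecules outside its training domain.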

Here, we share our results for microscopic and macroscopic predictions in the SAMPL6 challenge.

Overall, we ranked in the middle of the pack compared to other participants, but our fairly good agreement with experiment is still promising considering that the challenge molecules are chemically diverse and often polyprotic while our training set is predominantly monoprotic.

When building this model, it was particularly important to us to include an uncertainty estimate, based on the chemistry of the molecule, that reflects the likely accuracy of each prediction.

Our model reports large uncertainties for molecules whose chemistry appears to lie outside our domain of applicability, and its quantile-quantile plots show good agreement, indicating that it can predict its own accuracy.
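One common way to build such a quantile-quantile comparison is sketched below, assuming Gaussian predictive errors: each error is divided by its predicted standard deviation, and the sorted results are paired with standard normal quantiles. The function name and plotting positions are hypothetical choices, and the data are synthetic, not challenge results.

```python
from statistics import NormalDist


def qq_points(predicted, predicted_std, experimental):
    """Return (theoretical, observed) quantile pairs for standardized errors.

    Well-calibrated uncertainties give points near the line y = x.
    A slope above 1 suggests the reported uncertainties are too small.
    """
    z = sorted(
        (e - p) / s for p, s, e in zip(predicted, predicted_std, experimental)
    )
    n = len(z)
    normal = NormalDist()
    theory = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theory, z))


# Synthetic example: errors constructed to match a std of 2.0 exactly.
n = 9
ideal = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
experimental = [5.0 + 2.0 * q for q in ideal]

calibrated = qq_points([5.0] * n, [2.0] * n, experimental)
overconfident = qq_points([5.0] * n, [1.0] * n, experimental)
```

In this constructed case the calibrated points fall on y = x, while halving the reported standard deviation doubles every observed quantile.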

The challenge highlighted a variety of means to improve our model, including adding more polyprotic molecules to our training set and more carefully considering what functional groups we do or do not identify as ionizable.

Supplementary materials

sampl6 SI

Now Published

SAMPL6 challenge results from pK_a predictions based on a general Gaussian process model

Caitlin C. Bannan, David L. Mobley, A. Geoffrey Skillman (journal article)

Journal of Computer-Aided Molecular Design, Volume 32, Issue 10

Online publication date: Oct 15, 2018

Version History

Sep 26, 2018 Version 2

Jun 04, 2018 Version 1

Version Notes

In this version, we fixed a bug in our analysis for Figure 5, which compares microscopic predictions by our method, Splus, and ACD GALAS. We also updated our conclusions related to those results.

Metrics

Views: 6,812

Downloads: 1,060

License

The content is available under CC BY 4.0

DOI

10.26434/chemrxiv.6406505.v2

Funding

DLM and CCB appreciate the financial support from the National Science Foundation (CHE 1352608) and the National Institutes of Health (1R01GM108889-01). CCB was supported financially by OpenEye Scientific Software to build this model during Summer 2017 and is now supported by a fellowship from The Molecular Sciences Software Institute under NSF grant ACI-1547580.

Author’s competing interest statement

no conflicts of interest
