Epik: pKa and Protonation State Prediction through Machine Learning

Ryne C. Johnston; Kun Yao; Zachary Kaplan; Monica Chelliah; Karl Leswing; Sean Seekins; Shawn Watts; David Calkins; Jackson Chief Elk; Steven V. Jerome; Matthew P. Repasky; John C. Shelley

doi:10.26434/chemrxiv-2023-c6z8t

Theoretical and Computational Chemistry

11 January 2023, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Epik version 7 is a software program that uses machine learning to predict the pKa values and protonation state distributions of complex, drug-like molecules. The model is an ensemble of atomic graph convolutional neural networks (GCNNs) trained on over 42,000 pKa values of both experimental and computed origin, spanning broad chemical space. Across seven test sets, it predicts pKa values with median absolute and RMS errors of 0.42 and 0.72 log units, respectively. Epik version 7 also generates protonation states and recovers 95% of the most populated protonation states compared to previous versions. Requiring on average only 47 ms per ligand, Epik version 7 is fast and accurate enough to evaluate protonation states for crucial molecules and to prepare ultra-large compound libraries for exploring vast regions of chemical space. Because training is simple and quick, highly accurate models can be customized to a program's specific chemistry.
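The link between a pKa value and a protonation state population can be illustrated with the textbook Henderson-Hasselbalch relation. The following is a minimal sketch that assumes a single, independent titratable site and uses illustrative pKa values. The helper names are hypothetical, and this is not Epik's method. Molecules with several interacting sites need a coupled treatment, and the Supporting Information describes a Macro-pKa approach for this. The sketch does not attempt that treatment.

```python
from typing import Dict


def deprotonated_fraction(pka: float, ph: float) -> float:
    """Fraction of one independent titratable site in its deprotonated form.

    Textbook Henderson-Hasselbalch relation (an assumption of this sketch,
    not Epik's internal model): [A-]/[HA] = 10**(pH - pKa), so the
    deprotonated fraction is 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def site_populations(pka: float, ph: float = 7.4) -> Dict[str, float]:
    """Hypothetical helper: protonated/deprotonated populations of one site."""
    f_deprot = deprotonated_fraction(pka, ph)
    return {"protonated": 1.0 - f_deprot, "deprotonated": f_deprot}


if __name__ == "__main__":
    # Illustrative pKa values only, not Epik predictions. For an amine the
    # pKa refers to its conjugate acid, so "protonated" is the ammonium form.
    for label, pka in [("carboxylic acid-like site", 4.2), ("amine-like site", 10.6)]:
        pops = site_populations(pka, ph=7.4)
        print(f"{label}: pKa={pka}, protonated={pops['protonated']:.4f}, "
              f"deprotonated={pops['deprotonated']:.4f}")
```

At pH equal to the pKa, the two forms are equally populated. Each pH unit of separation shifts the ratio tenfold, which is why a pKa error of about one log unit near the working pH can change which protonation state dominates.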

Keywords

Graph Convolutional Neural Network

Supplementary materials

Supporting Information: Composition of the training and validation sets; details on the methods, including the Macro-pKa approach, the effects of varying model layer depth, and the use of a “master” atom; an example speciation report; details on the approach used to obtain Epik Classic and Epik v 7 values for the test sets; additional results on the test sets; additional details on the similarity between the training set and the test sets.

Epik v 7 Results on Test Sets: Archive of the Epik v 7 results in CSV form, separated by test set.
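The abstract reports accuracy as a median absolute error and an RMS error in log units. The following is a minimal sketch of how these two metrics can be computed for one test set. It assumes paired lists of predicted and reference pKa values. The column layout of the released CSV files is not described on this page, so no file parsing is attempted. The function names and numbers are hypothetical.

```python
import math
import statistics
from typing import Sequence


def _check(predicted: Sequence[float], reference: Sequence[float]) -> None:
    # Inputs must be non-empty and paired element by element.
    if not predicted or len(predicted) != len(reference):
        raise ValueError("expected two non-empty sequences of equal length")


def median_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Median of |predicted - reference|, in pKa (log) units."""
    _check(predicted, reference)
    return statistics.median([abs(p - r) for p, r in zip(predicted, reference)])


def rms_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square of (predicted - reference), in pKa (log) units."""
    _check(predicted, reference)
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the paper.
    predicted = [4.5, 9.8, 7.0, 3.1]
    reference = [4.2, 10.6, 6.9, 3.0]
    print(f"median absolute error: {median_absolute_error(predicted, reference):.2f}")
    print(f"RMS error: {rms_error(predicted, reference):.2f}")
```

RMS error squares each residual, so it weights large errors more heavily than the median does and is typically the larger of the two. Even for normally distributed errors, the median absolute error is only about 0.67 times the RMS error.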

Version History

Mar 27, 2023 Version 3

Mar 21, 2023 Version 2

Jan 11, 2023 Version 1

Metrics

Views: 3,997

Downloads: 3,149

License

The content is available under CC BY-NC-ND 4.0.

Author’s competing interest statement

The authors declare the following competing financial interest(s): all authors are Schrödinger employees and hold financial interests in the company. Epik Classic & Epik v 7 are products sold by Schrödinger, LLC.

Ethics

The author(s) have declared that ethics committee/IRB approval is not relevant to this content.
