CarbonAI, A Non-Docking Deep learning based small molecule virtual screening platform

Junfeng Wu; Kevin Jin; Yang Jiao; Xiaojie Wang; Siwei Li; Lurong Pan

doi:10.26434/chemrxiv-2022-gk3n6-v2

Theoretical and Computational Chemistry

28 March 2023, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Structure-based virtual screening is a promising in silico technique that integrates computational methods into drug discovery. The most extensively used method in structure-based virtual screening is molecular docking. However, docking cannot be both computationally efficient and accurate, because its scoring functions are based on classical mechanics and can only approximate, not reach, quantum-mechanical precision. To reduce the computational cost of protein-ligand scoring and to improve scoring accuracy with data-driven approaches, deep learning non-docking methods can be used that draw on either the 3D structure or the 1D sequence of the protein target. This approach minimizes the error inherited from molecular docking and avoids its extensive computational cost. We integrate both variants into an easy-to-use framework, CarbonAI, so that researchers can choose between them. The 3D version of CarbonAI employs a graph neural network (GNN), and the sequence version employs a BiLSTM. To verify our approaches, we performed experiments on two datasets: the open Directory of Useful Decoys: Enhanced (DUD-E) dataset and an in-house proprietary dataset without computer-generated artificial decoys (NoDecoy). We achieved a state-of-the-art AUC of 0.981 on DUD-E and an AUC of 0.974 on NoDecoy, whereas a conventional docking program reaches an AUC below 0.8 on each. The CarbonAI engine also reaches a state-of-the-art enrichment factor of 36.2 at the top 2 percent. We have also retrospectively validated the CarbonAI models against various wet-lab experimental data, and the results demonstrated consistently accurate performance.
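
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how ROC AUC and the enrichment factor at a top fraction are commonly computed from a ranked screening list. It is not the CarbonAI evaluation code; the function names (`roc_auc`, `enrichment_factor`) and the toy data are assumptions made for illustration only.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random active outscores a random
    inactive compound (ties count as one half)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(
        1.0 if a > d else 0.5 if a == d else 0.0
        for a in actives
        for d in inactives
    )
    return wins / (len(actives) * len(inactives))


def enrichment_factor(
    scores: Sequence[float], labels: Sequence[int], fraction: float = 0.02
) -> float:
    """Hit rate in the top `fraction` of the ranking divided by the hit rate
    of the whole library."""
    n_total = len(scores)
    n_top = max(1, round(n_total * fraction))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits_in_top = sum(y for _, y in ranked[:n_top])
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("need at least one active compound")
    return (hits_in_top / n_top) / (total_actives / n_total)


# Hypothetical toy library: 100 compounds with strictly decreasing scores,
# and actives placed at ranks 0, 1, 10 and 50.
toy_scores = [1.0 - i / 100 for i in range(100)]
toy_labels = [1 if i in (0, 1, 10, 50) else 0 for i in range(100)]

toy_auc = roc_auc(toy_scores, toy_labels)
toy_ef2 = enrichment_factor(toy_scores, toy_labels, fraction=0.02)
print(f"AUC = {toy_auc:.3f}, EF(2%) = {toy_ef2:.1f}")
```

Under this definition the enrichment factor at 2 percent cannot exceed 50, since a top slice made up entirely of actives has a hit rate of 1 and the library must contain at least that many actives.
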
Furthermore, the inference speed of the engine was benchmarked on the openly available 2021 Enamine REAL Database (RDB), which comprises over 1.36 billion molecules; our CarbonAI non-docking method (CarbonAI-ND) screened it in 4050 core-hours. The inference speed of CarbonAI-ND is about 36000 molecules per core-hour, compared with about 20 for typical docking methods, which makes it about 16000 times faster than conventional docking. Overall, the experiments indicate that CarbonAI is accurate and computationally efficient, and that it generalizes well to different molecular targets for virtual screening.
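
The throughput figures in the abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted there (1.36 billion molecules, 4050 core-hours, 20 molecules per core-hour for docking); the variable names are illustrative. Dividing the library size by the core-hours implies roughly 3.4 × 10^5 molecules per core-hour, which agrees with the quoted speedup of about 16000 times, whereas the quoted 36000 molecules per core-hour would correspond to a speedup of 1800 times. This may indicate a typo in one of the reported figures; the sketch does not settle which.

```python
# Figures quoted in the abstract.
molecules_screened = 1.36e9      # Enamine REAL Database (2021), "over 1.36 billion"
core_hours = 4050                # total compute reported for CarbonAI-ND
docking_rate = 20                # molecules per core-hour, typical docking
quoted_rate = 36000              # molecules per core-hour, as quoted

# Throughput implied by library size and total compute.
implied_rate = molecules_screened / core_hours

# Speedup over docking under each throughput figure.
implied_speedup = implied_rate / docking_rate
quoted_speedup = quoted_rate / docking_rate

print(f"implied throughput: {implied_rate:,.0f} molecules per core-hour")
print(f"speedup from implied throughput: {implied_speedup:,.0f}x")
print(f"speedup from quoted throughput:  {quoted_speedup:,.0f}x")
```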

Keywords

deep learning

virtual screening

graph neural networks

non-docking

structure-based modeling

Version History

Mar 28, 2023 Version 2

Dec 13, 2022 Version 1

Version Notes

Fix typos to make this working paper more readable.

Metrics

Views: 1,461

Downloads: 641

License

The content is available under CC BY NC ND 4.0

DOI

10.26434/chemrxiv-2022-gk3n6-v2

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
