CarbonAI: A Non-Docking, Deep Learning-Based Small-Molecule Virtual Screening Platform

28 March 2023, Version 2
This content is a preprint and has not undergone peer review at the time of posting.


Structure-based virtual screening is a promising in silico technique that integrates computational methods into drug discovery. The most widely used structure-based virtual screening method is molecular docking. However, docking is not simultaneously computationally efficient and accurate, owing to its classical mechanics-based scoring functions, which can only approximate, but not reach, quantum mechanical precision. To reduce the computational cost of protein-ligand scoring and to use data-driven approaches to improve scoring accuracy, deep learning non-docking methods can be used that take either the 3D structure or the 1D sequence of the protein target as input. These methods can minimize the error inherited from molecular docking and avoid its extensive computational cost. We integrate both methods into an easy-to-use framework, CarbonAI, that offers researchers either option: the 3D version of CarbonAI uses a graph neural network (GNN), and the sequence version uses a bidirectional long short-term memory network (BiLSTM). To validate these approaches, we performed experiments on two datasets: the open Directory of Useful Decoys, Enhanced (DUD-E) and an in-house proprietary dataset without computer-generated artificial decoys (NoDecoy). CarbonAI achieved a state-of-the-art AUC of 0.981 on DUD-E and an AUC of 0.974 on NoDecoy, whereas a conventional docking program achieved an AUC below 0.8 on each dataset. The CarbonAI engine also achieves a state-of-the-art enrichment factor of 36.2 at the top 2% of the ranked list. We have also retrospectively validated the CarbonAI models against various wet-lab experimental data, and the results demonstrated consistently accurate performance.
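The two headline metrics above, ROC AUC and the enrichment factor at the top 2% of the ranked list (EF2%), can be illustrated with a minimal sketch. It assumes each molecule has one predicted score (higher means more likely active) and a binary label (1 = active, 0 = decoy). The function names are hypothetical, and this is not CarbonAI's evaluation code; the abstract does not state its exact conventions, such as how the top-2% cutoff is rounded or how tied scores are ordered.

```python
import math


def enrichment_factor(scores, labels, fraction=0.02):
    """Hypothetical helper: EF at the top `fraction` of the ranked list.

    EF = (actives in top slice / size of top slice) / (all actives / all molecules)
    """
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("need at least one active")
    # Assumed convention: round the slice size up, with at least one molecule.
    n_top = max(1, math.ceil(fraction * n_total))
    # Rank by predicted score, highest first (ties keep their input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


def roc_auc(scores, labels):
    """Hypothetical helper: ROC AUC as the probability that a random active
    outscores a random decoy (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both actives and decoys")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 100 molecules, the 2 actives receive the highest scores.
scores = [0.9, 0.8] + [0.1] * 98
labels = [1, 1] + [0] * 98
print(enrichment_factor(scores, labels), roc_auc(scores, labels))
```

Under this definition EF cannot exceed the total number of molecules divided by the size of the top slice, i.e. about 50 at a 2% cutoff, which gives a sense of scale for an EF2% of 36.2.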
Furthermore, the inference speed of the engine was benchmarked on the openly available 2021 Enamine REAL Database (RDB), which comprises over 1.36 billion molecules; our CarbonAI non-docking method (CarbonAI-ND) screened it in 4050 core-hours. This corresponds to an inference speed of about 336,000 molecules per core-hour, compared with about 20 molecules per core-hour for typical docking methods, making CarbonAI-ND more than 16,000 times faster than conventional docking. Overall, the experiments indicate that CarbonAI is accurate and computationally efficient and generalizes well to different molecular targets for virtual screening.
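The throughput comparison can be checked with simple arithmetic from the figures quoted here: 1.36 billion molecules screened in 4050 core-hours, against about 20 molecules per core-hour for a typical docking program. The short sketch below only reproduces that calculation; the constant and function names are illustrative, and because the library holds "over" 1.36 billion molecules, the true rate is slightly higher than computed.

```python
# Figures quoted in the abstract (names are illustrative).
N_MOLECULES = 1.36e9   # 2021 Enamine REAL Database, "over 1.36 billion" molecules
CORE_HOURS = 4050      # CarbonAI-ND inference time for the whole library
DOCKING_RATE = 20      # molecules per core-hour for a typical docking program


def throughput(n_molecules, core_hours):
    """Molecules scored per core-hour."""
    return n_molecules / core_hours


nd_rate = throughput(N_MOLECULES, CORE_HOURS)   # CarbonAI-ND rate
speedup = nd_rate / DOCKING_RATE                # ratio of the two rates
docking_core_hours = N_MOLECULES / DOCKING_RATE # cost of docking the same library

print(f"CarbonAI-ND: {nd_rate:,.0f} molecules per core-hour")
print(f"Speed-up over docking: {speedup:,.0f}x")
print(f"Docking the same library: {docking_core_hours:,.0f} core-hours")
```

The rate works out to roughly 336,000 molecules per core-hour and the speed-up to roughly 16,800, while docking the full library at 20 molecules per core-hour would take about 68 million core-hours.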


deep learning
virtual screening
graph neural networks
structure-based modeling
