Abstract
The emergence of ultra-large screening libraries containing billions of readily available compounds poses a growing challenge for docking-based virtual screening. Machine learning (ML)-boosted strategies, such as the tool HASTEN, combine rapid ML prediction with brute-force docking of small fractions of such libraries to increase screening throughput and take on giga-scale libraries. In our case study of an anti-bacterial chaperone and an anti-viral kinase, we first generated a brute-force docking baseline for 1.56 billion compounds in the Enamine REAL lead-like library with the fast Glide HTVS protocol. With HASTEN, we observed robust recall of 90% of the true 1000 top-scoring virtual hits for both targets when docking only 1% of the entire library. This 99% reduction in the required docking experiments significantly shortens the screening time. In the kinase target, the use of a hydrogen bonding constraint resulted in a major proportion of unsuccessful docking attempts and hampered ML predictions. We demonstrate the optimization potential in the treatment of failed compounds when performing ML-boosted screening, and we benchmark and showcase HASTEN as a fast and robust tool in a growing arsenal of approaches to unlock the chemical space covered by giga-scale screening libraries for everyday drug discovery campaigns.
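The recall figure quoted in the abstract can be illustrated with a minimal sketch. It assumes that recall means the fraction of the k best-scoring compounds from the brute-force baseline that also appear in the subset docked during the ML-boosted run. The function name `top_k_recall`, the compound identifiers, and the toy scores are hypothetical and are not taken from HASTEN or from the published dataset.

```python
def top_k_recall(baseline_scores, docked_ids, k=1000):
    """Fraction of the k best baseline compounds that were also docked.

    baseline_scores: dict mapping compound ID -> docking score from the
                     brute-force baseline (more negative = better).
    docked_ids:      iterable of compound IDs docked in the ML-boosted run.
    """
    if not baseline_scores or k <= 0:
        raise ValueError("need at least one baseline score and k > 0")
    # Sort ascending because lower (more negative) docking scores are better.
    top_k = sorted(baseline_scores, key=baseline_scores.get)[:k]
    docked = set(docked_ids)
    recovered = sum(1 for cid in top_k if cid in docked)
    return recovered / len(top_k)


# Toy data (hypothetical): five compounds, three of which were docked.
baseline = {"c1": -9.1, "c2": -8.7, "c3": -8.5, "c4": -6.0, "c5": -5.2}
docked_subset = ["c1", "c3", "c5"]

recall = top_k_recall(baseline, docked_subset, k=3)
print(f"top-3 recall: {recall:.2f}")
```

In the toy data the three best baseline compounds are c1, c2 and c3, of which c1 and c3 were docked, giving a recall of 2/3. At the scale described in the abstract, docking 1% of 1.56 billion compounds corresponds to roughly 15.6 million docking calculations, with k = 1000.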
Supplementary materials
Title
Supporting Information
Description
Supporting Figures S1-S10.
Supporting Tables S1-S8.
Summary of utilized Chemprop parameters.
Extended methodology: GAK receptor selection and docking method validation.
Title
GAK lead-like actives used for method validation as obtained from ChEMBL
Description
This spreadsheet contains identifiers, SMILES and activity data of GAK actives (defined as IC50, Kd or Ki of at least 1 µM) as obtained from ChEMBL (15/12/2022) alongside the corresponding ChEMBL assay ID and the original source DOI. These compounds were used for docking method validation in the manuscript.
Supplementary weblinks
Title
Schrodinger Phase databases for Enamine REAL lead-like library of 1.56 billion compounds (March 2021)
Description
A collection of Phase databases created from the Enamine REAL lead-like library as downloaded in March 2021 (1.56 billion compounds).
Title
Glide HTVS docking results of Enamine REAL lead-like library (1.56 billion compounds) for two targets
Description
Docking results for 1.56 billion compounds of the Enamine REAL lead-like library (obtained March 2021) for the targets SurA and GAK. The intended use of the data is to serve as a giga-scale benchmarking dataset, e.g. for machine learning approaches.