Abstract
DNA-Encoded Library (DEL) technology allows millions, or even billions, of encoded compounds to be screened in a pooled fashion, which is faster and cheaper than traditional approaches. The resulting large datasets of DEL binders and non-binders to a target of interest enable Machine Learning (ML) model development and ultra-high-throughput screening of large, readily accessible, drug-like libraries. Here, we report a comparative assessment of the DEL+ML pipeline for hit discovery using three DELs and five ML models (fifteen DEL+ML combinations using two different feature representations). Each ML model was used to screen a diverse set of drug-like compound collections to identify orthosteric binders of two therapeutic targets, Casein kinase 1α/δ (CK1α/δ). Overall, 10% of the predicted binders and 94% of the predicted non-binders were confirmed in biophysical assays, including two nanomolar binders (187 and 69.6 nM affinity for CK1α and CK1δ, respectively). Our study provides insights into the DEL+ML paradigm for hit discovery: the importance of an ensemble ML approach in identifying a diverse set of confirmed binders, the usefulness of large training data and chemical diversity in the DEL, and the significance of model generalizability over accuracy. We share our results via an open-source repository to support further use and the development of similar efforts.
Supplementary materials
Title
Supplemental_file
Description
Supplementary figures and supplementary table legends
Title
Supplemental_tables
Description
Five supplementary tables as separate sheets in one Excel file
Supplementary weblinks
Title
DEL_ML_codabase
Description
This repository contains the pretrained models and the prediction scripts described in the paper.