Abstract
Visualization of the chemical space of combinatorial libraries provides a comprehensive overview of the available compound classes, their diversity, and their physicochemical property distribution, all of which are key factors in drug discovery. Typically, this visualization requires time- and resource-consuming compound enumeration, standardization, descriptor calculation, and dimensionality reduction. In this study, we present the Combinatorial Library Neural Network (CoLiNN), designed to predict the projection of compounds on a 2D chemical space map using only their building blocks and reaction information, thus eliminating the need for compound enumeration. Trained on 2.5K virtual DNA-Encoded Libraries (DELs), CoLiNN demonstrated high predictive performance, accurately placing compounds on Generative Topographic Maps (GTMs). GTMs predicted by CoLiNN were found to be very similar to the maps built for enumerated structures. In the library comparison task, we compared the GTMs of DELs with those of the ChEMBL database. The similarity-based DEL/ChEMBL rankings obtained with “true” and CoLiNN-predicted GTMs were consistent. Therefore, CoLiNN has the potential to become the go-to tool for combinatorial compound library design: by skipping compound enumeration, it allows the library design space to be explored more efficiently.