Modeling the expansion of virtual screening libraries

04 August 2022, Version 2


Recently, the growth in commercially available molecules has been driven by “tangible”, make-on-demand virtual libraries. Such billion-molecule libraries can never be fully synthesized, tested, or even stored. The only way to explore this expanded chemical space is to computationally prioritize particular molecules for synthesis and testing, often by docking. The success of this prioritization may depend on library properties: how diverse are the molecules, how similar are they to bio-like molecules such as metabolites and drugs, how does receptor fit improve with library size, and how does the presence of artifacts grow with library size? To begin to investigate these questions, we compare the characteristics and performance of a library of 3 million “in-stock” molecules with those of ever-larger tangible libraries, up to 3 billion molecules in size. The bias toward biologically precedented molecules decreases 19,000-fold in the 886-fold larger tangible library compared to the in-stock library. Turning from the overall libraries to docking hits, thousands of high-ranking tangible compounds that were synthesized and tested in five ultra-large library docking campaigns are also dissimilar to bio-like molecules. These observations imply that bio-likeness plays little role in the likelihood of binding, in apparent contradiction to multiple studies. Another important aspect of library growth is whether screening ever-larger libraries leads to better ligands. Judged by docking score, better-fitting molecules are found as the library grows, with score improving log-linearly with library size. Finally, it is possible to imagine that as library size increases, so too do the chances of rare events: molecules that cheat the scoring function and rank artifactually well. Both simulation and experimental results from ultra-large library screens suggest that this is true: as the libraries grow, more and more artifacts can crowd the very top-ranking molecules. Although the nature of these artifacts appears to change from target to target, the expectation of their occurrence does not, and simple strategies may be devised to minimize the impact of these rare-event artifacts on the success of large library screens.
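The rare-event argument can be illustrated with a toy calculation. The sketch below is not the simulation used in this work; it is a minimal model resting on assumptions that are not in the abstract: scores are expressed as a "goodness" value where higher is better, genuine molecules follow a standard normal distribution, and a small fraction of molecules (`artifact_rate`, hypothetical) receive a fixed spurious bonus (`boost`, hypothetical). Under those assumptions it computes the expected share of artifacts among the top `k` molecules as the library grows.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_parts(t, artifact_rate, boost):
    """Library fractions above cutoff t, split into (genuine, artifact).

    Assumption: artifacts are ordinary molecules whose goodness is
    shifted upward by `boost` standard deviations.
    """
    genuine = (1.0 - artifact_rate) * upper_tail(t)
    artifact = artifact_rate * upper_tail(t - boost)
    return genuine, artifact


def top_k_cutoff(library_size, k, artifact_rate, boost):
    """Bisect for the cutoff above which k molecules are expected."""
    target = k / library_size
    lo, hi = -10.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(tail_parts(mid, artifact_rate, boost)) > target:
            lo = mid  # too many molecules pass: raise the cutoff
        else:
            hi = mid  # too few molecules pass: lower the cutoff
    return 0.5 * (lo + hi)


def artifact_fraction_in_top_k(library_size, k=1000,
                               artifact_rate=1e-4, boost=3.0):
    """Expected share of artifacts among the k best-ranked molecules."""
    cutoff = top_k_cutoff(library_size, k, artifact_rate, boost)
    genuine, artifact = tail_parts(cutoff, artifact_rate, boost)
    return artifact / (genuine + artifact)


if __name__ == "__main__":
    # Library sizes spanning the in-stock to tangible range.
    for size in (3e6, 3e7, 3e8, 3e9):
        share = artifact_fraction_in_top_k(size)
        print(f"{size:.0e} molecules: artifact share of top 1000 = {share:.3f}")
```

In this toy model the cutoff needed to reach the top `k` rises with library size, and the boosted population is increasingly over-represented above a higher cutoff, so the artifact share of the top ranks grows even though the artifact rate in the library is constant. The specific numbers depend entirely on the assumed distribution and parameters.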


