Chemical diversity is challenging to describe objectively. Despite this, various notions of chemical diversity are used throughout the medicinal chemistry optimization process in drug discovery. In this work, we show the usefulness of considering exploited vectors (EVs) during different phases of the drug design process to provide a quantitative and objective description of chemical diversity. We have developed a concise and fast approach to enumerate and analyze the exploited vector patterns (EVPs) of molecular compound series, which can then be used in archetypal compound selection tasks from hit matter identification to hit expansion and lead optimization. We firstly show that EVPs can be used to assess the progressibility of compounds in a fragment library design exercise. By considering EVPs, we then show how a set of compounds can be prioritized for hit expansion using EVP-based, customizable diversity sampling approaches, reducing the time taken and mitigating human biases. We also show that EVPs are a useful tool to analyze SAR data, offering the chance to uncover correlations between different vectors without pre-determining the molecular scaffold structures. The codes used to perform these tasks are presented as easy-to-use Jupyter notebooks, which can be easily adapted for other related tasks.
Exploiting Vector Pattern Diversity of Molecular Scaffolds for Cheminformatics Tasks in Drug Discovery
17 October 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.