Epigenetic drug and probe discovery is growing in importance. It is paramount not only for identifying and developing therapies linked to epigenetic processes but also for understanding the epigenetic mechanisms that underlie biological processes. To this end, chemical vendors have been developing synthetic compound libraries focused on epigenetic targets to increase the probability of identifying promising starting points for drug or probe candidates. However, the chemical contents of these data sets, the distribution of their physicochemical properties, and their diversity remain unknown. To fill this gap and make this information available to the scientific community, we report a comprehensive analysis of eleven epigenetic-focused libraries containing more than 50,000 compounds. We characterized these sets with well-validated chemoinformatics approaches together with novel methods, such as automated detection of analog series and visual representations of chemical space based on Constellation Plots and Extended Chemical Space Networks. This work will guide experimental groups working on high-throughput and medium-throughput screening of epigenetic-focused libraries. Its outcome can also serve as a reference for designing and describing novel epigenetic-focused libraries.
Chemoinformatic Characterization of Synthetic Screening Libraries Focused on Epigenetic Targets
Figure S1. Profile of six drug-like properties of pharmaceutical interest.
Figure S2. Most frequent Bemis-Murcko scaffolds in all eleven epigenetic-focused compound libraries.
Table S1. Measures of scaffold diversity based on Bemis-Murcko scaffolds: area under the curve (AUC) of the cyclic system recovery curve.
Figure S3. Fingerprint-based diversity of the eleven data sets with RDKit and MACCS keys (166-bit) fingerprints with five metrics.
Figure S4. Constellation Plots for each of the eleven data sets.
Table S2. Twenty most representative compounds per compound library as calculated with RDKit fingerprints.
Figure S5. Calculated synthetic accessibility profiling of the eleven epigenetic-focused compound libraries.
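
The scaffold diversity measure named in Table S1 can be illustrated with a minimal sketch. It assumes the usual construction of the cyclic system recovery (CSR) curve: distinct scaffolds are ranked from most to least frequent, and the cumulative fraction of compounds recovered is plotted against the cumulative fraction of scaffolds. The function names `csr_curve` and `csr_auc` are hypothetical, and the scaffolds are treated as opaque string labels (in practice they could be Bemis-Murcko scaffold SMILES produced by a toolkit such as RDKit). This is not necessarily the exact procedure used in this work.

```python
from collections import Counter


def csr_curve(scaffolds):
    """Return (xs, ys) for a cyclic system recovery curve.

    scaffolds: one scaffold label per compound (assumed input format).
    xs: cumulative fraction of distinct scaffolds, most frequent first.
    ys: cumulative fraction of compounds carrying those scaffolds.
    """
    if not scaffolds:
        raise ValueError("at least one scaffold label is required")
    # Scaffold frequencies, sorted from most to least populated.
    counts = sorted(Counter(scaffolds).values(), reverse=True)
    n_scaffolds = len(counts)
    n_compounds = sum(counts)
    xs, ys = [0.0], [0.0]
    recovered = 0
    for rank, count in enumerate(counts, start=1):
        recovered += count
        xs.append(rank / n_scaffolds)
        ys.append(recovered / n_compounds)
    return xs, ys


def csr_auc(scaffolds):
    """Area under the CSR curve, computed with the trapezoidal rule."""
    xs, ys = csr_curve(scaffolds)
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


if __name__ == "__main__":
    # Toy library of 10 compounds spread over 4 scaffolds (illustrative labels).
    toy_library = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
    print(f"AUC = {csr_auc(toy_library):.3f}")
```

Under this construction, a library in which every compound has its own scaffold gives an AUC of 0.5, and the AUC rises toward 1.0 as a few scaffolds account for most of the compounds, so lower values indicate higher scaffold diversity.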