Sampling and Mapping Chemical Space with Extended Similarity Indices

26 July 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Visualization of chemical space is useful in many areas of chemistry, including compound library design, diversity analysis, and the exploration of structure-property relationships. Drug discovery and natural product research are notable areas where it has strong applications. However, the sheer size of even comparatively small subsections of chemical space means that approximations are needed to navigate it. ChemMaps is a visualization methodology that approximates the distribution of compounds in large datasets by selecting satellite compounds that yield a mapping similar to that of the whole dataset when principal component analysis is performed on the similarity matrix. Here, we show how the recently proposed extended similarity indices can help identify the regions from which satellites should be sampled and reduce the amount of high-dimensional data needed to describe a library’s chemical space.
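
The satellite idea described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not use extended similarity indices. It assumes binary fingerprints, Tanimoto similarity, and satellites picked uniformly at random, all chosen here only for illustration. The fingerprints are synthetic random bits, and the helper names `tanimoto_matrix` and `pca_coordinates` are hypothetical.

```python
import numpy as np

def tanimoto_matrix(a, b):
    """Pairwise Tanimoto similarity between rows of binary arrays a (n, d) and b (m, d)."""
    a = a.astype(float)
    b = b.astype(float)
    common = a @ b.T  # bits set in both fingerprints
    total = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :] - common  # bits set in either
    # Two all-zero fingerprints have an empty union; define their similarity as 0.
    return np.divide(common, total, out=np.zeros_like(common), where=total > 0)

def pca_coordinates(features, n_components=2):
    """Project rows of `features` onto their leading principal components via SVD."""
    centered = features - features.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Assumption: synthetic random bits stand in for real molecular fingerprints.
rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(200, 64))

# Assumption: satellites are drawn at random; the work summarized above instead
# proposes using extended similarity indices to decide where to sample them.
satellite_idx = rng.choice(len(fingerprints), size=20, replace=False)

# Each compound is described by its similarity to the satellites only,
# giving a 200 x 20 matrix instead of the full 200 x 200 similarity matrix.
sim_to_satellites = tanimoto_matrix(fingerprints, fingerprints[satellite_idx])
coords = pca_coordinates(sim_to_satellites)  # 2D map of the library
```

In this sketch the cost of the similarity calculation grows with the number of satellites rather than with the square of the library size, which is the motivation for choosing satellites carefully.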


Chemical space
Data visualization
Extended similarity

