AIMSim: An Accessible Cheminformatics Platform for Similarity Operations on Chemicals Datasets



Recent advances in deep learning, generative modeling, and statistical learning have renewed interest in traditional cheminformatics tools and methods. Quantifying molecular similarity is essential in molecular generative modeling, exploratory molecular synthesis campaigns, and drug-discovery applications, where it is used to assess how new molecules differ from existing ones. Most existing tools target advanced users and lack general implementations accessible to the larger community. In this work, we introduce Artificial Intelligence Molecular Similarity (AIMSim), an accessible cheminformatics platform for performing similarity operations on collections of molecules (molecular datasets). AIMSim provides a unified platform for similarity-based tasks on molecular datasets, such as diversity quantification, outlier and novelty analysis, clustering, and inter-molecular comparisons. AIMSim implements all major binary similarity metrics and molecular fingerprints and is provided as a Python package that supports command-line use and includes a fully functional graphical user interface (GUI) for code-free use.
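
To make the idea of a binary similarity metric concrete, the following is a minimal sketch in plain Python. It does not use AIMSim and does not reflect its API; the function names (`tanimoto`, `mean_pairwise_similarity`, `most_dissimilar`) and the toy fingerprints are hypothetical, chosen only for illustration. Each fingerprint is assumed to be represented as the set of its "on" bit positions, and the Tanimoto coefficient is used as one common example of a binary similarity metric.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention here.
    return len(a & b) / union if union else 1.0


def mean_pairwise_similarity(fps: list) -> float:
    """Average similarity over all unordered pairs; lower means more diverse."""
    pairs = list(combinations(fps, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def most_dissimilar(fps: list) -> int:
    """Index of the fingerprint whose nearest neighbour is least similar."""
    def nearest(i: int) -> float:
        return max(tanimoto(fps[i], fp) for j, fp in enumerate(fps) if j != i)
    return min(range(len(fps)), key=nearest)


# Hypothetical toy fingerprints (not derived from real molecules).
toy = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({2, 3, 4, 5}),
    frozenset({7, 8, 9}),
]

if __name__ == "__main__":
    print(tanimoto(toy[0], toy[1]))
    print(mean_pairwise_similarity(toy))
    print(most_dissimilar(toy))
```

In this sketch, the mean pairwise similarity serves as a simple stand-in for diversity quantification, and the molecule with the least similar nearest neighbour is a simple stand-in for an outlier. In practice, fingerprints would be computed from molecular structures by a cheminformatics toolkit rather than written by hand.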

Version notes

Edited for re-submission.


AIMSim Simplifying Similarity Analysis by Himaghna Bhattacharjee and Jackson Burns.pdf

Supplementary material

AIMSim Supporting Information.pdf
Supporting Information for AIMSim: An Accessible Cheminformatics Platform for Similarity Operations on Chemicals Datasets
Tabulated Similarity Measures, Graphical User Interface Walkthrough, Cluster Analysis of Solvents in Use Case, Speedup and Efficiency Tables, Statement of Availability of Source Code

Supplementary weblinks

AIMSim Documentation
Comprehensive documentation for AIMSim, including installation tutorials, usage tutorials, tabulated similarity metrics, and module structure tree.