These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Persistent Homology for Virtual Screening

revised on 04.10.2018 and posted on 05.10.2018 by Bryn Keller, Michael Lesnick, Theodore L. Willke

Finding new medicines is one of the most important tasks of pharmaceutical companies. One of the best approaches to finding a new drug starts with answering a simple question: given a known effective drug X, what are the top 100 molecules in our database most similar to X? The essence of the problem is thus a nearest-neighbors search, and the key question is how to define the distance between two molecules in the database. In this paper, we investigate the use of topological, rather than geometric or chemical, signatures for molecules, and two notions of distance that come from comparing these topological signatures. We introduce PH_VS (Persistent Homology for Virtual Screening), a new system for ligand-based screening using a topological technique known as multi-parameter persistent homology. We show that our approach can match or exceed a reasonable estimate of the current state of the art (including well-funded commercial tools), even with relatively little domain-specific tuning. Indeed, most of the components we have built for this system are general-purpose tools for data science and will be released soon as open-source software.
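
The nearest-neighbors framing in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Signature` type, the `top_k_similar` helper, the toy molecule names, and in particular the Euclidean `distance`, which is only a placeholder and is not one of the topological distances that PH_VS uses. The point is simply that the search itself is generic once a distance between signatures has been chosen.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a molecular signature (not the PH_VS representation).
Signature = Sequence[float]


def euclidean(a: Signature, b: Signature) -> float:
    """Placeholder distance; NOT the topological distance used by PH_VS."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def top_k_similar(
    query: Signature,
    database: Dict[str, Signature],
    distance: Callable[[Signature, Signature], float],
    k: int = 100,
) -> List[Tuple[str, float]]:
    """Return the k database entries closest to the query, nearest first."""
    scored = ((name, distance(query, sig)) for name, sig in database.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy data: made-up names and vectors, for illustration only.
database = {
    "mol_a": [0.0, 1.0],
    "mol_b": [3.0, 4.0],
    "mol_c": [0.1, 0.9],
}
hits = top_k_similar([0.0, 1.0], database, euclidean, k=2)
```

Because `distance` is passed in as a parameter, swapping in a different notion of molecular similarity would change only that one argument, which mirrors the abstract's observation that the choice of distance is the key question.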




Intel Corporation


United States of America



Declaration of Conflict of Interest

M. L. was an employee of Princeton University when this work was performed, and Princeton has a current grant from Intel Corporation. No other conflicts are declared.