Persistent Homology for Virtual Screening

05 October 2018, Version 3
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Finding new medicines is one of the most important tasks of pharmaceutical companies. One of the best approaches to finding a new drug starts with a simple question: given a known effective drug X, what are the top 100 molecules in our database most similar to X? The essence of the problem is thus a nearest-neighbors search, and the key question is how to define the distance between two molecules in the database. In this paper, we investigate the use of topological, rather than geometric or chemical, signatures for molecules, and two notions of distance that come from comparing these topological signatures. We introduce PH_VS (Persistent Homology for Virtual Screening), a new system for ligand-based screening using a topological technique known as multi-parameter persistent homology. We show that our approach can match or exceed a reasonable estimate of the current state of the art (including well-funded commercial tools), even with relatively little domain-specific tuning. Indeed, most of the components we have built for this system are general-purpose tools for data science and will soon be released as open-source software.
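
The nearest-neighbors framing above can be illustrated with a minimal sketch. It is not the PH_VS implementation: the function names, the toy two-number "signatures", and the Euclidean distance are all hypothetical stand-ins, chosen only to show how a top-k query works once some distance between molecular signatures has been fixed.

```python
import heapq
import math


def top_k_similar(query, database, distance, k=100):
    """Return the k (name, distance) pairs in `database` closest to `query`.

    `database` maps a molecule name to its signature; `distance` is any
    function taking two signatures and returning a non-negative number.
    """
    scored = ((name, distance(query, sig)) for name, sig in database.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


def euclidean(a, b):
    # Placeholder distance (assumption): a real system would compare
    # topological signatures rather than plain coordinate tuples.
    return math.dist(a, b)


# Toy database with made-up signatures, for illustration only.
database = {
    "mol_a": (0.0, 1.0),
    "mol_b": (3.0, 4.0),
    "mol_c": (0.1, 0.9),
}
query = (0.0, 1.0)

hits = top_k_similar(query, database, euclidean, k=2)
for name, d in hits:
    print(f"{name}: {d:.3f}")
```

Because the ranking depends only on the `distance` argument, swapping in a different notion of distance changes the results without touching the search itself, which is why the choice of distance is the key question.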

Keywords

virtual screening
persistent homology
multi-parameter persistent homology
algebraic topology
