ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information. For more information, please see our FAQs.

Learning Machine Reasoning for Bioactivity Prediction of Chemicals

preprint
submitted on 08.05.2020 and posted on 08.05.2020 by Suman Chakravarti

We describe a method for learning higher-level vector representations of the interactions between molecular features and biology. We call these representations reason vectors. In contrast to high-dimensional chemical fingerprints, reason vectors are much simpler, with only about 5 dimensions. They allow abstract reasoning about the bioactivity of chemicals or its absence, uncover causal factors in the interactions between chemical features, and generalize beyond specific chemical classes or bioactivity. These qualities enable powerful similarity searches that are vague and conceptual in nature. The methodology can handle novel combinations of features in query molecules and can evaluate chemical classes that are entirely absent from the training data. The method consists of a similarity-based near-neighbor search of a reference database of biologically tested chemicals, using a series of substructures obtained from stepwise reconstruction of the test molecule. A data-driven continuous representation of molecular fragments was used for the molecular similarity computations. The technique was inspired by the human ability to learn and generalize complex concepts by interacting with the physical world. We also show that predicting the activity of chemicals from the abstract reason vectors is straightforward compared with modeling in the raw chemistry space, and that it can be applied to both binary and continuous activity outcomes. Except for an unsupervised training step used to construct the continuous molecular fingerprints, the methodology involves no gradient optimization or statistical fitting.
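
To make the search step in the abstract more concrete, the following is a minimal Python sketch of a similarity-based near-neighbor lookup repeated over a series of substructure vectors. It is an illustration under stated assumptions, not the author's implementation: the abstract does not define the dimensions of a reason vector, so the five summary numbers below, the cosine similarity measure, the function names, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbors(query, reference, k):
    """Return the k (similarity, activity) pairs most similar to the query."""
    scored = [(cosine(query, vec), active) for vec, active in reference]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def summarize_steps(substructure_vectors, reference, k=2):
    """Condense per-step neighbor statistics into five numbers.

    Hypothetical summary, NOT the paper's definition of a reason vector:
    [mean neighbor similarity, mean active fraction, min active fraction,
     max active fraction, active fraction at the final step].
    """
    sims, fracs = [], []
    for vec in substructure_vectors:
        hits = nearest_neighbors(vec, reference, k)
        sims.append(sum(s for s, _ in hits) / len(hits))
        fracs.append(sum(a for _, a in hits) / len(hits))
    n = len(fracs)
    return [sum(sims) / n, sum(fracs) / n, min(fracs), max(fracs), fracs[-1]]


# Toy reference database: (fragment embedding, tested activity 1/0).
# Real embeddings would come from the unsupervised fingerprint model.
REFERENCE = [
    ([1.0, 0.0], 1),
    ([0.9, 0.1], 1),
    ([0.0, 1.0], 0),
    ([0.1, 0.9], 0),
]

# Toy stand-in for the substructures produced while the test molecule
# is rebuilt step by step; the last entry represents the full molecule.
QUERY_STEPS = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]

summary = summarize_steps(QUERY_STEPS, REFERENCE, k=2)
print(summary)
```

In this toy setup the neighbors shift from inactive to active reference compounds as the query is reconstructed, and the short summary records that shift without any fitted parameters, which is in the spirit of the abstract's statement that no gradient optimization or statistical fitting is involved.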

Email Address of Submitting Author

chakravarti@multicase.com

Institution

MultiCASE Inc.

Country

USA

ORCID For Submitting Author

0000-0001-7745-8747

Declaration of Conflict of Interest

The author is employed by MultiCASE Inc.
