PySIDT: Subgraph Isomorphic Decision Trees for Molecular Property Prediction

27 September 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Accurate molecular property prediction is important across all fields of chemistry. Deep neural networks (DNNs) have become increasingly popular because they can be trained automatically, avoiding the tedious process of constructing and extending traditional property estimation schemes. However, DNNs require large amounts of training data, are challenging to interpret, require large amounts of memory to load even during inference, and have severe difficulty incorporating qualitative chemical knowledge, which is often desired in molecular property prediction tasks. Here we present PySIDT (https://github.com/zadorlab/PySIDT), a software package for training and running inference on Subgraph Isomorphic Decision Trees (SIDTs). SIDTs are graph-based decision trees made of nodes associated with molecular substructures. Inference is done by descending a target molecular structure down the decision tree through nodes whose substructures are subgraph isomorphic to it, and making a prediction based on the final (most specific) node matched. SIDTs scale down well to dataset sizes much smaller than is feasible for DNNs. As trees of molecular substructures, SIDTs are inherently readable and easy to visualize, which makes them easy to analyze. They are also straightforward to extend and retrain, facilitate uncertainty estimation, and enable easy integration of expert knowledge. We demonstrate the SIDT approach by applying PySIDT to a diverse range of molecular prediction tasks: rate coefficient estimation, diffusion coefficient estimation, thermochemistry estimation, transition state bond stretch prediction, pKa prediction, stability of molecular structures, stability of surface structures, and prediction of surface lateral interactions.
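
The inference step described in the abstract can be illustrated with a minimal sketch. It does not use the PySIDT API: the names `Graph`, `Node`, `is_subgraph`, and `descend`, the toy heavy-atom graphs, and the node values are all hypothetical, and the brute-force matcher is a stand-in for a real subgraph isomorphism routine that is only practical for very small graphs.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Graph:
    atoms: list   # element symbols; list index is the atom id ("*" = any element)
    bonds: set    # set of frozenset({i, j}) atom-id pairs


def make_graph(atoms, bonds):
    return Graph(list(atoms), {frozenset(b) for b in bonds})


def is_subgraph(pattern, target):
    """Brute-force substructure match (illustrative only).

    Returns True if the pattern atoms can be mapped injectively onto target
    atoms so that element labels agree and every pattern bond is present in
    the target. Extra bonds in the target are allowed.
    """
    n = len(pattern.atoms)
    for combo in permutations(range(len(target.atoms)), n):
        labels_ok = all(
            p == "*" or p == target.atoms[t]
            for p, t in zip(pattern.atoms, combo)
        )
        bonds_ok = all(
            frozenset(combo[i] for i in bond) in target.bonds
            for bond in pattern.bonds
        )
        if labels_ok and bonds_ok:
            return True
    return False


@dataclass
class Node:
    name: str
    pattern: Graph
    value: float                      # hypothetical prediction stored at this node
    children: list = field(default_factory=list)


def descend(root, molecule):
    """Walk down the tree to the most specific node whose pattern matches."""
    if not is_subgraph(root.pattern, molecule):
        raise ValueError("molecule does not match the root substructure")
    node = root
    while True:
        for child in node.children:
            if is_subgraph(child.pattern, molecule):
                node = child          # first matching child wins; go deeper
                break
        else:
            return node               # no child matches: this is the final node


# Toy tree over heavy-atom graphs; values are made up for illustration.
c_c_o = Node("C-C-O", make_graph(["C", "C", "O"], [(0, 1), (1, 2)]), 2.0)
c_o = Node("C-O", make_graph(["C", "O"], [(0, 1)]), 1.0, [c_c_o])
c_n = Node("C-N", make_graph(["C", "N"], [(0, 1)]), 3.0)
root = Node("root", make_graph(["*"], []), 0.0, [c_o, c_n])

ethanol = make_graph(["C", "C", "O"], [(0, 1), (1, 2)])
methanol = make_graph(["C", "O"], [(0, 1)])
ethane = make_graph(["C", "C"], [(0, 1)])

for label, mol in [("ethanol", ethanol), ("methanol", methanol), ("ethane", ethane)]:
    leaf = descend(root, mol)
    print(label, "->", leaf.name, leaf.value)
```

In this sketch the first matching child is followed at each level, which is one simple way to resolve the case where several children match; how PySIDT itself orders or resolves such cases is not stated in the abstract.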

Keywords

machine learning
property prediction
decision trees
chemical properties
