Abstract
Selection bias is inevitable in manually curated computational reaction databases, and it can significantly affect the generalizability of quantum chemical methods and machine learning models derived from these data sets. We propose a new discrete representation of reaction mechanisms as subgraphs of a transition network, that is, a network of formal bond breaks and bond formations. Each subgraph is composed of all shortest paths between its reactant and product nodes. To construct a bias-free data set, we consider the subgraphs between all pairs of network nodes that contain only neutral molecules. These quasireaction subgraphs include both reactive instances (reaction subgraphs) and non-reactive instances. The proposed approach thus transforms the problem of obtaining a bias-free data set of reaction mechanisms into the binary classification of a set of quasireaction subgraphs that is free from selection bias by construction. We compute statistics of the topological properties of quasireaction subgraphs in CHO reaction networks and characterize their similarities by clustering with Weisfeiler–Lehman graph kernels.
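To illustrate the construction described above, the following minimal sketch extracts the subgraph formed by all shortest paths between two nodes and enumerates all node pairs as quasireaction candidates. It is not the authors' implementation. It assumes that the transition network can be treated as an undirected, unweighted graph stored as an adjacency dictionary; the node labels `A` to `H` and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Return the hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path_subgraph(adj, reactant, product):
    """Return the edges that lie on at least one shortest reactant-product path.

    Assumption: `adj` describes an undirected, unweighted network.
    """
    d_r = bfs_distances(adj, reactant)
    d_p = bfs_distances(adj, product)
    if product not in d_r:
        return set()  # the two nodes are not connected
    total = d_r[product]
    edges = set()
    for u in adj:
        for v in adj[u]:
            # Edge (u, v) is on a shortest path if going reactant -> u -> v -> product
            # takes exactly the shortest-path length.
            if u in d_r and v in d_p and d_r[u] + 1 + d_p[v] == total:
                edges.add(frozenset((u, v)))
    return edges


def quasireaction_subgraphs(adj):
    """Yield (pair, subgraph) for every unordered pair of connected nodes."""
    for a, b in combinations(sorted(adj), 2):
        sub = shortest_path_subgraph(adj, a, b)
        if sub:
            yield (a, b), sub


# Hypothetical toy network: two shortest routes A-B-D-E and A-C-D-E,
# plus a longer detour A-F-G-H-E that must be excluded.
toy = {
    "A": ["B", "C", "F"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "H"],
    "F": ["A", "G"],
    "G": ["F", "H"],
    "H": ["G", "E"],
}

sub = shortest_path_subgraph(toy, "A", "E")
pairs = dict(quasireaction_subgraphs(toy))
```

Here `sub` holds the five edges of the two length-3 routes and omits the detour through `F`, `G`, and `H`. Enumerating all pairs, rather than only pairs known to react, is what removes the curation step from the data set; deciding which of the resulting subgraphs are reactive is left to the downstream classification.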
Supplementary materials
Title
Supplementary Material: Statistics and Bias-Free Sampling of Reaction Mechanisms from Reaction Network Models
Description
These supplementary materials contain reaction rules, high-energy species patterns, statistics of CHO reaction networks, shortest path statistics in CHO reaction networks, and silhouette scores of quasireaction subgraph clustering.