Statistics and Bias-Free Sampling of Reaction Mechanisms from Reaction Network Models

02 March 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Selection bias is inevitable in manually curated computational reaction databases, and it can significantly affect the generalizability of quantum chemical methods and machine learning models derived from these data sets. We propose a new discrete representation of reaction mechanisms as subgraphs of a transition network, that is, a network of formal bond breaks and bond formations. Each subgraph is composed of all shortest paths between the reactant and product nodes. To construct a bias-free data set, the subgraphs between all pairs of network nodes (containing only neutral molecules) are considered. These quasireaction subgraphs include both reactive instances (reaction subgraphs) and non-reactive instances. The proposed approach thus transforms the problem of obtaining a bias-free data set of reaction mechanisms into the binary classification of a set of quasireaction subgraphs that is free from selection bias by construction. We compute the statistics of the topological properties of quasireaction subgraphs in CHO reaction networks and characterize their similarities using clustering with Weisfeiler-Lehman graph kernels.
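
The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an undirected, unweighted transition network stored as an adjacency dictionary, and the toy network `TOY`, the node labels, and the function names are all hypothetical. The subgraph for a node pair is taken as the union of edges lying on any shortest path between them, and quasireaction subgraphs are enumerated over all pairs from a caller-supplied set of nodes assumed to contain only neutral molecules.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy transition network (undirected adjacency lists).
# Each edge stands for one formal bond break or bond formation.
TOY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def bfs_distances(adj, source):
    """Return hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def shortest_path_subgraph(adj, reactant, product):
    """Edges lying on at least one shortest reactant-product path."""
    d_r = bfs_distances(adj, reactant)
    d_p = bfs_distances(adj, product)
    if product not in d_r:
        return set()  # disconnected pair: empty subgraph
    total = d_r[product]
    edges = set()
    for u in adj:
        for v in adj[u]:
            # Edge u-v is on a shortest path if the distances add up exactly.
            if u in d_r and v in d_p and d_r[u] + 1 + d_p[v] == total:
                edges.add(frozenset((u, v)))
    return edges

def quasireaction_subgraphs(adj, neutral_nodes):
    """Shortest-path subgraph for every unordered pair of the given nodes."""
    return {
        (a, b): shortest_path_subgraph(adj, a, b)
        for a, b in combinations(sorted(neutral_nodes), 2)
    }

if __name__ == "__main__":
    for pair, edges in quasireaction_subgraphs(TOY, {"A", "D", "E"}).items():
        print(pair, sorted(tuple(sorted(e)) for e in edges))
```

Because every pair of eligible nodes is enumerated, no pair is chosen by hand; deciding which of the resulting subgraphs correspond to actual reactions is the separate classification step that the abstract refers to.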

Keywords

Reaction mechanisms
Databases
Reaction networks
Graph models

Supplementary materials

Title: Supplementary Material Statistics and Bias-Free Sampling of Reaction Mechanisms from Reaction Network Models
Description: These supplementary materials contain reaction rules, high-energy species patterns, statistics of CHO reaction networks, shortest path statistics in CHO reaction networks, and silhouette scores of quasireaction subgraph clustering.
