A Data Set of Plausible Proton Transfer Steps For Arrow-Pushing Mechanisms

08 March 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

A large data set of kinetically plausible proton transfer steps was created. A set of over 48 million proton transfer steps between heteroatoms was generated combinatorially from a set of about 8,000 acids and conjugate bases for which experimental aqueous pKas near room temperature were available. The set was augmented with about 100 estimated pKas of highly reactive species important for reaction mechanisms. The resulting set of pKas spans a range from -15 to +37. Rate constants were estimated at 25 °C from the pKas using a simplified Eigen equation without statistical factors. Steps with estimated rate constants ≥ 10^3 M^-1 s^-1 (a conservative boundary) were included in the data set. Rate constants for an additional set of 15,138 proton transfer steps from carbon acids to heteroatom bases, for which intrinsic rate constants and Brønsted β values were known, were estimated using the Eigen-Bernasconi equation. Steps for proton transfers from carbon with estimated rate constants ≥ 10^-1 M^-1 s^-1 were added to the data set. Each entry was encoded, with electron-flow specification, in SMIRKS format, which is commonly used for machine learning. The objective of this work was the creation of a structurally rich data set rather than accurate calculation of rate constants.
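The heteroatom filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the simplified Eigen equation, so the code below assumes a commonly used textbook form, k = k_diff / (1 + 10^(-ΔpKa)), where ΔpKa is the pKa of the acceptor's conjugate acid minus the pKa of the donor acid. The diffusion-limited value k_diff = 10^10 M^-1 s^-1, the function names, and the example pKa values are all illustrative assumptions and are not taken from the preprint; only the 10^3 M^-1 s^-1 cutoff comes from the abstract.

```python
# Assumed diffusion-limited rate constant (M^-1 s^-1); not stated in the abstract.
K_DIFFUSION = 1.0e10
# Inclusion cutoff for heteroatom proton transfers, as stated in the abstract.
K_THRESHOLD = 1.0e3


def eigen_rate_constant(pka_donor, pka_acceptor_conj_acid, k_diff=K_DIFFUSION):
    """Estimate a forward proton transfer rate constant (M^-1 s^-1).

    Assumed simplified Eigen form without statistical factors:
        k = k_diff / (1 + 10 ** (-delta_pka))
    where delta_pka = pKa(conjugate acid of acceptor) - pKa(donor).
    """
    delta_pka = pka_acceptor_conj_acid - pka_donor
    return k_diff / (1.0 + 10.0 ** (-delta_pka))


def is_plausible(pka_donor, pka_acceptor_conj_acid, threshold=K_THRESHOLD):
    """Return True if the estimated rate constant meets the cutoff."""
    return eigen_rate_constant(pka_donor, pka_acceptor_conj_acid) >= threshold


if __name__ == "__main__":
    # Illustrative pKa pairs (donor, conjugate acid of acceptor); hypothetical inputs.
    pairs = [(4.76, 9.25), (9.25, 4.76), (15.7, 4.76)]
    for donor, acceptor in pairs:
        k = eigen_rate_constant(donor, acceptor)
        print(f"donor pKa {donor}, acceptor pKa {acceptor}: "
              f"k ~ {k:.3g} M^-1 s^-1, kept = {is_plausible(donor, acceptor)}")
```

Under these assumptions, thermodynamically downhill transfers approach the diffusion limit, while uphill transfers are slowed by roughly one order of magnitude per pKa unit, so the 10^3 cutoff corresponds to steps that are uphill by up to about 7 pKa units.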

Keywords

proton transfer
data set
machine learning

Supplementary materials

Title: 100 Representative Proton Transfer Steps in SMIRKS Format
Description: Very small representative sample of 100 out of the 48M proton transfer steps.

Title: Heteroatom Acids and Bases in SMILES Format
Description: Lists of heteroatomic acids and bases in SMILES format with the acidic and basic atoms labeled, with pKas, including those from DataWarrior, the Reich pKa Table, and key values from the literature.

Title: Carbon Acids and Heteroatom Base Classes with Intrinsic Parameters
Description: List of intrinsic rate constants and beta values for 65 carbon acids in SMILES format, with statistical factors, and heteroatom base classes; lists of heteroatom bases in SMILES format, sectioned by class and with statistical factors, selected from the Heteroatom set.
