Abstract
A large data set of kinetically plausible proton transfer steps was created. A set of over 48 million proton transfer steps between heteroatoms was generated combinatorially from about 8,000 acids and conjugate bases for which experimental aqueous pKas near room temperature were available. The set of acids and bases was augmented with about 100 estimated pKas of highly reactive species important for reaction mechanisms. The resulting pKas span a range from -15 to +37. Rate constants at 25 °C were estimated from the pKas using a simplified Eigen equation without statistical factors. Steps with estimated rate constants ≥ 10^3 M^-1 s^-1, a conservative boundary, were included in the data set. Rate constants for an additional set of 15,138 proton transfer steps from carbon acids to heteroatom bases, for which intrinsic rate constants and Brønsted 𝛽 values were known, were estimated using the Eigen-Bernasconi equation. Steps for proton transfers from carbon with estimated rate constants ≥ 10^-1 M^-1 s^-1 were added to the data set. Each entry was encoded with electron-flow specification in SMIRKS format, which is commonly used for machine learning. The objective of this work was the creation of a structurally rich data set rather than the accurate calculation of rate constants.
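The combinatorial generation and rate-constant filter described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the simplified Eigen equation, so the expression used here, k = k_d / (1 + 10^(-ΔpKa)) with an assumed diffusion-limited rate constant k_d = 10^10 M^-1 s^-1, is an assumption and not necessarily the authors' formula. The function names and the small acid and base tables are hypothetical, and the entry named `hypothetical_weak_acid` is invented purely to show the filter rejecting a step.

```python
from itertools import product

# Assumed diffusion-limited rate constant (M^-1 s^-1); not taken from the source.
K_DIFFUSION = 1e10
# Inclusion threshold for heteroatom-to-heteroatom steps, as stated in the abstract.
RATE_THRESHOLD = 1e3


def eigen_rate(pka_donor, pka_acceptor_conj_acid, k_diff=K_DIFFUSION):
    """Estimate a proton transfer rate constant from two pKa values.

    Assumed simplified Eigen form, without statistical factors:
        k = k_diff / (1 + 10**(-delta_pka))
    where delta_pka = pKa(conjugate acid of the base) - pKa(donor acid).
    """
    delta_pka = pka_acceptor_conj_acid - pka_donor
    return k_diff / (1.0 + 10.0 ** (-delta_pka))


def plausible_steps(acids, bases, threshold=RATE_THRESHOLD):
    """Pair every acid with every base and keep steps with k >= threshold."""
    kept = []
    for (acid, pka_acid), (base, pka_conj) in product(acids.items(), bases.items()):
        k = eigen_rate(pka_acid, pka_conj)
        if k >= threshold:
            kept.append((acid, base, k))
    return kept


# Illustrative pKa values only; the third acid is a made-up placeholder.
acids = {
    "acetic acid": 4.76,
    "ammonium": 9.25,
    "hypothetical_weak_acid": 25.0,
}
# Bases are listed with the pKa of their conjugate acid.
bases = {
    "acetate": 4.76,
    "ammonia": 9.25,
}

steps = plausible_steps(acids, bases)
for acid, base, k in steps:
    print(f"{acid} + {base}: k ~ {k:.2e} M^-1 s^-1")
```

In this form a thermodynamically favorable transfer approaches the diffusion limit, while an unfavorable one falls off by roughly a factor of ten per pKa unit, so the 10^3 M^-1 s^-1 threshold corresponds to a ΔpKa of about -7 under the assumed k_d.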
Supplementary materials
Title
100 Representative Proton Transfer Steps in SMIRKS Format
Description
Very small representative sample of 100 out of the 48M proton transfer steps.
Title
Heteroatom Acids and Bases in SMILES Format
Description
Lists of heteroatomic acids and bases in SMILES format with the acidic and basic atoms labeled, with pKas, including those from DataWarrior, Reich pKa Table, and key values from the literature
Title
Carbon Acids and Heteroatom Base Classes with Intrinsic Parameters
Description
List of intrinsic rate constants and beta values for 65 carbon acids in SMILES format, with statistical factors, and heteroatom base classes; lists of heteroatom bases in SMILES format, sectioned by class and with statistical factors, selected from the Heteroatom set