(Semi-) Automatic Review Process for Common Compound Characterization Data in Organic Synthesis

28 February 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


A method for data review in the chemical sciences is described, with a focus on data for the characterization of synthetic molecules. Because current procedures for data curation in chemistry rely almost exclusively on manual checking or peer review, a (semi-)automatic procedure for evaluating data assigned to molecular structures is proposed and demonstrated. The information usually required to identify isolated compounds is used to determine whether the data are complete with respect to the available data types and metadata, whether they are consistent with the proposed structure, and whether they are plausible in comparison with simulated data. Spectra prediction and automatic signal comparison are applied to NMR evaluation, mass spectrometry data are evaluated by signal extraction, and machine learning is used for IR analysis. The proposed protocol shows how integrating different data analysis tools can help to overcome the challenges of the currently purely manual review and curation of data in synthetic chemistry.
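To make the completeness and consistency questions concrete, the following is a minimal sketch in Python and not the implementation used in this work. The list of required data types, the function names, and the 5 ppm tolerance are assumptions chosen for illustration. The mass check compares an observed [M+H]+ signal with the monoisotopic mass calculated from the proposed molecular formula.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes of C, H, N and O.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647  # proton mass in u, added for an [M+H]+ ion

# Hypothetical set of data types expected for a characterized compound.
REQUIRED = ("1H NMR", "13C NMR", "MS", "IR")


def missing_data_types(dataset):
    """Completeness: return the required data types absent from the dataset."""
    return [t for t in REQUIRED if t not in dataset]


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ms_consistent(formula, observed_mz, tol_ppm=5.0):
    """Consistency: does an observed [M+H]+ signal match the proposed formula?"""
    expected = monoisotopic_mass(formula) + PROTON
    return abs(observed_mz - expected) / expected * 1e6 <= tol_ppm


# Illustrative dataset for caffeine (C8H10N4O2); the values are placeholders.
dataset = {"1H NMR": "...", "MS": 195.0877}
print(missing_data_types(dataset))
print(ms_consistent("C8H10N4O2", dataset["MS"]))
```

A plausibility check against simulated spectra, as applied to NMR in this work, would require a prediction tool and is outside the scope of this sketch.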


Keywords: data curation, electronic lab notebooks, chemistry data

Supplementary materials

Supplemental information Part 1
Supplemental material on technical details and review summary for 110 selected datasets

