Making the InChI FAIR and sustainable by moving to open-source on GitHub

24 June 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The InChI (International Chemical Identifier) standard is a cornerstone of chemical informatics, enabling the structure-based identification and exchange of chemical compounds across platforms and databases. As a unique, canonical line notation, the InChI has made chemical structures searchable on the internet on a broad scale; the largest repositories working with InChIs contain more than 1 billion structures. Central to the functionality of the InChI is its codebase, which performs a series of intricate steps to generate unique identifiers for chemical compounds. Until now, these steps were sparsely documented, and the InChI algorithm had to be treated as a black box. For the new 1.07 release, the code has been analyzed and its major steps documented, and nearly 3000 code issues, such as bugs, memory leaks, and issues found by Google fuzz testing, have been corrected. New test systems have been implemented that let users directly test code developments. The move to GitHub has not only made development more transparent but has also allowed external contributors to join the further development of the InChI code.
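To make the idea of a layered, canonical line notation concrete, the sketch below splits a standard InChI string into its slash-separated layers (version, formula, connections, hydrogens, and so on). It is an illustrative sketch only, not part of the InChI code base; the function name `split_inchi_layers` is hypothetical, and it assumes a simple standard InChI in which each layer prefix occurs once (repeated prefixes, for example in a fixed-H layer, would simply overwrite earlier entries).

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split a simple InChI string into its layers (illustrative sketch only)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    # The layers after the prefix are separated by '/'.
    parts = inchi[len(prefix):].split("/")
    # The first two fields are the version (e.g. '1S') and the molecular formula.
    layers = {"version": parts[0], "formula": parts[1]}
    # Each remaining layer starts with a one-letter prefix such as 'c' or 'h'.
    # Assumption: each prefix appears only once; duplicates overwrite earlier ones.
    for layer in parts[2:]:
        layers[layer[0]] = layer[1:]
    return layers


# Standard InChI of ethanol
print(split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

For ethanol this yields the version `1S`, the formula `C2H6O`, the connection layer `1-2-3`, and the hydrogen layer `3H,2H2,1H3`.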

Keywords

Molecular representation
InChI
FAIR data

