Abstract
Background:
Chemical reactions form intricate, highly connected networks, and exploring them is essential for discovering more efficient and sustainable synthetic routes. As reaction data from the literature, patents, and high-throughput experimentation continue to grow, so does the need for tools that can navigate and mine these large-scale datasets as a whole. Graph-based representations naturally capture the topology of reaction space, but community-accessible software for constructing and interrogating such networks remains limited. To close this gap, and to enable AI-driven, data-centric decision-making in chemical research, we developed NOCTIS, an open-source platform that streamlines the construction and analysis of reaction networks and accelerates synthesis optimization.
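To make the graph-based view concrete, here is a minimal sketch that turns reaction SMILES strings into a bipartite network with molecule nodes and reaction nodes, stored as plain Python adjacency sets. It is an illustration under stated assumptions, not NOCTIS code: NOCTIS is built on Neo4j, whereas this sketch keeps everything in dictionaries, and the helper name `build_reaction_network` is hypothetical. Molecules are keyed by their SMILES text exactly as given. A real pipeline would canonicalize structures first so that different spellings of one molecule map to a single node.

```python
from collections import defaultdict


def build_reaction_network(reaction_smiles):
    """Return (molecules, reactions, successors) for a bipartite reaction graph.

    Hypothetical helper. Edges run reactant -> reaction and reaction -> product,
    so following successors moves forward through synthesis. Molecules are
    identified by their SMILES text as given (no canonicalization here).
    """
    successors = defaultdict(set)
    molecules, reactions = set(), set()
    for rxn in reaction_smiles:
        # Reaction SMILES has three '>'-separated fields: reactants>agents>products.
        reactants_field, _agents, products_field = rxn.split(">")
        reactants = [s for s in reactants_field.split(".") if s]
        products = [s for s in products_field.split(".") if s]
        reactions.add(rxn)
        for mol in reactants:
            molecules.add(mol)
            successors[mol].add(rxn)  # molecule node -> reaction node
        for mol in products:
            molecules.add(mol)
            successors[rxn].add(mol)  # reaction node -> molecule node
    return molecules, reactions, dict(successors)


# Two-step example: esterification, then aminolysis of the resulting ester.
example = [
    "CC(=O)O.OCC>>CC(=O)OCC",
    "CC(=O)OCC.N>>CC(N)=O",
]
molecules, reactions, successors = build_reaction_network(example)
# Ethyl acetate is the shared node that connects the two reactions.
print(len(molecules), len(reactions), sorted(successors["CC(=O)OCC"]))
# 5 2 ['CC(=O)OCC.N>>CC(N)=O']
```

Splitting on `.` also separates the ions of a salt into separate nodes, which a production tool would need to handle explicitly.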
Result:
We present NOCTIS, an open-source Python package for constructing and analyzing Networks of Organic Chemistry (NOCs) from reaction strings. It supports graph-shaped queries that users can easily extend with their own, offers parallel processing for large datasets, and exports results to Python-compatible formats such as NetworkX graphs and pandas DataFrames. Built on the Neo4j graph database, NOCTIS has a modular, extensible architecture whose dependencies are all open source. To extend NOCTIS further, we developed a plugin, NOCTIS Route Miner, that implements an algorithm for extracting all possible routes to a given target from an NOC. Information about reaction routes, especially routes that have already been carried out, is highly valuable for avoiding duplicated work and for leveraging prior knowledge in synthesis planning. However, this information is often unavailable. Route mining can help close this gap, but it has limitations: even in networks as small as 100 reactions, the number of routes can reach millions. This combinatorial explosion makes mining computationally demanding and manual interpretation of the raw results infeasible. To demonstrate the broader capabilities of NOCTIS, we analysed the MIT USPTO dataset[1]. We showcase the tool's efficiency, versatility, and compatibility with Python workflows, providing a practical platform for reaction route exploration, network connectivity analysis, and synthetic tree evaluation.
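The sketch below illustrates route mining and the combinatorial explosion described above. It is not the NOCTIS Route Miner algorithm or its API. It is a plain backtracking enumeration over an in-memory dictionary of reactions, written under these assumptions: a molecule that no reaction produces is a starting material, every other molecule in a route is made by exactly one reaction, and routes whose reactions form a cycle are discarded. The function `enumerate_routes`, the reaction IDs, and the molecule labels are hypothetical placeholders.

```python
from collections import defaultdict


def enumerate_routes(target, reactions):
    """Return every route to `target` as a set of frozensets of reaction IDs.

    Hypothetical sketch. `reactions` maps a reaction ID to (reactants, product).
    Assumptions: unproduced molecules are starting materials; each other
    molecule is made by exactly one reaction per route; cyclic routes are dropped.
    """
    producers = defaultdict(list)
    for rxn_id, (_reactants, product) in reactions.items():
        producers[product].append(rxn_id)

    def is_acyclic(chosen):
        # DFS from the target over "molecule -> reactants of its chosen reaction".
        state = {}  # 1 = on the current DFS path, 2 = fully explored

        def visit(mol):
            if state.get(mol) == 1:
                return False  # back edge: the route uses a molecule to make itself
            if state.get(mol) == 2 or chosen[mol] is None:
                return True
            state[mol] = 1
            ok = all(visit(r) for r in reactions[chosen[mol]][0])
            state[mol] = 2
            return ok

        return visit(target)

    def solve(pending, chosen):
        # pending: molecules still to resolve; chosen: molecule -> reaction ID or None.
        if not pending:
            if is_acyclic(chosen):
                yield frozenset(r for r in chosen.values() if r is not None)
            return
        mol, rest = pending[0], pending[1:]
        if mol in chosen:  # already resolved in this route: reuse that choice
            yield from solve(rest, chosen)
        elif not producers.get(mol):  # nothing produces it: starting material
            yield from solve(rest, {**chosen, mol: None})
        else:  # OR branch: try each reaction that produces this molecule
            for rxn_id in producers[mol]:
                reactants, _ = reactions[rxn_id]
                yield from solve(rest + tuple(reactants), {**chosen, mol: rxn_id})

    return set(solve((target,), {}))


# Small network with one cycle: R6 makes D from the target T itself.
network = {
    "R1": (("B", "C"), "T"),
    "R2": (("D",), "B"),
    "R3": (("E",), "B"),
    "R4": (("F",), "C"),
    "R5": (("G",), "C"),
    "R6": (("T",), "D"),
}
print(sorted(sorted(route) for route in enumerate_routes("T", network)))
# [['R1', 'R3', 'R4'], ['R1', 'R3', 'R5']]

# Combinatorial growth: one reaction joins n intermediates, each of which can be
# made by two alternative reactions, giving 2**n routes from only 2n + 1 reactions.
n = 10
wide = {"join": (tuple(f"I{i}" for i in range(n)), "T")}
for i in range(n):
    wide[f"a{i}"] = ((f"S{i}a",), f"I{i}")
    wide[f"b{i}"] = ((f"S{i}b",), f"I{i}")
print(len(wide), len(enumerate_routes("T", wide)))  # 21 1024
```

When intermediates are independent, each one with k alternative producing reactions multiplies the route count by k, so exhaustive enumeration grows exponentially even for networks with few reactions.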
Conclusion:
NOCTIS builds on the ecosystem initiated by LinChemIn[2], providing an open and extensible platform for constructing and analyzing chemical reactions as networks and enabling extraction of synthetic routes. It supports collaborative research and community-driven contributions, fostering advancements in cheminformatics. Future developments will focus on expanding query types and optimizing the performance of route extraction. NOCTIS also sets the stage for AI-driven applications, leveraging its graph-based foundation to support advanced knowledge extraction, learning, and synthesis optimization.
[1] Jin, W.; Coley, C.; Barzilay, R.; Jaakkola, T. Predicting organic reaction outcomes with Weisfeiler–Lehman networks. Advances in Neural Information Processing Systems, 30 (2017).
[2] Pasquini, M.; Stenta, M. LinChemIn: Route arithmetic operations on digital synthetic routes. Journal of Chemical Information and Modeling, 64 (6), 1765–1771 (2024).
Supplementary weblinks
Noctis main repository: contains the Python code and documentation of Noctis.