TCM-Navigator, a Deep Learning-based Workflow for Generation and Evaluation of Traditional Chinese Medicine-like Compounds for Drug Development

06 June 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Traditional Chinese Medicine (TCM) has long been regarded as a valuable resource for modern drug discovery. However, several factors limit the effectiveness of high-throughput screening approaches: few entities and little associated information have been recorded, the herb–ingredient–target–disease network is complex and sparse, and data representation is inconsistent. Some therapeutically valuable TCM compounds have been identified through manual experimental screening, but such methods are time-consuming and labor-intensive. To address these challenges, we developed TCM-Navigator, a data-driven, deep learning-based workflow that enables the in-silico generation, quality control, and physics-based evaluation of TCM-like molecules. Molecule generation is performed by TCM-Generator, a chemical language model based on transfer learning and LSTMs that produces standardized, hierarchically structured, high-throughput-friendly datasets of TCM-like molecules. In this study, we generated a target-nonspecific dataset of 3.7 million TCM-like molecules, more than 100 times the number of entities in existing TCM datasets. The workflow also supports flexible, goal-driven generation customized for specific targets, which yielded three target-specific datasets and multiple high-potential target–ligand pairs. Quality control is performed by TCM-Identifier, the first quantitative model specifically designed to capture the distinctive characteristics of TCM, built on the AttentiveFP framework with message passing neural networks (MPNNs). TCM-Identifier is expected to serve as an essential evaluation and guidance tool for TCM-related drug development. Our workflow bridges cutting-edge data science, including deep learning, with biomedical research to tackle longstanding challenges in target identification and molecular design. Its adaptable framework is also transferable to interdisciplinary applications beyond drug development.
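
The abstract describes TCM-Generator only as a transfer learning- and LSTM-based chemical language model and does not specify how molecules are decoded from it. As a minimal sketch, assuming the common approach of sampling SMILES tokens one at a time from the model's output scores with temperature scaling, generation could look like the following. This is not the authors' implementation: the `next_logits` callable is a hypothetical stand-in for a trained LSTM, and the `$` end-of-sequence token, the `temperature` parameter and `toy_model` are illustrative assumptions.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw next-token scores into a probability distribution.

    temperature < 1 sharpens the distribution; temperature > 1 flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(next_logits, vocab, end_token="$", max_len=100,
             temperature=1.0, rng=random):
    """Sample one SMILES-like string token by token.

    next_logits(prefix) is a hypothetical stand-in for a trained LSTM: it
    takes the tokens generated so far and returns one score per vocab entry.
    """
    tokens = []
    for _ in range(max_len):
        probs = softmax_with_temperature(next_logits(tokens), temperature)
        token = rng.choices(vocab, weights=probs, k=1)[0]
        if token == end_token:
            break  # the model signalled the end of the molecule
        tokens.append(token)
    return "".join(tokens)


def toy_model(prefix):
    """Hypothetical toy 'model': strongly prefers 'C' for three steps, then '$'."""
    return [100.0, 0.0, 0.0] if len(prefix) < 3 else [0.0, 0.0, 100.0]


if __name__ == "__main__":
    vocab = ["C", "O", "$"]
    print(generate(toy_model, vocab, rng=random.Random(0)))  # CCC
```

Passing a seeded `random.Random` instance makes sampling reproducible. In a workflow like the one described, sampled strings would still need validity checks and downstream scoring (for example by a model such as TCM-Identifier) before docking or simulation.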

Keywords

Traditional Chinese Medicine
TCM-Navigator
TCM-Generator
TCM-Identifier
deep learning
chemical language model

Supplementary materials

Supplementary figures
Supplementary Figures 1 to 5 referenced in the main manuscript.

Supplementary Methods
Supplementary Methods mentioned in the main text, including: Datasets, Compound Generation with TCM-Generator, Evaluation of Chemical Space, ADMET and Chemical Properties Analysis, TCM-Identifier, Molecular Docking, and Molecular Dynamics (MD) Simulation.