Abstract
Natural products (NPs), a vital source of pharmaceutical agents, have contributed to the development of 60% of marketed small-molecule drugs. However, NP-based drug discovery faces a major challenge: NPs have complex 3D structures and a combinatorially expanding configurational space, both arising from atomic chirality dictated by stereospecific biosynthetic enzymes. To date, over 20% of known NPs lack complete chiral configuration annotations, and only 1–2% have fully resolved crystal structures. To address this bottleneck, we present NatGen, a deep learning framework for predicting the chiral configurations and 3D conformations of natural products. Leveraging advanced structure augmentation and generative modeling techniques, NatGen achieves near-perfect accuracy in chiral configuration prediction: 96.87% on a benchmark NP structural dataset and 100% in a prospective study of 17 recently resolved plant-derived natural products. The average root-mean-square deviation (RMSD) of the predicted 3D structures is below 1 Å, smaller than the radius of a single atom. Using NatGen, we predicted the 3D structures of 684,619 NPs from COCONUT, the largest open NP repository to date, and made the full dataset publicly available at https://www.lilab-ecust.cn/natgen/. We believe this resource significantly expands the structural landscape of natural products and will help researchers cross-validate findings and accelerate progress in diverse fields, including natural product chemistry; enzymatic biosynthesis; physical, organic, and analytical chemistry; phytochemistry; and NP-based and NP-derived drug discovery.
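The sub-1 Å figure refers to the root-mean-square deviation between predicted and reference atomic coordinates, RMSD = sqrt((1/N) Σ ||p_i − r_i||²) over N atoms. The abstract does not state NatGen's exact evaluation protocol (atom selection, hydrogen handling, symmetry, or alignment), so the following is only a minimal sketch of the metric itself. It assumes both conformations list the same atoms in the same order and are already superimposed; the function name `rmsd` is illustrative, not part of NatGen.

```python
import math
from typing import Sequence, Tuple

# A 3D atomic coordinate (x, y, z), e.g. in Å.
Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two conformations.

    Assumes identical atom ordering and coordinates that have already been
    superimposed; no rotational/translational fitting is done here.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        # Squared Euclidean distance between matched atoms.
        squared_sum += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    # Mean over atoms, then square root.
    return math.sqrt(squared_sum / len(predicted))


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
    print(round(rmsd(pred, ref), 4))
```

For structures that are not pre-aligned, an optimal rigid-body superposition (for example the Kabsch algorithm) would normally be applied before this calculation; that step is deliberately left out to keep the sketch dependency-free.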
Supplementary materials
Supporting Information for Accurate Structure Prediction of Natural Products with NatGen
A PDF document containing detailed experimental data from the NatGen study, along with crystallographic analysis results.