Abstract
Macrocyclic compounds have shown great potential as therapeutic agents owing to their distinctive structural and pharmacological properties. However, structural optimization of macrocycles, a critical step in the rational design of macrocyclic drugs, remains constrained by the limited availability of bioactive candidates, which in turn hampers systematic exploration of structure-activity relationships. In this paper, we introduce CycleGPT, a generative chemical language model designed specifically to address these challenges. CycleGPT uses a progressive transfer learning paradigm that incrementally transfers knowledge from pre-trained chemical language models to specialized macrocycle generation, thereby overcoming the shortage of training data. It also adopts a novel probabilistic sampling strategy that improves the structural novelty of generated macrocycles while ensuring domain-specific adaptability. In a prospective drug design campaign based on CycleGPT and our custom JAK2 activity prediction model, three synthesized macrocycles exhibited potent inhibitory activity against JAK2; the most potent, compound 2, showed an IC₅₀ of 1.17 nM. Moreover, compound 2 exhibited a favorable selectivity profile and demonstrated in vivo efficacy in a mouse model of polycythemia. These novel therapeutic candidates demonstrate the significant potential of CycleGPT for advancing macrocyclic drug discovery.
Supplementary materials
Title
Exploring the macrocyclic chemical space for heuristic drug design with deep learning models
Description
Supplementary materials for Exploring the macrocyclic chemical space for heuristic drug design with deep learning models