Abstract
The design and optimization of chiral ligands and catalysts are central to asymmetric catalysis, a key area of organic chemistry with broad influence on other scientific disciplines. Traditional experimental approaches, while foundational, are often slow and complex. Recent work has shown that computational approaches, particularly machine learning (ML), can significantly accelerate ligand and catalyst design through improved prediction and modeling. However, data scarcity and data inaccuracies continue to limit the effectiveness of computational models. To support these computational efforts, this paper introduces the Chiral Ligand and Catalyst Database (CLC-DB), the first open-source, comprehensive database specifically tailored to chiral ligands and catalysts. CLC-DB contains 1861 molecules spanning diverse fundamental chiral categories and 32 distinct types of chiral ligands and catalysts. Each record includes 19 types of information, is linked to authoritative chemical databases, and has been validated by chemical experts. The database also provides a user-friendly interface that supports both quick and batch searches, along with an online molecular clustering tool for computational analysis. CLC-DB is freely accessible at https://compbio.sjtu.edu.cn/services/clc-db, where all data are available for download.