Abstract
Natural products have proven valuable, particularly in drug discovery and chemogenomics. Tandem mass spectrometry, together with reference mass spectral libraries, is frequently used to assist in the characterization of natural products present in unknown complex mixtures. Because current spectral libraries contain only a small percentage of known natural products, their continual expansion is crucial for accurate molecular identification. However, expanding them through experimental means is often expensive and time-consuming. This study explores the use of ab initio molecular dynamics (AIMD) simulations, based on the lightweight GFN2-xTB semiempirical Hamiltonian, to generate mass spectra for small natural product molecules. Through this approach, more than 2,700 unique mass spectra were generated and analysed in relation to the Global Natural Products Social Molecular Networking (GNPS) database. AIMD performed relatively well (mean cosine similarity score of 0.68), with improved performance observed for aromatic molecules but limitations found for molecules with carboxylic acid groups. Other key findings relating to experimental and simulated conditions led to several recommendations for future work in this area. Overall, AIMD shows considerable potential for developing a putative natural product mass spectral library.
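
For readers unfamiliar with the cosine similarity score quoted above, the following is a minimal sketch of how such a score can be computed between a simulated and a reference spectrum. It is illustrative only: the function names, the fixed 1.0 m/z bin width, and the absence of any intensity weighting or peak-matching tolerance are assumptions made here, not the scoring procedure used in this study.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(spec_a, spec_b, bin_width=1.0):
    """Cosine of the angle between two binned intensity vectors (0 to 1)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical peak lists as (m/z, intensity) pairs, for illustration only.
simulated = [(100.0, 1.0), (150.0, 1.0)]
reference = [(100.0, 1.0)]
score = cosine_similarity(simulated, reference)
```

A score of 1 indicates identical binned intensity patterns and a score of 0 indicates no shared peaks, so a mean of 0.68 sits between these extremes.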