Abstract
Umami, one of the fundamental human taste modalities, is the savory taste of meats and broths and is commonly associated with monosodium glutamate and protein-rich foods. Because relatively few umami molecules are known, the food industry needs efficient approaches for identifying novel tastants. In this study, we devised a virtual screening pipeline for identifying potential novel umami tastants in molecular databases. We first curated a comprehensive classification dataset containing 439 umami and 428 non-umami molecules. A transformer-based model trained to distinguish the two classes achieved the best performance reported to date. We also built a neural network model that predicts the potency of umami compounds, the first effort of its kind. Together with similarity analysis and toxicity screening, these two models form an end-to-end framework for the rational discovery of novel tastants. Finally, we applied this framework to the FooDB database as an illustrative use case. This study demonstrates the potential of data-driven methods for predicting the taste of molecules from their structural and chemical features.
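The screening stages named above can be chained as a sequence of filters over candidate molecules. The Python sketch below is illustrative only and rests on several assumptions: `p_umami`, `potency`, and `is_toxic` are hypothetical stand-ins for the trained classifier, the potency regressor, and the toxicity screen; molecules are represented by hypothetical bit-set fingerprints; and the similarity step is assumed to be a Tanimoto nearest-neighbor check against known umami molecules with arbitrary thresholds. The actual framework may order or parameterize these stages differently.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Candidate:
    name: str
    # Hypothetical fingerprint: the set of indices of "on" bits.
    fingerprint: frozenset


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(
    candidates: Iterable[Candidate],
    known_umami: list,
    p_umami: Callable[[Candidate], float],  # stand-in for the umami classifier
    potency: Callable[[Candidate], float],  # stand-in for the potency model
    is_toxic: Callable[[Candidate], bool],  # stand-in for toxicity screening
    p_min: float = 0.5,    # assumed probability cutoff
    sim_min: float = 0.3,  # assumed similarity cutoff
) -> list:
    """Return (name, predicted potency) for candidates passing every filter,
    sorted from most to least potent."""
    hits = []
    for c in candidates:
        # Stage 1: keep molecules the classifier calls umami.
        if p_umami(c) < p_min:
            continue
        # Stage 2 (assumed form): require a sufficiently similar known umami molecule.
        nearest = max((tanimoto(c.fingerprint, k) for k in known_umami), default=0.0)
        if nearest < sim_min:
            continue
        # Stage 3: drop molecules flagged as toxic.
        if is_toxic(c):
            continue
        # Stage 4: rank survivors by predicted potency.
        hits.append((c.name, potency(c)))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

For example, with one known umami fingerprint `{1, 2, 3, 4}`, a candidate with fingerprint `{7, 8}` would be rejected at the similarity stage even if the stand-in classifier scored it highly.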