A Chemical Language Model for Molecular Taste Prediction

Yoel Zimmermann; Leif Sieben; Henrik Seng; Philipp Pestlin; Franz Görlich

doi:10.26434/chemrxiv-2024-d6n15-v2

Agriculture and Food Chemistry


11 December 2024, Version 2

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Determining molecular taste remains a significant challenge in food science. Here, we present FART (Flavor Analysis and Recognition Transformer), a chemical language model capable of predicting molecular taste from chemical structure. Trained on the largest public dataset of molecular tastants to date (15,025 compounds), FART is the first model capable of parallel predictions across four fundamental taste categories: sweet, bitter, sour, and umami. FART achieves an accuracy above 91% for parallel taste prediction and outperforms previous state-of-the-art binary classifiers that specialize in predicting a single taste class. Its transformer architecture allows for interpretability through gradient-based visualization of molecular features. The model identifies key structural elements driving taste properties and demonstrates utility in analyzing known tastants as well as novel compounds. By making both the model and the dataset publicly available, we provide the food science community with tools for rapid taste prediction, potentially accelerating the development of new flavor compounds and enabling systematic exploration of taste chemistry.

Keywords

Machine Learning

Molecular taste prediction

Transformer model

Interpretable Artificial Intelligence

Supplementary materials

Title: Supplementary Information

Description: Supplementary Information including Supplementary Methods, Supplementary Tables and Supplementary Figures.

Supplementary weblinks

Title: GitHub with Code and Data

Description: Link to the GitHub repository where the code and data used in this work are available.


Version History

Dec 11, 2024 Version 2

Nov 19, 2024 Version 1

Version Notes

Manually corrected some entries in the dataset and retrained all models. Extended benchmarking to state-of-the-art classifiers. Reworked the introduction to make the overall narrative clearer. Slightly adapted the title for readability. We also apologize for a previous submission of this version, which erroneously listed the authors as anonymous.

Metrics

Views: 3,003

Downloads: 966

License

The content is available under CC BY 4.0


Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
