Trainable Data Embeddings Enable Multi-Fidelity Learning

01 April 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

We present an approach for end-to-end training of machine learning models for structure-property modeling on collections of datasets derived using different DFT functionals and basis sets. This approach overcomes the problem of data inconsistencies in the training of machine learning models on atomistic data. We rephrase the underlying problem as a multi-task learning scenario. We show that conditioning neural network-based models on trainable embedding vectors can effectively account for quantitative differences between methods. This allows for joint training on multiple datasets that would otherwise be incompatible. Therefore, this procedure circumvents the need for re-computations at a unified level of theory. Numerical experiments demonstrate that training on multiple reference methods enables transfer learning between tasks, resulting in even lower errors compared to training on separate tasks alone. Furthermore, we show that this approach can be used for multi-fidelity learning, improving data efficiency for the highest fidelity by an order of magnitude. To test scalability, we train a single model on a joint dataset compiled from 10 disjoint subsets of the MultiXC-QM9 dataset generated by different reference methods. Again, we observe transfer learning effects that improve the model errors by a factor of 2 compared to training on each subset alone.

Keywords

Multi-fidelity
Multi-task
Data embedding
MultiXC-QM9

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy [opens in a new tab] - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here [opens in a new tab] .
This site is protected by reCAPTCHA and the Google Privacy Policy [opens in a new tab] and Terms of Service [opens in a new tab] apply.