Abstract
Hundreds of models for predicting small-molecule retention times have been published over the last decades. Our goal is the transferable prediction of retention times: our method should predict retention times for a target dataset without requiring training data from that chromatographic system. Unfortunately, retention times may change massively, even under nominally identical chromatographic conditions. Retention order is much better preserved, yet even the retention order of compounds may change if chromatographic conditions vary. We present a machine learning model that predicts retention order or, more precisely, a retention order index, taking chromatographic conditions into account. We show how to map predicted retention order indices to retention times. Disentangling these two tasks finally enables transferable retention time prediction across chromatographic conditions and compound classes. Our 2-step method outperforms existing methods that were trained on the target dataset. Finally, we systematically study which chromatographic conditions result in notable changes of retention order.
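The second step, mapping retention order indices to retention times, can be illustrated with a minimal sketch. This abstract does not specify the mapping itself, so the sketch assumes a simple stand-in: monotone piecewise-linear interpolation through a few anchor compounds whose retention times are known on the target system. The function name `map_roi_to_rt` and the anchor values are hypothetical and are not taken from the accompanying repository.

```python
import numpy as np


def map_roi_to_rt(anchor_roi, anchor_rt, query_roi):
    """Map retention order indices (ROI) to retention times (RT).

    Assumption: the mapping is a non-decreasing piecewise-linear curve
    through anchor compounds with known RT on the target system.
    """
    anchor_roi = np.asarray(anchor_roi, dtype=float)
    anchor_rt = np.asarray(anchor_rt, dtype=float)

    # Sort anchors by their retention order index.
    order = np.argsort(anchor_roi)
    roi_sorted = anchor_roi[order]

    # Enforce monotonicity: a higher index never maps to an earlier time.
    rt_sorted = np.maximum.accumulate(anchor_rt[order])

    # Linear interpolation between anchors; queries outside the anchor
    # range are clamped to the first or last anchor retention time.
    return np.interp(query_roi, roi_sorted, rt_sorted)


# Hypothetical anchors: three compounds with known RT (minutes).
predicted_rt = map_roi_to_rt(
    anchor_roi=[100, 300, 200],
    anchor_rt=[1.0, 5.0, 2.0],
    query_roi=[150, 250, 400],
)
```

Separating the two steps in this way means that only the mapping, not the order model, needs information from the target chromatographic system.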
Supplementary materials
Title
Supplementary Table 2. List of RepoRT datasets used for retention order statistics and model evaluation
Description
All datasets from RepoRT are listed, detailing in which evaluation scenario each dataset is used. Information on which datasets are missing important metadata (HSM and Tanaka parameters, pH, void volume estimate, column temperature, flow rate) is also provided. Datasets removed from evaluation following manual curation are specified.
Supplementary weblinks
Title
Code for model training, evaluation and application
Description
GitHub repository containing the code to train, evaluate and apply the 2-step retention time prediction models.