FAME.AL: Site-of-metabolism prediction with active learning

03 October 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The ability to determine and predict metabolically labile atom positions in a molecule (also called “sites of metabolism” or “SoMs”) is of high interest for the design and optimization of bioactive compounds such as drugs, agrochemicals, and cosmetics. In recent years, several in silico models for SoM prediction have become available, many of which include a machine-learning component. The bottleneck in the further development of these approaches is the limited coverage of distinct atom environments and of rare and complex biotransformation events by high-quality experimental data. In this context, active learning strategies could yield higher data efficiency and, in addition, provide guidance to experimentalists on which atom environments to investigate next for maximum information gain. Here we report on the development and validation of FAME.AL, an active learning approach for site-of-metabolism prediction that builds on the previously published FAst MEtabolizer (FAME 3). The active learning approach yielded competitive performance for phase 1 and phase 2 metabolism (Matthews correlation coefficients of approximately 0.50 on holdout data) while using only 20% of the training data used by classical modeling setups. Besides high performance and high data efficiency, the active learning approach is also characterized by high robustness and speed. The approach is largely invariant to starting conditions and parameters, and substantial speed-ups can be achieved by using small batches of atoms rather than individual atoms during the iterative model-building process. The source code of FAME.AL is publicly available.
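
To make the two technical ideas in the abstract concrete, the following is a minimal sketch of batch-mode, pool-based active learning scored with the Matthews correlation coefficient. It is not the FAME.AL implementation. The query strategy (margin-based uncertainty sampling), the toy one-dimensional centroid "model", the synthetic data, and every name in the code (`fit`, `predict`, `margin`, `active_learning`, `batch_size`, `budget`) are assumptions chosen for illustration only. The MCC function follows the standard confusion-matrix definition.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def fit(xs, ys):
    """Toy stand-in model (assumption): one centroid per class, 1D feature."""
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos = [x for x, y in zip(xs, ys) if y == 1]
    return sum(neg) / len(neg), sum(pos) / len(pos)


def predict(model, x):
    """Assign the class whose centroid is nearest."""
    c0, c1 = model
    return 1 if abs(x - c1) < abs(x - c0) else 0


def margin(model, x):
    """Small margin means the sample lies close to the decision boundary."""
    c0, c1 = model
    return abs(abs(x - c0) - abs(x - c1))


def active_learning(pool_x, pool_y, seed_idx, batch_size, budget):
    """Label the most uncertain samples in batches until the budget is spent.

    pool_y plays the role of the experimentalist (oracle): a label is only
    read once its index has been selected.
    """
    labeled = list(seed_idx)
    seen = set(labeled)
    unlabeled = [i for i in range(len(pool_x)) if i not in seen]
    while len(labeled) < budget and unlabeled:
        model = fit([pool_x[i] for i in labeled], [pool_y[i] for i in labeled])
        # Hypothetical query strategy: smallest margin first.
        unlabeled.sort(key=lambda i: margin(model, pool_x[i]))
        take = min(batch_size, budget - len(labeled))
        labeled.extend(unlabeled[:take])
        unlabeled = unlabeled[take:]
    model = fit([pool_x[i] for i in labeled], [pool_y[i] for i in labeled])
    return model, labeled


# Synthetic "atoms": one feature each, labeled 1 above 0.5 (illustrative only).
rng = random.Random(0)
pool_x = [rng.random() for _ in range(100)]
pool_y = [1 if x > 0.5 else 0 for x in pool_x]
seed_idx = [pool_y.index(0), pool_y.index(1)]  # one seed sample per class

# A budget of 20 of 100 pool samples mirrors the 20% figure only loosely.
model, labeled = active_learning(pool_x, pool_y, seed_idx, batch_size=3, budget=20)

holdout_x = [rng.random() for _ in range(200)]
holdout_y = [1 if x > 0.5 else 0 for x in holdout_x]
score = mcc(holdout_y, [predict(model, x) for x in holdout_x])
print(f"labeled {len(labeled)} of {len(pool_x)} samples, holdout MCC = {score:.2f}")
```

Selecting `batch_size` samples per iteration instead of one reduces the number of model refits needed to reach the same labeling budget, which is the kind of speed-up the abstract attributes to small atom batches.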

Keywords

site-of-metabolism prediction
xenobiotic metabolism
drug metabolism
machine learning
active learning

Supplementary materials

Title: Supporting information for FAME.AL: Site-of-metabolism prediction with active learning
Description: Includes tables on (i) the 24 Sybyl atom types used by the CDPKit FAME descriptors and (ii) the CDPKit 2D descriptors and their CDK counterparts
