Abstract
The ability to determine and predict metabolically labile atom positions in a molecule (also called “sites of metabolism” or “SoMs”) is of high interest for the design and optimization of bioactive compounds such as drugs, agrochemicals, and cosmetics. In recent years, several in silico models for SoM prediction have become available, many of which include a machine-learning component. The bottleneck in the further development of these approaches is the limited coverage of distinct atom environments and of rare and complex biotransformation events by high-quality experimental data. In this context, active learning strategies could yield higher data efficiency and, in addition, provide guidance to experimentalists on which atom environments to investigate next for maximum information gain. Here we report on the development and validation of FAME.AL, an active learning approach for site-of-metabolism prediction that builds on the previously published FAst MEtabolizer (FAME 3). The active learning approach yielded competitive performance for phase 1 and phase 2 metabolism (Matthews correlation coefficients of approximately 0.50 on holdout data) while using only 20% of the training data used by classical modeling setups. In addition to high performance and data efficiency, the active learning approach is characterized by high robustness and speed. The approach is largely invariant to starting conditions and parameters, and substantial speed-ups can be achieved by adding small batches of atoms rather than individual atoms during the iterative model-building process. The source code of FAME.AL is publicly available.
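As an illustration of two ideas mentioned above, the following minimal Python sketch shows (i) how a Matthews correlation coefficient is computed from confusion-matrix counts and (ii) how a small batch of atoms might be selected for labeling in one active learning iteration. The abstract does not state which selection criterion FAME.AL uses; the uncertainty sampling shown here (choosing atoms whose predicted SoM probability is closest to 0.5) is an assumption made for illustration only, and the function names `matthews_corrcoef` and `select_batch` are hypothetical and not taken from the FAME.AL source code.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (MCC undefined), a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def select_batch(probabilities, labeled, batch_size):
    """Pick the unlabeled atoms the model is least certain about.

    probabilities: predicted SoM probability per atom (list of floats).
    labeled: set of atom indices that already have labels.
    batch_size: number of atoms to add in this iteration.

    Assumption: uncertainty sampling, i.e. distance of the predicted
    probability from 0.5; this is not confirmed as the FAME.AL criterion.
    """
    candidates = [i for i in range(len(probabilities)) if i not in labeled]
    candidates.sort(key=lambda i: abs(probabilities[i] - 0.5))
    return candidates[:batch_size]


# Toy usage: five atoms, atom 1 already labeled, select a batch of two.
batch = select_batch([0.9, 0.55, 0.1, 0.48, 0.7], labeled={1}, batch_size=2)
```

Selecting several atoms per iteration means the model is retrained once per batch rather than once per atom, which is where a speed-up of the kind described above would come from.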
Supplementary materials
Title
Supporting information for FAME.AL: Site-of-metabolism prediction with active learning
Description
Includes tables on (i) the 24 Sybyl atom types used by the CDPKit FAME descriptors and (ii) the CDPKit 2D descriptors and their CDK counterparts