A Machine Learning-Guided Approach to Navigate the Substrate Activity Scope of Galactose Oxidase: Application in the Conversion of Pharmaceutically Relevant Bulky Secondary Alcohols

13 August 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Biocatalysis is increasingly being adopted in industry for producing important chemicals in a selective, efficient, and sustainable way. Engineering an enzyme can often alter its chemical scope, giving access to new and desirable chemistry. Identifying enzymes with the desired substrate specificity and activity, however, remains time-consuming and costly. Galactose oxidase (GOase) is a copper-dependent enzyme that converts alcohols to their corresponding carbonyls, an important transformation in industrial synthesis. Here, we present a machine learning-aided protocol to develop a catalytic activity prediction model (R² ≈ 0.7-0.9) for GOase based on a focused dataset of engineered GOase variants with activity toward bulky benzylic secondary alcohols. The trained GOase activity prediction models also retained their predictive power when applied, with no additional training, to another member of the oxidase family, an aryl-alcohol oxidase. Inspired by the fragment-based optimization methods used in drug discovery, we developed an active-site structure-aware substrate library for GOase. Experimental validation of a subset of the constructed substrate library indicates that the trained models provide good prediction (R² = 0.61) of GOase activity, enabling the identification of the best GOase variant for each new substrate. This ability to identify optimal GOase variants for the synthesis of industrially important chemicals was demonstrated for Dyclonine, an FDA-approved drug. Our machine learning-guided approach enables rapid navigation of the substrate-activity scope of GOase, thereby reducing the burden of extensive experimental screening and streamlining the deployment of biocatalysis in industrial synthesis.
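
The abstract reports model quality as a coefficient of determination and describes picking the best variant for each new substrate from predicted activities. The sketch below illustrates those two steps in plain Python. It is not the authors' pipeline: the function names, substrate and variant labels, and activity values are all hypothetical placeholders, and the abstract does not say how R² was computed, so the standard definition 1 - SS_res/SS_tot is assumed.

```python
from typing import Dict, List, Tuple


def r_squared(observed: List[float], predicted: List[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def best_variant_per_substrate(
    predictions: Dict[Tuple[str, str], float]
) -> Dict[str, str]:
    """Map each substrate to the variant with the highest predicted activity."""
    best: Dict[str, Tuple[str, float]] = {}
    for (substrate, variant), activity in predictions.items():
        if substrate not in best or activity > best[substrate][1]:
            best[substrate] = (variant, activity)
    return {substrate: variant for substrate, (variant, _) in best.items()}


# Hypothetical placeholder data: (substrate, variant) -> predicted activity.
# These labels and numbers are illustrative only, not values from the study.
predicted_activity = {
    ("substrate_A", "variant_1"): 0.2,
    ("substrate_A", "variant_2"): 0.8,
    ("substrate_B", "variant_1"): 0.6,
    ("substrate_B", "variant_2"): 0.4,
}

# Hypothetical measured activities for the same pairs, in the same order.
measured = [0.25, 0.75, 0.5, 0.5]

if __name__ == "__main__":
    print(best_variant_per_substrate(predicted_activity))
    print(r_squared(measured, list(predicted_activity.values())))
```

In this toy setup, only the top-ranked variant for each substrate would be carried forward to experimental testing, which is the kind of screening reduction the abstract describes.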

Keywords

Galactose oxidase
alcohol oxidation
machine learning
substrate scope expansion
catalytic activity prediction
directed evolution
high throughput
drug biosynthesis
molecular modeling

Supplementary materials

Supporting information: Contains additional details of methods employed in this study along with supplementary figures and tables.
