Abstract
Biocatalysis is increasingly being adopted in industry for producing important chemicals in a selective, efficient, and sustainable way. Engineering an enzyme can often alter its chemical scope, giving access to new and desirable chemistry. Identifying enzymes with the desired substrate specificity and activity, however, remains time-consuming and costly. Galactose oxidase (GOase) is a copper-dependent enzyme that converts alcohols to their corresponding carbonyls, an important transformation in industrial synthesis. Here, we present a machine learning-aided protocol to develop a catalytic activity prediction model (R² ≈ 0.7-0.9) for GOase based on a focused dataset of engineered GOase variants with activity toward bulky benzylic secondary alcohols. Without any additional training, the trained GOase activity prediction models also retained their predictive power when applied to another member of the oxidase family, an aryl-alcohol oxidase. Inspired by the fragment-based optimization methods used in drug discovery, we developed an active-site structure-aware substrate library for GOase. Experimental validation of a subset of this substrate library indicates that the trained models predict GOase activity well (R² = 0.61), enabling identification of the best GOase variant for each new substrate. This ability to identify optimal GOase variants for the synthesis of industrially important chemicals was demonstrated for dyclonine, an FDA-approved drug. Our machine learning-guided approach enables rapid navigation of the substrate-activity scope of GOase, thereby reducing the burden of extensive experimental screening and streamlining the deployment of biocatalysis in industrial synthesis.
Supplementary materials
Title: Supporting information
Description: Contains additional details of the methods employed in this study, along with supplementary figures and tables.