Predicting enzyme activity in general is arguably one of the main current challenges in catalysis. Computer-assisted methods have proven able to simulate reaction mechanisms at an atomic level of detail. However, these methods tend to be too expensive to apply at the large scale required in protein engineering campaigns. To alleviate this situation, machine learning methods can help generate predictive decision models. Herein we train different regression algorithms to predict the reaction energy barrier of the rate-limiting step of the hydrolysis of mono-(2-hydroxyethyl)terephthalic acid by the MHETase of Ideonella sakaiensis. As the training data set we use snapshots from steered QM/MM MD simulations and their corresponding pulling work values. We have explored three algorithms in combination with three chemical representations. As a result, our trained models are able to predict pulling works along the steered QM/MM MD simulations with a mean absolute error below 3 kcal mol⁻¹ and a score value above 0.90. Predicting the energy maximum from a single geometry is more challenging. Whereas using the initial snapshot of the QM/MM MD trajectory as the input geometry yields a very poor prediction of the reaction energy barrier, using an intermediate snapshot of the same trajectory brings the score value above 0.40 with a low mean absolute error (ca. 3 kcal mol⁻¹). Altogether, in this work we have addressed some of the initial challenges on the way to the final goal: an efficient workflow for the semi-automatic prediction of enzyme-catalyzed energy barriers and catalytic efficiencies.
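For readers unfamiliar with the two quality measures quoted above, the following minimal sketch shows how a mean absolute error and a score value could be computed for a set of predicted pulling works. It assumes that the score value is the coefficient of determination (R²), which the text above does not state explicitly. The function names and the numerical values are hypothetical and serve only as illustration; they are not data from this work.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute deviation between reference and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    ASSUMPTION: this is what the 'score value' refers to.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical pulling works in kcal/mol (illustrative numbers only).
reference_work = [0.0, 5.0, 10.0, 15.0, 20.0]
predicted_work = [1.0, 4.0, 11.0, 14.0, 21.0]

mae = mean_absolute_error(reference_work, predicted_work)
r2 = r2_score(reference_work, predicted_work)
print(f"MAE = {mae:.2f} kcal/mol, R2 = {r2:.2f}")
```

Under this reading, a score value close to 1 means the model reproduces most of the variance in the reference pulling works, whereas a low mean absolute error on its own can coexist with a modest score when the reference values span a narrow range.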
Supporting information: Prediction of enzyme catalysis by computing reaction energy barriers via steered QM/MM Molecular Dynamics Simulations and Machine Learning