Integrating Genetic Algorithms and Language Models for Enhanced Enzyme Design

Yves Gaetan Nana Teukam; Federico Zipoli; Teodoro Laino; Emanuele Criscuolo; Francesca Grisoni; Matteo Manica

doi:10.26434/chemrxiv-2024-j7ntq

Catalysis

13 March 2024, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Enzymes are molecular machines optimized by nature to allow otherwise impossible chemical processes to occur. Their design is a challenging task due to the complexity of the protein space and the intricate relationships between sequence, structure, and function. Recently, large language models (LLMs) have emerged as powerful tools for modeling and analyzing biological sequences, but their application to protein design is limited by the high cardinality of the protein space. This study introduces a framework that combines LLMs with genetic algorithms (GAs) to optimize enzymes. LLMs are trained on a large dataset of protein sequences to learn relationships between amino acid residues linked to structure and function. This knowledge is then leveraged by GAs to efficiently search for sequences with improved catalytic performance. We focused on two optimization tasks: improving the feasibility of biochemical reactions and increasing their turnover rate. Systematic evaluations on 105 biocatalytic reactions demonstrated that the LLM-GA framework generated mutants outperforming the wild-type enzymes in terms of feasibility in 90% of the instances. Further in-depth evaluation of seven reactions revealed the power of this methodology to make 'the best of both worlds' and create mutants with structural features and flexibility comparable to the wild types. Our approach advances the state of the art in the computational design of biocatalysts, ultimately opening opportunities for more sustainable chemical processes.
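
To make the search loop described in the abstract concrete, the following is a minimal sketch of a genetic algorithm over amino acid sequences. It is not the authors' implementation: the function names, the parameters, and in particular `toy_score` are hypothetical. In the framework described above, the scoring role would be played by a model built on the trained LLM; here a simple match-to-target score stands in so that the example runs on its own.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_score(seq, target):
    """Hypothetical stand-in for a learned scorer: fraction of positions matching a target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq, rate, rng):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length sequences (length >= 2 assumed)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def optimize(wild_type, score, generations=50, pop_size=40, rate=0.05, seed=0):
    """Evolve mutants of `wild_type`, keeping the top half of each generation."""
    rng = random.Random(seed)
    # The wild type stays in the initial population as a baseline.
    population = [wild_type] + [mutate(wild_type, rate, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]  # elitist selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


# Example usage with made-up sequences (illustrative only, not real enzymes).
wild_type = "MKTAYIAKQRQISFVK"
target = "MKTAYLAKQRQLSFVR"
best = optimize(wild_type, lambda s: toy_score(s, target))
print(best, toy_score(best, target))
```

Because the top half of each generation is carried over unchanged, the best score in the population never decreases, so the returned sequence scores at least as well as the wild type under whatever scorer is supplied.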

Keywords

Protein Language Model

Enzyme design

Protein optimization

Supplementary weblinks

Title: Example File

Description: This repository provides an example of how to run the framework for optimizing enzymes in the context of biocatalytic reactions.

Version History

Mar 13, 2024 Version 1

Metrics

Views: 1,800

Downloads: 1,466

License

The content is available under CC BY-NC-ND 4.0.

Funding

NCCR Catalysis: 180544

European Research Council: 101077879

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
