GPT-3 accurately predicts antimicrobial peptide activity and hemolysis

01 June 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Antimicrobial peptides (AMPs) have gained significant attention in drug discovery due to their potential therapeutic applications in the fight against antimicrobial resistance. Rationally designing AMPs is notoriously difficult because of the vast number of possible peptide sequences and their complex structure-activity relationship landscape. The problem is therefore well suited to machine-learning models, which can be trained on available data to predict new sequences with a desired activity profile. Here we investigated the performance of large language models (LLMs) fine-tuned with data from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP) to predict the antimicrobial activity and hemolysis of AMPs from their amino acid sequences. We show that GPT-3-based models perform slightly better than previously reported recurrent neural networks (RNNs) and related architectures on comparable datasets. Furthermore, GPT-3-based models perform remarkably well in the low-data regime. Advantages in terms of training time and cost are also discussed.
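
As a rough illustration of what fine-tuning on sequence data can look like in practice, the sketch below turns labelled peptide sequences into prompt/completion records in JSON Lines form, the general shape used for fine-tuning completion-style language models. It is a minimal sketch and not the authors' pipeline: the separator string, the label words, the helper names `to_record` and `to_jsonl`, and the toy sequences are all assumptions made for illustration, since the abstract does not specify them.

```python
import json

# Hypothetical formatting choices; the abstract does not state the
# separator or the label vocabulary actually used.
PROMPT_END = " ->"
LABELS = {True: " active", False: " inactive"}


def to_record(sequence: str, is_active: bool) -> dict:
    """Build one prompt/completion pair from a peptide sequence and a binary label."""
    cleaned = sequence.strip().upper()
    return {"prompt": cleaned + PROMPT_END, "completion": LABELS[is_active]}


def to_jsonl(examples) -> str:
    """Serialize (sequence, label) pairs as JSON Lines, one record per line."""
    return "\n".join(json.dumps(to_record(seq, label)) for seq, label in examples)


if __name__ == "__main__":
    # Toy placeholder sequences and labels, not entries taken from DBAASP.
    toy_examples = [("KLLKLLKKLLKLLK", True), ("GGSGGSGGSGGS", False)]
    print(to_jsonl(toy_examples))
```

The same record layout could carry a hemolysis label instead of an activity label by swapping the label dictionary; whether one model or two separate models are trained is a design choice not settled by the abstract.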


large language models
activity prediction
antimicrobial peptides
