Regularized indirect learning improves phage display ligand discovery

28 July 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Phage display is commonly employed to discover high-affinity ligands to biomolecular targets. However, ranking the discovered ligands by their affinity and specificity for the target is obscured by genetic amplification bias and by the amplification of target-unrelated phage, resulting in inefficient experimental validation and potentially intractable discovery. Here, we describe the use of indirect machine learning (ML) to improve the discovery of target-specific peptide ligands from next-generation sequencing (NGS) data. In a bidirectional long short-term memory (BiLSTM) architecture, we combine peptide sequence information (input) with experimental fitness scores (output) that describe each peptide's performance across the rounds of bio-panning. Because the fitness scores contain bias, we use regularization to restrict the model to limited, indirect learning of these scores, so that it effectively processes the peptide sequence information while its predicted fitness scores are still used to rank the peptides. The regularized model ranked peptides containing high-affinity binding motifs for our target more than threefold higher than any combination of experimental fitness scores did. Baseline random forest (RF) and k-nearest neighbor (KNN) models performed slightly worse but also showed the importance of regularization. The BiLSTM model was the most robust, being less sensitive to the peptide representation and to the choice of fitness score. Shapley residue analysis yielded an interpretable structure-activity relationship (SAR), identifying the residues and physicochemical properties predicted to drive affinity across the entire peptide as well as at motif-specific positions. We expect this approach to help identify high-affinity ligands against many targets, substantially improving the discovery capability of phage display.
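The abstract does not specify how the fitness scores are computed or how the model is built, so the following is only a minimal sketch of the general idea under stated assumptions. A hypothetical fitness score is taken as the log2 fold change of a peptide's read frequency between two panning rounds, and an L2-regularized (ridge) linear model on one-hot peptide encodings stands in for the regularized BiLSTM; this is a swapped-in technique, not the authors' model. The penalty strength `l2` limits how closely the model fits the (biased) scores, and peptides are then ranked by predicted rather than measured fitness. All function names and parameters are illustrative assumptions.

```python
import math

# The 20 standard amino acids, in a fixed order for one-hot encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """Encode a fixed-length peptide as a flat 0/1 vector, 20 entries per position."""
    vec = []
    for residue in peptide:
        vec.extend(1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS)
    return vec


def log_enrichment(count_prev, count_next, total_prev, total_next, pseudo=1.0):
    """Hypothetical fitness score (assumption, not from the preprint): log2 fold
    change of a peptide's read frequency between two panning rounds, with a
    pseudocount so that peptides with zero reads stay finite."""
    freq_prev = (count_prev + pseudo) / total_prev
    freq_next = (count_next + pseudo) / total_next
    return math.log2(freq_next / freq_prev)


def fit_ridge(X, y, l2=0.1, lr=0.05, epochs=500):
    """Full-batch gradient descent on mean squared error + l2 * ||w||^2.

    Stand-in for a regularized model: larger l2 shrinks the per-residue
    weights, limiting how closely the model reproduces the biased scores.
    The bias term b is left unpenalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j, xj in enumerate(xi):
                if xj:  # one-hot vectors are sparse; skip zero features
                    grad_w[j] += 2.0 * err * xj / n
            grad_b += 2.0 * err / n
        # Gradient of the L2 penalty is 2 * l2 * w.
        w = [wj - lr * (gj + 2.0 * l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Linear prediction of fitness for one encoded peptide."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def rank_peptides(peptides, w, b):
    """Sort peptides by predicted fitness, highest first."""
    return sorted(peptides, key=lambda p: predict(w, b, one_hot(p)), reverse=True)
```

In this toy setting, leaving the bias unpenalized means a larger `l2` pulls all predictions toward the mean score, so the ranking leans on residue features shared by several enriched peptides rather than on any single peptide's measured score. A sequence model such as a BiLSTM would replace the linear map, with regularization applied through its own mechanisms.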


next-generation sequencing
affinity selection
phage display
ligand discovery

Supplementary materials

Supporting Information
Materials and methods, automated flow synthesis and characterization data.
