Abstract
Peptide-based therapeutics are increasingly coming to the forefront of biomedicine with their promise of high specificity and low toxicity. Although noncanonical residues can always be used, employing only the 20 natural residues restricts the chemical space to a finite size that allows comprehensive in silico screening. Towards this goal, a dataset comprising all possible di-, tri-, and tetrapeptide combinations of the canonical residues has been reported previously. However, with increasing computational power, the comprehensive set of pentapeptides is now also feasible for screening, as are the comprehensive sets of cyclic peptides comprising four or five residues. Here, we provide both the complete and prefiltered libraries of all di-, tri-, tetra-, and pentapeptide sequences from the 20 canonical amino acids and their homodetic (N-to-C-terminal) cyclic homologues. The libraries, provided in FASTA, SMILES, and SDF-3D formats, can be readily used for screening against protein targets. Access to this dataset will accelerate small-peptide screening workflows and encourage their use in drug discovery campaigns. As a case study, the developed library was screened against SARS-CoV-2 Mpro to identify potential small-peptide inhibitors.
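To give a sense of the library sizes implied by the abstract, the following minimal Python sketch enumerates every linear sequence of a given length over the 20 canonical one-letter codes and counts the sequences that remain distinct after head-to-tail cyclization. It is an illustration only, not the authors' pipeline: the function names are hypothetical, and treating rotations of the same ring as a single cyclic peptide is an assumption made here, since the abstract does not state how the cyclic libraries were deduplicated.

```python
from itertools import product

# One-letter codes of the 20 canonical amino acids.
CANONICAL = "ACDEFGHIKLMNPQRSTVWY"


def linear_peptides(length):
    """Yield every linear sequence of the given length (20**length in total)."""
    for residues in product(CANONICAL, repeat=length):
        yield "".join(residues)


def canonical_rotation(seq):
    """Return the lexicographically smallest rotation of seq.

    Assumption: two sequences that differ only by rotation describe the same
    homodetic (N-to-C) cyclic peptide. Reversed sequences are NOT merged,
    because the backbone direction differs.
    """
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def count_cyclic(length):
    """Count sequences that are distinct under rotation."""
    return sum(1 for s in linear_peptides(length) if s == canonical_rotation(s))


# Lengths 2 to 4 run quickly; length 5 (3,200,000 linear sequences) also
# works but takes noticeably longer in pure Python.
for n in (2, 3, 4):
    print(n, len(CANONICAL) ** n, count_cyclic(n))
```

Under this assumption, lengths 2, 3, and 4 give 400, 8,000, and 160,000 linear sequences and 210, 2,680, and 40,110 rotation-distinct cyclic sequences, respectively. Keeping one representative per rotation class avoids docking the same ring more than once.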
Supplementary materials
Title
Sample list of pentapeptides
Description
Editable Excel file containing a sample of the pentapeptides for trial filtering. The full files are available in the online repositories.
Title
Supporting information
Description
Additional images and analyses of the case studies in the main article.
Supplementary weblinks
Title
Useable complete databases for comprehensive peptide screening
Description
FASTA, SMILES, and SDF-3D files of all di-, tri-, tetra-, and pentapeptides and their cyclic analogues, ready for docking and use in virtual screens.