Abstract
Comparing multiple label-free shotgun proteomics datasets requires various data processing and formatting steps, including peptide-spectrum matching, protein inference, and quantification, followed by the compilation of results files into a format that allows for downstream analyses. ProtyQuant performs protein inference and quantification calculations, and combines the results of individual datasets into plain text tables. These tables are lightweight, human-readable, and easy to import into databases or statistical software. ProtyQuant reads validated pepXML files from proteomic workflows such as the Trans-Proteomic Pipeline (TPP), which makes it compatible with many commercial and free search engines. For protein inference and quantification, a modified version of the PIPQ program (He et al. 2016) was integrated. In contrast to simple spectral counting, PIPQ sums up peptide probabilities. For assigning peptides to proteins, three algorithms are available: Multiple Counting, Equal Division, and Linear Programming. The accumulated peptide probabilities (app) are used for both tasks: protein probability estimation and quantification. ProtyQuant was tested using a reference dataset for label-free shotgun proteomics, obtained from different concentrations of 48 human UPS proteins spiked into yeast lysate. Compared to ProteinProphet, ProtyQuant detected up to 126 (15%) more proteins in the mixture at an equal false positive rate (FPR). Label-free quantification based on the app values showed suitable sensitivity and linearity. Strikingly, the app values represent a realistic measure of ‘Protein Presence,’ a concept that integrates protein probability and quantity. ProtyQuant provides a graphical user interface (GUI) and scripts for console-based processing. It is available (GNU GPL v3) for Windows, Linux, and Docker from https://bitbucket.org/lababi/protyquant/.
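To illustrate how accumulated peptide probabilities (app) might be computed, the following minimal Python sketch sums peptide probabilities per protein under two of the assignment schemes named above. It is not the PIPQ or ProtyQuant code. The function name `accumulate_peptide_probabilities`, the input layout, and the interpretation of the schemes are assumptions made for illustration: Multiple Counting is taken to credit a shared peptide's full probability to every matching protein, and Equal Division to split it evenly among them. Linear Programming is omitted because it requires an optimization solver.

```python
from collections import defaultdict


def accumulate_peptide_probabilities(peptides, mode="multiple"):
    """Sum peptide probabilities per protein (hypothetical helper).

    peptides: iterable of (probability, [protein_id, ...]) pairs, where the
              list names every protein the peptide maps to (assumed layout).
    mode:     "multiple" credits the full probability to each protein;
              "equal" divides the probability evenly among the proteins.
    """
    if mode not in ("multiple", "equal"):
        raise ValueError("mode must be 'multiple' or 'equal'")

    app = defaultdict(float)
    for probability, proteins in peptides:
        if not proteins:
            continue  # peptide without any protein assignment
        if mode == "multiple":
            share = probability
        else:
            share = probability / len(proteins)
        for protein in proteins:
            app[protein] += share
    return dict(app)


# Toy data, invented for this example only.
example_peptides = [
    (0.9, ["P1"]),        # unique to P1
    (0.8, ["P1", "P2"]),  # shared between P1 and P2
    (0.5, ["P2"]),        # unique to P2
]

app_multiple = accumulate_peptide_probabilities(example_peptides, "multiple")
app_equal = accumulate_peptide_probabilities(example_peptides, "equal")
```

With this toy input, the shared peptide contributes 0.8 to both proteins under Multiple Counting, but only 0.4 to each under Equal Division, so the app value of P1 drops from about 1.7 to about 1.3.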