Comparing Top-Down Proteoform Identification: Deconvolution, PrSM Overlap, and PTM Detection

22 December 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Generating top-down tandem mass spectra (MS/MS) for complex mixtures of proteoforms has become possible through improvements in fractionation, on-line separation, dissociation, and mass analysis. The algorithms that match MS/MS to sequences have evolved in parallel, pairing both spectral alignment and peak matching with diverse methods for scoring proteoform-spectral matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification through three distinct challenges. The first is maximizing PrSM yield while controlling the false discovery rate (FDR) when identifying thousands of proteoforms from complex cell lysates with four software workflows: ProSight Proteome Discoverer, TopPIC, Informed Proteomics, and pTop. The second is deconvolving data from both Thermo Orbitrap-class and Bruker maXis Q-TOF instruments to determine precursor charges and masses consistently while generating fragment mass lists that optimize identification. The third is detecting diverse post-translational modifications (PTMs) in proteoforms from bovine milk and human ovarian tissue. The data demonstrate that existing software suites achieve high sensitivity, in some cases identifying a third of the collected MS/MS with FDR controlled below 2%; the partial overlap among these PrSMs, however, shows real value in searching data with multiple search engines. Differences among identification workflows seem to result from each search algorithm incorporating its own deconvolution algorithm. By passing the output of multiple deconvolution routes (Thermo Xtract, Bruker Auto MSn, Mascot Distiller, TopFD, and FLASHDeconv) to the same downstream TopPIC search algorithm, we were able to identify common causes of deconvolution disagreement.
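Because deconvolution assigns each precursor a charge and a neutral mass, two routes can disagree about the same spectrum. The sketch below is a minimal illustration, not the procedure used in this study: it assumes positive-mode, fully protonated ions ([M+zH]z+), and it labels two generic kinds of mismatch, a wrong charge (a large mass error) and an off-by-n isotopic peak (a shift of about n × 1.00336 Da). The function names, the 10 ppm tolerance, and the three-isotope search window are hypothetical choices.

```python
# Mass of a proton in daltons, used for protonated [M+zH]z+ ions.
PROTON_MASS = 1.007276
# 13C-12C mass difference: the spacing between adjacent isotopic peaks (Da).
C13_SPACING = 1.003355


def neutral_mass(mz: float, charge: int) -> float:
    """Convert an observed m/z and an assigned charge to a neutral mass (assumes protonation)."""
    return charge * (mz - PROTON_MASS)


def classify_disagreement(mass_a: float, mass_b: float,
                          tol_ppm: float = 10.0, max_offset: int = 3) -> str:
    """Compare precursor masses reported by two deconvolution routes (hypothetical helper).

    Returns "agree" if the masses match within tol_ppm, "isotope_error_n" if they
    match after shifting mass_b by n isotopic spacings, and "other" otherwise
    (for example, when the two routes assigned different charge states).
    """
    for k in range(-max_offset, max_offset + 1):
        shifted = mass_b + k * C13_SPACING
        if abs(mass_a - shifted) / mass_a * 1e6 <= tol_ppm:
            return "agree" if k == 0 else f"isotope_error_{abs(k)}"
    return "other"


if __name__ == "__main__":
    # The same peak at m/z 1001.007276 read as charge 10 versus charge 11.
    m10 = neutral_mass(1001.007276, 10)
    m11 = neutral_mass(1001.007276, 11)
    print(round(m10, 3), round(m11, 3), classify_disagreement(m10, m11))
    # Two routes that picked monoisotopic peaks one isotope apart.
    print(classify_disagreement(10000.0, 10000.0 + C13_SPACING))
```

A charge misassignment changes the mass by roughly a whole m/z unit times the charge difference, whereas an isotope-selection error shifts it by a near-integer number of daltons. Checking isotopic offsets before declaring a mismatch therefore separates the two cases.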
PTM detection was highly inconsistent among search algorithms: some workflows suggested that as few as 1% of PrSMs from bovine milk were singly phosphorylated, while others found that 18% were. Taken together, these results argue strongly that top-down researchers should adopt the standard practice of analyzing each MS/MS experiment with at least two different search engines.
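Comparing singly-phosphorylated fractions across engines requires counting phosphorylations consistently, because a tool may report the same modification as a name (Phospho), an accession (UNIMOD:21), or a mass shift (+79.966). The sketch below shows one way such a fraction could be tallied from ProForma strings like those in File S1. It is an assumption-laden illustration, not the authors' method: the label set, the 0.01 Da tolerance, and the function names are hypothetical, and it ignores labile ({...}) and global (<...>) modifications.

```python
import re

# Hypothetical set of ProForma labels treated as phosphorylation (compared in lowercase).
PHOSPHO_LABELS = {"phospho", "u:phospho", "unimod:21"}
# Monoisotopic mass of HPO3, the phosphorylation mass shift (Da).
PHOSPHO_DELTA = 79.966331
# A bracketed tag, optionally followed by a ProForma multiplier such as ^2.
TAG_RE = re.compile(r"\[([^\]]+)\](?:\^(\d+))?")


def is_phospho(tag: str, tol: float = 0.01) -> bool:
    """Return True if any '|'-separated alternative in a tag denotes phosphorylation."""
    for alt in tag.split("|"):
        # Drop localization-group suffixes such as '#g1(0.75)'; a bare '[#g1]' becomes ''.
        alt = alt.split("#", 1)[0].strip()
        if alt.lower() in PHOSPHO_LABELS:
            return True
        if alt.startswith(("+", "-")):
            try:
                if abs(float(alt) - PHOSPHO_DELTA) <= tol:
                    return True
            except ValueError:
                pass
    return False


def phospho_count(proforma: str) -> int:
    """Count phosphorylations in one ProForma string, honoring ^n multipliers."""
    count = 0
    for match in TAG_RE.finditer(proforma):
        if is_phospho(match.group(1)):
            count += int(match.group(2) or 1)
    return count


def singly_phosphorylated_fraction(prsms: list[str]) -> float:
    """Fraction of PrSMs carrying exactly one phosphorylation."""
    if not prsms:
        return 0.0
    singly = sum(1 for p in prsms if phospho_count(p) == 1)
    return singly / len(prsms)


if __name__ == "__main__":
    # Hypothetical PrSMs, not data from this study.
    example = [
        "PEPT[Phospho]IDE",
        "PEPT[Phospho]IDES[+79.966]K",
        "PEPTIDE",
        "EM[Oxidation]EVT[Phospho#g1(0.75)]S[#g1(0.25)]ISK",
    ]
    print(singly_phosphorylated_fraction(example))
```

Normalizing names, accessions, and mass shifts to one count before comparing engines keeps differences in reporting style from masquerading as differences in PTM detection.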

Keywords

Top-Down Proteomics
Proteoform
Post-Translational Modification
Bioinformatics
Deconvolution
Identification Algorithms

Supplementary materials

Table S1: LC-MS/MS experiments with MS/MS counts and charge distributions
Table S2: Agreement statistics for spectra identified by multiple search engines
File S1: Original search configurations, output files, and ProForma exports for each search
Supporting Information
