Abstract
Generating top-down tandem mass spectra (MS/MS) for complex mixtures of proteoforms has become possible through improvements in fractionation, on-line separation, dissociation, and mass analysis. The algorithms that match MS/MS spectra to sequences have evolved in parallel, with both spectral alignment and peak matching being paired with diverse methods for scoring proteoform-spectral matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification through three distinct challenges. The first is obtaining a large yield of PrSMs while controlling the false discovery rate (FDR) when identifying thousands of proteoforms from complex cell lysates with four software workflows: ProSight Proteome Discoverer, TopPIC, Informed Proteomics, and pTop. The second is deconvolving data from both Thermo Orbitrap-class and Bruker maXis Q-TOF instruments to produce consistent precursor charge and mass determinations while generating fragment mass lists that optimize identification. The third is detecting diverse post-translational modifications (PTMs) in proteoforms from bovine milk and human ovarian tissue. The data demonstrate that existing software suites produce admirable sensitivity, in some cases identifying a third of the collected MS/MS spectra with FDR controlled below 2%; the overlap among these PrSMs, however, illustrates real value in searching data with multiple search engines. Differences among identification workflows seem to result from each search algorithm incorporating its own deconvolution algorithm. By passing the output of multiple deconvolution routes (Thermo Xtract, Bruker Auto MSn, Mascot Distiller, TopFD, and FLASHDeconv) to the downstream TopPIC search algorithm, we were able to identify common causes of deconvolution disagreement.
The detection of PTMs was highly inconsistent among search algorithms: some workflows suggested that as few as 1% of PrSMs from bovine milk were singly phosphorylated, while others found that 18% were. Taken together, these results make a strong argument for top-down researchers to adopt a standard practice of analyzing each MS/MS experiment with at least two different search engines.
Supplementary materials
Table S1: LC-MS/MS experiments with MS/MS counts and charge distributions
Table S2: Agreement statistics for spectra identified by multiple search engines
File S1: Original search configurations, output files, and ProForma exports for each search
Supporting Information: Supporting information