PROTEIN IDENTIFICATION USING SEQUENCE DATABASES

Ye. Golenko; A. Ismailova; Ye. Rais

doi:10.37943/AITU.2020.91.98.002

Authors

Ye. Golenko, S. Seifullin Agrotechnical University, https://orcid.org/0000-0002-4643-4571
A. Ismailova, S. Seifullin Agrotechnical University, https://orcid.org/0000-0002-8958-1846
Ye. Rais, S. Seifullin Agrotechnical University, https://orcid.org/0000-0003-0097-8335

DOI:

https://doi.org/10.37943/AITU.2020.91.98.002

Keywords:

Mass Spectrometry, MS/MS, Bioinformatics, Protein Identification, Proteomics, Databases, Protein Sequence

Abstract

The bottom-up (shotgun) proteomics approach, in which proteins are digested into peptides that are then sequenced by tandem mass spectrometry (MS/MS), has become widespread. Peptides are most often identified from the resulting MS/MS data by searching available sequence databases. This paper presents a detailed overview of the peptide identification workflow and describes the main protein bioinformatics databases. Because the success of the method depends on choosing appropriate search parameters and an appropriate sequence database, we pay special attention to the practical aspects of searching that make the analysis of MS/MS spectra efficient. We also consider possible reasons why database search tools fail to find the correct sequence for some MS/MS spectra, and we highlight misidentification issues that can significantly reduce the value of published data. To help assess the assignment of peptides to MS/MS spectra, we review the scoring algorithms used in the most popular database search tools, as well as the statistical methods and computational tools for validating peptide-to-spectrum assignments. The final part describes how the proteins present in a sample are inferred from a list of peptide identifications and discusses the limitations of bottom-up proteomics.
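As a rough illustration of the first step of a sequence database search, the sketch below digests protein sequences in silico with trypsin-like rules and selects the peptides whose monoisotopic mass matches an observed precursor mass within a tolerance. It is a minimal sketch under stated assumptions, not the method of any particular search tool: the function names, the toy sequences, the minimum peptide length, and the 10 ppm tolerance are all hypothetical, and missed cleavages, modifications, and fragment-ion scoring are left out.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added for the peptide termini


def tryptic_digest(protein, min_len=6):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidates(database, observed_mass, tol_ppm=10.0):
    """Return (protein_id, peptide) pairs within tol_ppm of observed_mass."""
    hits = []
    for protein_id, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            error_ppm = abs(peptide_mass(peptide) - observed_mass) / observed_mass * 1e6
            if error_ppm <= tol_ppm:
                hits.append((protein_id, peptide))
    return hits


if __name__ == "__main__":
    # Hypothetical toy database; real searches read sequences from FASTA files.
    toy_db = {
        "P1": "MKWVTFISLLLLFSSAYSRGVFRR",
        "P2": "AAAAAAKPEPTIDEKGGGGGGR",
    }
    observed = peptide_mass("GGGGGGR")  # stand-in for a measured precursor mass
    print(candidates(toy_db, observed))
```

In a real search, each candidate returned by this precursor-mass filter would then be scored against the fragment ions of the MS/MS spectrum, which is where the choice of scoring algorithm and search parameters matters.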

Author Biographies

Ye. Golenko, S. Seifullin Agrotechnical University

Doctoral Student

A. Ismailova, S. Seifullin Agrotechnical University

PhD, Senior Lecturer

Ye. Rais, S. Seifullin Agrotechnical University

Master’s Student
