Academic Paper

Improved classification model for peptide identification based on self-paced learning
Document Type
Conference
Source
2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 258-261, Nov. 2017
Subject
Bioengineering
Computing and Processing
Training
Optimization
Mathematical model
Benchmark testing
Databases
Peptides
Kernel
Language
Abstract
Post-database search processing of peptide-spectrum matches (PSMs) is a key step in mass spectrometry-based protein identification. Although many machine learning-based approaches have been developed to improve the accuracy of peptide identification, there is still room for improvement because of the poor quality of the data samples. CRanker has been shown to be effective and efficient in terms of the number of identified PSMs compared with benchmark algorithms. However, it has two weaknesses: overfitting and instability on small datasets. In this paper, we incorporate two new strategies into CRanker to address these weaknesses. First, we modify the CRanker model to use different weight parameters for the learning losses of decoy and target PSMs. Second, we employ self-paced learning in the training process to help the classifier avoid incorrect PSMs. Experimental studies show that the modified CRanker is more stable than the original and outperforms benchmark methods in terms of the number of identified PSMs at the same false discovery rates (FDRs).
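
The two strategies named in the abstract can be illustrated with a minimal sketch. This is not the CRanker model itself: it assumes a plain linear classifier with logistic loss instead of CRanker's kernel formulation, and every name and default value here (`fit_self_paced`, `c_target`, `c_decoy`, `lam`, `growth`) is hypothetical. Targets are labeled +1 and decoys -1. Each class gets its own loss weight, and after each training round only samples whose weighted loss falls below a growing threshold are kept for the next round, so that hard (possibly incorrect) PSMs enter training late or not at all.

```python
import numpy as np

def per_sample_loss(w, X, y, class_weight):
    """Class-weighted logistic loss for each sample; y is +1 (target) or -1 (decoy)."""
    margins = y * (X @ w)
    return class_weight * np.logaddexp(0.0, -margins)

def fit_self_paced(X, y, c_target=1.0, c_decoy=2.0, lam=0.5,
                   growth=1.3, rounds=5, steps=200, lr=0.1):
    """Illustrative self-paced training loop (assumed settings, not from the paper)."""
    n, d = X.shape
    w = np.zeros(d)
    # Separate loss weights for target and decoy PSMs.
    class_weight = np.where(y > 0, c_target, c_decoy)
    # Warm start: the first round trains on every sample.
    v = np.ones(n)
    for _ in range(rounds):
        # Gradient descent on the loss of the currently selected samples.
        for _ in range(steps):
            margins = y * (X @ w)
            sig = 0.5 * (1.0 - np.tanh(margins / 2.0))  # sigmoid(-margin), stable form
            grad = -(X.T @ (v * class_weight * y * sig)) / max(v.sum(), 1.0)
            w -= lr * grad
        # Hard self-paced selection: keep only "easy" samples (loss below lam).
        losses = per_sample_loss(w, X, y, class_weight)
        v = (losses < lam).astype(float)
        if not v.any():
            v = np.ones(n)  # fall back to all samples if none qualify
        lam *= growth  # admit harder samples in later rounds
    return w, v
```

The returned mask `v` marks the samples that the final threshold would admit. In this sketch a sample is either fully in or fully out; soft weighting schemes are also common in self-paced learning, and the abstract does not say which variant the authors use.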