Academic Paper

Sequence-based Prediction of Antimicrobial Peptides with CatBoost Classifier
Document Type
Conference
Source
2022 IEEE 22nd International Conference on Bioinformatics and Bioengineering (BIBE), pp. 217-220, Nov. 2022
Subject
Bioengineering
Computing and Processing
Signal Processing and Analysis
Systematics
Costs
Toxicology
Peptides
Biological system modeling
Antibiotics
Predictive models
antimicrobial peptide prediction
therapeutic peptide
disease
machine learning
bioinformatics
Language
ISSN
2471-7819
Abstract
Antimicrobial resistance is one of the most serious issues for human health. Compared to existing antibiotics, antimicrobial peptides (AMPs) have the advantage of efficiently killing microbes and other pathogens without inducing drug resistance. Large-scale experimental characterization of AMPs requires wet-lab resources and considerable time. In silico prediction of AMPs, by contrast, is an attractive strategy for lowering the cost and time of discovering new AMPs. In this study, we propose a CatBoost model for AMP prediction. We included various features for the numerical representation of peptides, and then employed a systematic approach to select 130 important features for our machine learning models. The CatBoost model achieves an accuracy, F1-score, MCC, and AUC of 0.758, 0.750, 0.518, and 0.831, respectively, in cross-validation. On an independent test set of 188 peptide sequences, the proposed model achieves an accuracy, MCC, and AUC of 0.814, 0.632, and 0.884, respectively, all of which are the best among the values compared against five state-of-the-art methods. Our model improves on the MCC of the five existing methods by 2.6% to 21.1%, and on their AUC by 1.3% to 13.3%. The results demonstrate that our CatBoost model is capable of yielding reliable results and can be of great help in discovering novel AMPs.
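For readers unfamiliar with the four metrics reported in the abstract, the following is a minimal sketch of how accuracy, F1-score, MCC, and AUC are conventionally computed for a binary classifier. It is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers it produces are unrelated to the results in the paper.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, tn, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data (1 = AMP, 0 = non-AMP); not from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("F1:", f1_score(*counts))
print("MCC:", mcc(*counts))
print("AUC:", auc(y_true, scores))
```

Accuracy, F1-score, and MCC depend on the chosen decision threshold, whereas AUC is computed from the raw scores and is threshold-independent, which is one reason abstracts such as this one report both kinds of metric.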