Academic Paper

EnDL-HemoLyt: Ensemble Deep Learning-Based Tool for Identifying Therapeutic Peptides With Low Hemolytic Activity
Document Type
Periodical
Source
IEEE Journal of Biomedical and Health Informatics (IEEE J. Biomed. Health Inform.), 28(4):1896-1905, Apr. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Signal Processing and Analysis
Peptides
Deep learning
Standards
Amino acids
Classification algorithms
Bioinformatics
Artificial intelligence
bidirectional long short-term memory
bioinformatics
convolutional neural networks
deep learning
hemolytic
therapeutic
temporal convolutional networks
Language
ISSN
2168-2194
2168-2208
Abstract
Low-hemolytic therapeutic peptides have gained an edge over small-molecule-based medicines. However, identifying low-hemolytic peptides in the laboratory is time-consuming and costly, and it requires mammalian red blood cells. Wet-lab researchers therefore often use in-silico prediction to select low-hemolytic peptides before proceeding to in-vitro testing. The in-silico tools available for this purpose have the following limitations: (i) they do not provide predictions for peptides with N/C-terminal modifications; (ii) data is food for AI, yet the datasets used to create existing tools do not contain peptide data generated over the past eight years; and (iii) their performance is low. This work therefore proposes a novel framework that uses a recent dataset and an ensemble learning technique to combine the decisions produced by three deep learning algorithms: a bidirectional long short-term memory network, a bidirectional temporal convolutional network, and a 1-dimensional convolutional neural network. Deep learning algorithms can extract features from data by themselves. However, instead of relying solely on deep learning-based features (DLF), handcrafted features (HCF) were also provided, so that the deep learning algorithms can learn features that are missing from the HCF and a better feature vector can be constructed by concatenating HCF and DLF. Ablation studies were carried out to understand the roles of the ensemble algorithm, HCF, and DLF in the proposed framework. They found that all three are crucial components of the framework: eliminating any of them decreases performance. The mean values of the performance metrics $A_{cc}$, $S_{n}$, $P_{r}$, $F_{s}$, $S_{p}$, $B_{a}$, and $M_{cc}$ obtained by the proposed framework on the test data are $\approx$ 87, 85, 86, 86, 88, 87, and 73, respectively.
To aid the scientific community, the model developed from the proposed framework has been deployed as a web server at https://endl-hemolyt.anvil.app/.
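
The abstract does not say how the three networks' decisions are combined, nor does it expand the metric abbreviations. The sketch below is illustrative only and is not the authors' implementation. It assumes soft voting (averaging the per-model probabilities) and assumes the abbreviations stand for accuracy, sensitivity, precision, F-score, specificity, balanced accuracy, and Matthews correlation coefficient. The function names `soft_vote` and `binary_metrics`, the 0.5 threshold, and the convention that label 1 marks the positive class are all hypothetical.

```python
import math


def soft_vote(model_probs, threshold=0.5):
    """Average per-model positive-class probabilities, then threshold.

    model_probs: one list of probabilities per model (assumed here to be
    the BiLSTM, BiTCN, and 1D-CNN outputs), all of the same length.
    Soft voting is an assumption; the abstract only says "ensemble".
    """
    n_models = len(model_probs)
    averaged = [sum(column) / n_models for column in zip(*model_probs)]
    return [1 if p >= threshold else 0 for p in averaged]


def binary_metrics(y_true, y_pred):
    """Compute the seven metrics from binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sn = ratio(tp, tp + fn)            # sensitivity (recall)
    sp = ratio(tn, tn + fp)            # specificity
    pr = ratio(tp, tp + fp)            # precision
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": ratio(tp + tn, len(pairs)),
        "Sn": sn,
        "Pr": pr,
        "Fs": ratio(2 * pr * sn, pr + sn),
        "Sp": sp,
        "Ba": (sn + sp) / 2,
        "Mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Made-up probabilities for four peptides from three hypothetical models.
    probs = [
        [0.9, 0.2, 0.6, 0.4],
        [0.8, 0.4, 0.3, 0.1],
        [0.7, 0.3, 0.9, 0.4],
    ]
    predictions = soft_vote(probs)
    print(predictions)
    print(binary_metrics([1, 0, 1, 1], predictions))
```

These functions return fractions, whereas the values in the abstract appear to be scaled by 100, so multiply by 100 before comparing.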