Academic Paper

Prediction of Amyloid Proteins Using Embedded Evolutionary & Ensemble Feature Selection Based Descriptors With eXtreme Gradient Boosting Model
Document Type
Periodical
Source
IEEE Access, 11:39024-39036, 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Proteins
Amino acids
classification algorithms
Feature extraction
Predictive models
Computational modeling
Computer science
Amyloid proteins
K-separated bigrams
eXtreme gradient boosting
filter-position specific scoring matrix
ensemble feature selection
classification
Language
ISSN
2169-3536
Abstract
Amyloid proteins (AMYs) are usually aggregates of insoluble fibrous proteins that have major pathogenic effects on various tissues. Their abnormal deposition may lead to several diseases, e.g., Parkinson’s disease, Alzheimer’s disease, and type 2 diabetes. In addition, AMYs form amyloid aggregates when they are in a misfolded state. Therefore, it is crucial to accurately predict AMYs and their pathogenic characteristics. Various computational predictors have been presented for the prediction of AMYs. However, the effectiveness of these predictors is unsatisfactory owing to their low generalization ability and high training cost. In this work, we propose an intelligent computational predictor for the accurate prediction of AMYs. Novel embedded evolutionary features are extracted by applying K-separated bigrams and a filter method to the evolutionary descriptors. Moreover, DDE-based enhanced frequency coupling information is gathered from the amyloid sequences. Additionally, a multi-model vector is obtained by combining the features of the applied formulation techniques. To reduce the computational cost of the proposed model, high-ranked features are selected from the heterogeneous vector using eXtreme Gradient Boosting-Recursive Feature Elimination (XGB-RFE). Next, the optimal features are evaluated with several learners, i.e., XGBoost (XGB), Light Gradient Boosted Machine (LGBM), Support Vector Machine (SVM), AdaBoost (Ada), and Extra Trees classifier (ETC). The proposed model achieved a prediction accuracy of 93.10% on the training sequences and 89.67% on the independent sequences. The training accuracy is approximately 4% higher than that of existing predictors. It is anticipated that our predictive approach will be useful for scientists and might play a key role in drug development and academic research.
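The abstract does not spell out how the K-separated bigram features are computed. As a rough illustration only, the sketch below follows the formulation commonly used for position-specific scoring matrices (PSSMs), where the entry for column pair (m, n) sums the product of the score in column m at position i and the score in column n at position i + k. The function name `k_separated_bigrams`, the plain list-of-rows input, and the toy matrix are assumptions made for this example; the authors' actual implementation, including their filter step, may differ.

```python
def k_separated_bigrams(pssm, k=1):
    """Flattened K-separated bigram features of a PSSM (hypothetical helper).

    pssm: list of rows, one row of scores per sequence position
          (20 columns for a real PSSM; any width works here).
    k:    positional gap between the two rows that are paired.
    Returns a list of n_cols * n_cols values in row-major order.
    """
    n_rows = len(pssm)
    n_cols = len(pssm[0]) if n_rows else 0
    features = [[0.0] * n_cols for _ in range(n_cols)]
    # Pair every position i with position i + k and accumulate the
    # outer product of the two score rows.
    for i in range(n_rows - k):
        row_a, row_b = pssm[i], pssm[i + k]
        for m in range(n_cols):
            for n in range(n_cols):
                features[m][n] += row_a[m] * row_b[n]
    return [value for row in features for value in row]


# Toy 3-position, 2-column matrix standing in for a real L x 20 PSSM.
toy_pssm = [[1, 2], [3, 4], [5, 6]]
gap_one = k_separated_bigrams(toy_pssm, k=1)
gap_two = k_separated_bigrams(toy_pssm, k=2)
```

For a real 20-column PSSM each value of k yields 400 features, so concatenating several gaps quickly produces a large vector, which is consistent with the abstract's motivation for a feature selection step such as XGB-RFE.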