Academic Paper

K-mer Based DNA Methylation Status Prediction Using Support Vector Machine
Document Type
Conference
Source
2019 3rd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), pp. 229-232, Dec. 2019
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
DNA
Feature extraction
Bioinformatics
Genomics
Support vector machines
Biological system modeling
Computational modeling
DNA methylation
genomic information
bioinformatics
CpG loci
DNA probes
feature selection
K-mer
f-score
SVM
Abstract
Cancer, diabetes, cardiovascular diseases, and some rare diseases can arise from modifications to the human genome. DNA methylation is one such genomic modification: it occurs when one or more methyl groups are added to the DNA molecule. Working with DNA methylation is difficult because methylation datasets are high-dimensional and noisy. In this work, we propose a machine-learning-based computational model for predicting DNA methylation. The model draws heavily on bioinformatics and genomic information to extract genomic features. All previous work on predicting DNA methylation was based solely on statistical tools. Our model has two phases: a feature extraction and selection phase, and a classification phase. In the feature extraction phase, we used the k-mer algorithm from bioinformatics to extract the most appropriate features, and then applied statistical tools to select the most discriminative ones. In the classification phase, we used a support vector machine (SVM), a supervised machine learning algorithm, to predict DNA methylation status. The model achieved a classification accuracy of 98.65%.
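The abstract describes a three-stage pipeline: k-mer feature extraction, statistical feature selection, and SVM classification. The following is a minimal, hypothetical sketch of how such a pipeline can be wired together. It is not the authors' implementation, and it makes several assumptions the abstract does not confirm. The input is assumed to be a raw DNA sequence per sample, with k = 3. The "statistical tools" are assumed to be the F-score listed among the keywords, computed here in the Chen and Lin formulation commonly paired with SVMs. The classifier is a plain linear SVM trained with the Pegasos subgradient method, swapped in for whatever SVM library, kernel, and parameters the authors actually used. All function names (`kmer_features`, `f_score`, `select_top_features`, `train_linear_svm`, `fit_pipeline`, `predict`) are illustrative.

```python
import random
from collections import Counter
from itertools import product
from statistics import mean, pstdev, variance


def kmer_features(seq, k=3):
    """Relative frequency of each of the 4**k DNA k-mers, in a fixed ACGT order.

    Windows containing symbols other than A/C/G/T (e.g. N) are ignored.
    """
    seq = seq.upper()
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[v] for v in vocab) or 1  # avoid division by zero
    return [counts[v] / total for v in vocab]


def f_score(pos, neg, j):
    """Assumed F-score (Chen & Lin style) of feature j; needs >= 2 samples per class."""
    xp = [row[j] for row in pos]
    xn = [row[j] for row in neg]
    m_all, mp, mn = mean(xp + xn), mean(xp), mean(xn)
    between = (mp - m_all) ** 2 + (mn - m_all) ** 2
    within = variance(xp) + variance(xn)  # sample variances, 1/(n-1)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def select_top_features(pos, neg, m):
    """Indices of the m features with the highest F-score."""
    d = len(pos[0])
    scores = [f_score(pos, neg, j) for j in range(d)]
    return sorted(range(d), key=lambda j: scores[j], reverse=True)[:m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos (stochastic subgradient descent on the
    L2-regularized hinge loss). Labels are +1/-1; a constant 1.0 feature
    is appended so its weight acts as a bias term."""
    rng = random.Random(seed)
    Xb = [list(row) + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, Xb[i]) < 1.0  # margin under the current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def fit_pipeline(seqs, labels, k=3, top_m=8):
    """k-mer features -> F-score selection -> z-scaling -> linear SVM."""
    feats = [kmer_features(s, k) for s in seqs]
    pos = [f for f, lab in zip(feats, labels) if lab == 1]
    neg = [f for f, lab in zip(feats, labels) if lab == -1]
    keep = select_top_features(pos, neg, top_m)
    reduced = [[f[j] for j in keep] for f in feats]
    cols = list(zip(*reduced))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    X = [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in reduced]
    w = train_linear_svm(X, labels)
    return {"k": k, "keep": keep, "means": means, "stds": stds, "w": w}


def predict(model, seq):
    """Return +1 (hypothetically 'methylated') or -1 for a single sequence."""
    f = kmer_features(seq, model["k"])
    x = [(f[j] - m) / s for j, m, s in zip(model["keep"], model["means"], model["stds"])]
    return 1 if dot(model["w"], x + [1.0]) >= 0 else -1
```

On synthetic toy data, such as CpG-rich strings labeled +1 and AT-only strings labeled -1, calling `fit_pipeline` and then `predict` should separate the two classes. Real methylation data would instead require labeled sequences around the CpG loci or probes, plus a held-out test set, before any accuracy figure is meaningful. Feature selection is placed before scaling so that the F-score ranks raw k-mer frequencies and the SVM only ever sees the retained columns.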