Academic Paper

Metagenomic Binning based on Unsupervised Extreme Learning Machine
Document Type
Conference
Source
2023 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies (CHILECON), pp. 1-6, Dec. 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Measurement
Machine learning algorithms
Extreme learning machines
Pipelines
Clustering algorithms
DNA
Information and communication technology
Metagenomic Binning
US-ELM
Clustering
kmers
GC content
Language
ISSN
2832-1537
Abstract
Metagenomics studies the genetic information of microbial communities in different contexts. Because metagenomic DNA is typically fragmented and sequenced as short reads, these reads are often assembled into longer sequences called contigs. An important step in the metagenomic analysis pipeline is binning, which is the classification (supervised) or clustering (unsupervised) of reads or contigs. For unsupervised binning, several machine learning algorithms have been employed that cluster sequences using DNA sequence descriptors such as k-mer frequency and GC content. This paper proposes the use of Unsupervised Extreme Learning Machines (US-ELM) for metagenomic binning. The experiments use three datasets containing different numbers of species and compare the results obtained by US-ELM with those of the k-means and Expectation-Maximization (EM) algorithms. The performance comparison uses metrics widely applied to this problem: accuracy, the Rand index, and clustering computation time. In the experiments, US-ELM widely outperforms the other two clustering methods in accuracy. In terms of computational cost, US-ELM is comparable to k-means, and both algorithms are much faster than EM. The numerical results show the potential of the US-ELM algorithm for the metagenomic binning problem.
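
The abstract names k-mer frequency and GC content as sequence descriptors and the Rand index as an evaluation metric, without giving formulas. The following Python sketch illustrates one common way to compute them. It is not the authors' implementation: the function names, the default k = 4, the normalization of k-mer counts to relative frequencies, and the skipping of windows containing non-ACGT characters are all assumptions made for illustration.

```python
from itertools import combinations, product


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq: str, k: int = 4) -> list[float]:
    """Relative frequency of every k-mer over ACGT, in lexicographic order.

    Assumption: windows containing characters other than A, C, G, T
    (for example N) are skipped rather than counted.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def rand_index(labels_a: list[int], labels_b: list[int]) -> float:
    """Rand index: share of item pairs on which two labelings agree.

    A pair agrees when both labelings put the two items in the same
    cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical usage: build one feature vector per contig, then hand the
# vectors to a clustering algorithm of choice.
contig = "ACGTACGTGGCC"
features = kmer_frequencies(contig, k=2) + [gc_content(contig)]
```

With k = 4 the descriptor has 256 k-mer entries plus one GC value per sequence. The Rand index depends only on which items are grouped together, so it is unaffected by how cluster labels are numbered.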