Academic Paper

Clustering and Classification of Human Microbiome Data: Evaluating the Impact of Different Settings in Bioinformatics Workflows
Document Type
Conference
Source
2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 838-845, Oct. 2019
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Databases
Clustering algorithms
Radio frequency
Machine learning algorithms
Stability analysis
Bioinformatics
Sequential analysis
bioinformatics
microbiome
machine learning
16S rRNA
OTU table
Language
ISSN
2471-7819
Abstract
Microbiome studies are attracting increasing interest, especially in human health applications, where their use in disease prognosis, diagnosis, and treatment can have immense effects on quality of life. The settings chosen in the microbiome data preprocessing stage can lead to great variability in the generated operational taxonomic unit (OTU) tables, reflected in the size and sparseness of this data matrix. As there are still no solid guidelines on best practices, it is valuable to assess which machine learning algorithms give more stable results under variable preprocessing settings. In this study, we generated OTU tables from the data of the Moving Pictures of the Human Microbiome study using two different reference databases (Greengenes and Silva) and four levels of the similarity threshold (ranging from 90% to 99%), processed in the QIIME bioinformatics package. The results of the two best-performing classification and clustering algorithms are presented in detail: the Random Forest (RF) classifier and Spectral Clustering (SC). The Random Forest classifier outperformed Spectral Clustering in terms of accuracy. As the rate of data generation increases while the cost of labeling remains high, further improvement of clustering performance and ensemble approaches should be explored.
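
The abstract describes OTU tables whose size and sparseness vary with the preprocessing settings. As a rough illustration only, the following Python sketch computes those two properties for a samples-by-OTUs count table. The function name `otu_table_summary` and both toy tables are hypothetical and are not taken from the paper. The "fine" table assumes that a stricter similarity threshold splits each coarse OTU into two, which is one plausible way a table can become larger and sparser.

```python
from typing import List, Tuple


def otu_table_summary(table: List[List[int]]) -> Tuple[int, int, float]:
    """Return (n_samples, n_otus, sparsity) for a samples-by-OTUs count table.

    Sparsity is the fraction of cells whose count is zero.
    """
    n_samples = len(table)
    n_otus = len(table[0]) if table else 0
    total_cells = n_samples * n_otus
    zero_cells = sum(1 for row in table for count in row if count == 0)
    sparsity = zero_cells / total_cells if total_cells else 0.0
    return n_samples, n_otus, sparsity


# Hypothetical toy data: three samples clustered at a loose threshold.
coarse = [
    [12, 0, 5],
    [3, 7, 0],
    [0, 9, 4],
]

# Same reads, assuming a stricter threshold splits each OTU into two.
# Row totals are unchanged; only the number of columns grows.
fine = [
    [8, 4, 0, 0, 5, 0],
    [0, 3, 0, 7, 0, 0],
    [0, 0, 6, 3, 0, 4],
]

for name, table in (("coarse", coarse), ("fine", fine)):
    n_samples, n_otus, sparsity = otu_table_summary(table)
    print(f"{name}: {n_samples} samples x {n_otus} OTUs, sparsity={sparsity:.2f}")
```

In this toy case the read counts per sample stay the same, but the finer table has twice as many columns and a larger share of zero cells. This is the kind of variation in input that a classifier or clustering algorithm would need to tolerate.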