Academic Article

Investigating differential abundance methods in microbiome data: A benchmark study.
Document Type
Article
Source
PLoS Computational Biology. 9/8/2022, Vol. 18 Issue 9, p1-33. 33p. 1 Diagram, 3 Charts, 10 Graphs.
Subject
*FALSE discovery rate
*FALSE positive error
*ECOLOGICAL niche
*SAMPLE size (Statistics)
*ECOSYSTEMS
Language
ISSN
1553-734X
Abstract
The development of increasingly efficient and cost-effective high-throughput DNA sequencing techniques has enhanced the possibility of studying complex microbial systems. Recently, researchers have shown great interest in studying the microorganisms that characterise different ecological niches. Differential abundance analysis aims to find differences in the abundance of each taxon between two classes of subjects or samples, assigning a significance value to each comparison. Several bioinformatic methods have been specifically developed, taking into account the challenges of microbiome data, such as sparsity, differing sequencing depth between samples, and compositionality. Differential abundance analysis has led to important conclusions in different fields, from health to the environment. However, the lack of a known biological truth makes it difficult to validate the results obtained. In this work we exploit metaSPARSim, a microbial sequencing count data simulator, to simulate data with differentially abundant features between experimental groups. We perform a complete comparison of recently developed and established methods on a common benchmark, paying close attention to the reliability of both the simulated scenarios and the evaluation metrics. The performance overview includes the investigation of numerous scenarios, studying the effect on the methods' results of the main covariates, such as sample size, percentage of differentially abundant features, sequencing depth, feature variability, normalisation approach and ecological niche. Mainly, we find that methods show good control of the type I error and, generally, also of the false discovery rate at high sample size, while recall seems to depend on the dataset and sample size.
Author summary: The microbiota is the set of microorganisms that characterise an ecological environment or niche. Several studies have shown that the microbiota is involved in various biological mechanisms that affect the health or balance of the host organism or the ecosystem. New discoveries and insights have been possible thanks to increasingly efficient sequencing technologies together with the development of bioinformatic computational methods. One of the most interesting analyses in this landscape is the identification of microorganisms that show significantly different abundances when two groups of subjects are compared. Although many computational methods have been developed, it is still unclear which one performs best. Therefore, we exploited a simulator of microbiome data to build a simulation framework that allowed us to carry out an extensive benchmarking of the known tools for differential abundance analysis. Our work is not only a starting point to guide analysts in the choice of tools, but also a first step towards a robust, reliable and fair simulation framework. [ABSTRACT FROM AUTHOR]
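The three evaluation metrics named in the abstract (type I error, false discovery rate and recall) can only be computed when the truly differentially abundant features are known, which is what simulated data provides. The following is a minimal sketch of that calculation and is not taken from the article: the function names `benjamini_hochberg` and `evaluate`, the 0.05 threshold, and the use of the Benjamini-Hochberg procedure for multiple-testing correction are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def evaluate(
    p_values: Sequence[float],
    is_truly_da: Sequence[bool],
    alpha: float = 0.05,  # hypothetical threshold, not from the article
) -> Dict[str, float]:
    """Compare one method's calls against the simulated ground truth."""
    adjusted = benjamini_hochberg(p_values)
    called = [q <= alpha for q in adjusted]

    tp = sum(c and t for c, t in zip(called, is_truly_da))
    fp = sum(c and not t for c, t in zip(called, is_truly_da))
    fn = sum((not c) and t for c, t in zip(called, is_truly_da))

    # Type I error uses the raw (unadjusted) p-values of the true nulls.
    n_null = sum(not t for t in is_truly_da)
    raw_fp = sum(p <= alpha and not t for p, t in zip(p_values, is_truly_da))

    return {
        "fdr": fp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "type_I_error": raw_fp / n_null if n_null else 0.0,
    }


if __name__ == "__main__":
    # Toy input: six features, three of them simulated as truly
    # differentially abundant.
    p = [0.001, 0.004, 0.03, 0.2, 0.6, 0.9]
    truth = [True, True, False, True, False, False]
    print(evaluate(p, truth))
```

In a benchmark of the kind described, such a function would presumably be applied to each method's output on each simulated dataset, with the results then summarised across scenarios such as sample size or sequencing depth.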