Academic Paper

High-sensitivity pattern discovery in large, paired multiomic datasets.
Document Type
Article
Source
Bioinformatics. 2022 Supplement, Vol. 38, p. i378-i385. 8 p.
Subject
*FALSE discovery rate
*GENE regulatory networks
*GENE expression profiling
*HUMAN phenotype
*STATISTICAL power analysis
*GENE expression
Language
ISSN
1367-4803
Abstract
Motivation: Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features are essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides high statistical power and false discovery rate (FDR) control.
Results: Here, we present a novel hierarchical framework, HAllA (Hierarchical All-against-All association testing), for structured association discovery between paired high-dimensional datasets. HAllA efficiently integrates hierarchical hypothesis testing with FDR correction to reveal significant linear and non-linear block-wise relationships among continuous and/or categorical data. We optimized and evaluated HAllA using heterogeneous synthetic datasets of known association structure, where HAllA outperformed all-against-all and other block-testing approaches across a range of common similarity measures. We then applied HAllA to a series of real-world multiomics datasets, revealing new associations between gene expression and host immune activity, the microbiome and host transcriptome, and metabolomic profiling and human health phenotypes.
Availability and implementation: An open-source implementation of HAllA is freely available at http://huttenhower.sph.harvard.edu/halla along with documentation, demo datasets and a user group.
Supplementary information: Supplementary data are available at Bioinformatics online.
[ABSTRACT FROM AUTHOR]
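Note: The abstract compares HAllA against plain all-against-all testing. The sketch below illustrates only that baseline, not HAllA itself: every feature in one dataset is tested against every feature in the other, and the Benjamini-Hochberg procedure is applied to the resulting p-values. The choice of Pearson correlation, the permutation p-value, and all function names are assumptions made for illustration and are not taken from the article or the HAllA software. Each feature is assumed to be a non-constant list of numbers measured on the same samples.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=999, rng=None):
    """Two-sided permutation p-value for |Pearson r| (illustrative choice)."""
    rng = rng or random.Random(0)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the sample pairing
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def all_against_all(X, Y, alpha=0.05, n_perm=999, seed=0):
    """Test every feature of X against every feature of Y, then apply BH.

    X and Y are lists of features; each feature is a list of values over
    the same samples. Returns the (i, j) index pairs that pass FDR control.
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(len(X)) for j in range(len(Y))]
    pvals = [permutation_pvalue(X[i], Y[j], n_perm, rng) for i, j in pairs]
    keep = benjamini_hochberg(pvals, alpha)
    return [pair for pair, ok in zip(pairs, keep) if ok]
```

The number of tests in this baseline grows with the product of the two feature counts, so the multiple-testing correction becomes more stringent as the datasets grow. This is the loss of power that the hierarchical, block-wise approach described in the abstract is meant to reduce.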