학술논문

multiomics: A user-friendly multi-omics data harmonisation R pipeline [version 2; peer review: 2 not approved]
Document Type
software-tool
Source
F1000Research. 10:538
Subject
Software Tool Article
Articles
machine learning
multi-omics
data integration
data harmonisation
multivariate analysis
Language
ISSN
2046-1402
Abstract
Data from multiple omics layers of a biological system is growing in quantity, heterogeneity and dimensionality. Simultaneous multi-omics data integration is of immense interest to researchers as it has potential to unlock previously hidden biomolecular relationships leading to early diagnosis, prognosis, and expedited treatments. Many tools for multi-omics data integration are developed. However, these tools are often restricted to highly specific experimental designs, types of omics data, and specific data formats. A major limitation of the field is the lack of a pipeline that can accept data in unrefined form to preserve maximum biology in an individual dataset prior to integration. We fill this gap by developing a flexible, generic multi-omics pipeline called multiomics , to facilitate general-purpose data exploration and analysis of heterogeneous data. The pipeline takes unrefined multi-omics data as input, sample information and user-specified parameters to generate a list of output plots and data tables for quality control and downstream analysis. We have demonstrated its application on a sepsis case study. We enabled limited checkpointing functionality where intermediate output is staged to allow continuation after errors or interruptions in the pipeline and generate a script for reproducing the analysis to improve reproducibility. Our pipeline can be installed as an R package or manually from the git repository, and is accompanied by detailed documentation with walkthroughs on three case studies.