Academic Paper

SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.
Document Type
Article
Source
Bioinformatics. 6/15/2020, Vol. 36 Issue 12, p3879-3881. 3p.
Subject
*MELANOGENESIS
*FUNCTIONAL genomics
*PIPELINES
*TRANSCRIPTION factors
*CIS-regulatory elements (Genetics)
*SCALABILITY
*BIOINFORMATICS
Language
ISSN
1367-4803
Abstract
Summary: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional data across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources.

Availability and implementation: SparkINFERNO runs on clusters or on a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.

Contact: lswang@pennmedicine.upenn.edu

Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]