Academic Paper

SparkBeagle: Scalable Genotype Imputation from Distributed Whole-Genome Reference Panels in the Cloud

Document Type

Conference

Author

Maarala, Altti Ilari; Pärn, Kalle; Nuñez-Fontarnau, Javier; Heljanko, Keijo

Source

Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pp. 1-8

Subject

big data
bioinformatics
computational genomics
distributed systems
genotyping
parallel computing

Language

English

Abstract

Massive whole-genome genotype reference panels now provide accurate and fast genotyping by imputation for high-resolution genome-wide association (GWA) studies. Imputation-assisted genotyping can increase the genomic coverage of genotypes and thus satisfy the resolution required in comprehensive GWA studies in a cost-effective manner. However, the imputation of missing genotypes from large reference panels is a compute-intensive process that requires high-performance computing (HPC). Although HPC relies on highly distributed and parallel computing, current imputation tools and existing algorithms have not been developed to fully exploit the power of distributed computing. To this end, we have developed SparkBeagle, a scalable, fast, and accurate distributed genotype imputation tool based on the popular Beagle software. SparkBeagle is designed for HPC and cloud computing environments, and it is implemented on top of the Apache Spark distributed computing framework. We have carried out scalability experiments by imputing 64,976,316 variants of 2,504 samples from the 1000 Genomes reference panel in the cloud. SparkBeagle shows near-linear scalability as the number of computing nodes increases. A speedup of 30x was achieved with 40 nodes. The imputation time of the whole data set decreased from 565 minutes to 18 minutes compared to a single-node parallel execution. Near-identical imputation accuracy was measured in the concordance analysis between the original Beagle and the distributed SparkBeagle tool.
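
The speedup figure in the abstract can be checked from the two reported run times. The following is a minimal sketch of that arithmetic, not code from the paper; the function names are illustrative, and parallel efficiency is assumed here to mean speedup divided by node count, measured against the single-node parallel baseline.

```python
def speedup(baseline_minutes: float, distributed_minutes: float) -> float:
    """Ratio of the baseline run time to the distributed run time."""
    return baseline_minutes / distributed_minutes


def parallel_efficiency(speedup_factor: float, nodes: int) -> float:
    """Fraction of ideal linear scaling achieved (assumed definition: speedup / nodes)."""
    return speedup_factor / nodes


# Run times reported in the abstract: 565 minutes on one node, 18 minutes on 40 nodes.
s = speedup(565, 18)
e = parallel_efficiency(s, 40)
print(f"speedup: {s:.1f}x, efficiency on 40 nodes: {e:.2f}")
# prints "speedup: 31.4x, efficiency on 40 nodes: 0.78"
```

The computed ratio of about 31.4 is consistent with the rounded 30x speedup stated in the abstract.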
