Academic Paper

FPGA Accelerated INDEL Realignment in the Cloud
Document Type
Conference
Source
2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 277-290, Feb. 2019
Subject
Computing and Processing
Genomics
Bioinformatics
Pipelines
Acceleration
Field programmable gate arrays
Cancer
Hardware
Computer Architecture, Microarchitecture, Accelerator Architecture, Hardware Specialization, Genomic Analytics, INDEL Realignment, FPGA Acceleration, FPGAs-as-a-service, Cloud FPGAs
Language
ISSN
2378-203X
Abstract
The amount of data being generated in genomics is predicted to be between 2 and 40 exabytes per year for the next decade, making genomic analysis the new frontier and the new challenge for precision medicine. This paper explores targeted deployment of hardware accelerators in the cloud to improve the runtime and throughput of immense-scale genomic data analyses. In particular, INDEL (INsertion/DELetion) realignment is a critical operation that enables diagnostic testing of cancer through error correction prior to variant calling. It is the slowest part of the alignment refinement pipeline within the somatic (cancer) genomic analysis pipeline, and represents roughly one-third of the execution time of time-sensitive diagnostics for acute cancer patients. To accelerate genomic analysis, this paper describes a hardware accelerator for INDEL realignment (IR), and a hardware-software framework leveraging FPGAs-as-a-service in the cloud. We chose to implement genomics analytics on FPGAs because genomic algorithms are still rapidly evolving (e.g., the de facto standard "GATK Best Practices" has had five releases since January of this year). We chose to deploy genomics accelerators in the cloud to reduce capital expenditure and to provide a more quantitative performance and cost analysis. We built and deployed a sea of IR accelerators using our hardware-software accelerator development framework on AWS EC2 F1 instances. We show that our IR accelerator system performed 81x better than multi-threaded genomic analysis software while being 32x more cost efficient.