Academic Paper

A Fully Integrated End-to-End Genome Analysis Accelerator for Next-Generation Sequencing
Document Type
Conference
Source
2023 IEEE International Solid-State Circuits Conference (ISSCC), pp. 44-46, Feb. 2023
Subject
Bioengineering
Components, Circuits, Devices and Systems
Computing and Processing
Sequential analysis
Data analysis
Pipelines
DNA
Genomics
Medical diagnosis
Computational complexity
Language
ISSN
2376-8606
Abstract
Next-generation sequencing (NGS) has revolutionized the biological sciences and clinical practice. It has become an essential technology for emerging applications such as cancer screening and inherited-disease diagnosis. Fig. 2.4.1 shows an overview of an NGS pipeline, which comprises sample preparation, sequencing, data analysis, and tertiary analysis. A sequencer first generates a massive number of DNA segments (short-reads) from samples. These short-reads are the inputs to data analysis, whose outputs (genetic variants) can then be sent to facilities for further tertiary analysis. Data analysis is very time-consuming and has become the bottleneck of the entire NGS pipeline [1]. Its high computational complexity stems from processing hundreds of millions of short-reads to reconstruct a DNA sequence of three billion nucleotides. A complete data analysis workflow includes four steps: short-read mapping, haplotype calling, variant calling, and genotyping. Data analysis accelerators have been proposed to reduce the processing time [2], [3]. They support the first three steps of the workflow but not genotyping, the dominant step [4]. Additionally, previous works adopt only single-end short-read mapping, which limits the achievable analysis accuracy. This work presents a fully integrated data analysis accelerator that handles the complete analysis workflow. Paired-end short-read mapping with rescue is used to enhance the analysis accuracy.