Academic Paper

Analysis of Short-read Aligners using Genome Sequence Complexity
Document Type
Conference
Source
2020 12th International Conference on Knowledge and Systems Engineering (KSE), pp. 312-317, Nov. 2020
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Robotics and Control Systems
Correlation
Computational modeling
Genomics
Prediction algorithms
Complexity theory
Bioinformatics
Testing
genome complexity
short-read alignment
genomic analysis
Abstract
Next-generation sequencing technologies can provide large numbers of short reads inexpensively and accurately. Researchers have proposed many different methods to align short reads to reference genomes. Nevertheless, long repeats, which are known to be abundant in eukaryotic genomes, have caused considerable difficulty for genome assembly methods that rely on short-read alignment. Although a few researchers have studied the sequence complexity of genomes in terms of repeats, none have quantitatively related such complexity to the difficulty of short-read alignment and assembly. In this paper, we investigate several measures of genome sequence complexity with the goal of quantifying the difficulty of short-read alignment. Using genomic data from 17 different organisms and testing against 12 state-of-the-art short-read aligners, we found a very strong correlation between the performance of virtually all of these aligners and measures of genome sequence complexity. Further, we show how these measures might be used to analyze and predict the performance of aligners and, more importantly, to select the best aligners for specific genomes.
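
The abstract does not say which complexity measures the paper uses, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one simple, hypothetical repeat-sensitive measure (the fraction of k-mer positions that hold a distinct k-mer) and a plain Pearson coefficient for relating such a measure to per-genome aligner scores. The names `distinct_kmer_ratio` and `pearson` are invented for this example.

```python
from math import sqrt


def distinct_kmer_ratio(seq: str, k: int) -> float:
    """Hypothetical complexity measure (an assumption, not taken from the paper).

    Returns the number of distinct k-mers divided by the number of k-mer
    positions. A value of 1.0 means no k-mer repeats; lower values mean
    the sequence is more repetitive at length k.
    """
    total = len(seq) - k + 1
    if k <= 0 or total <= 0:
        raise ValueError("k must be positive and no longer than the sequence")
    kmers = {seq[i:i + k] for i in range(total)}
    return len(kmers) / total


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length number lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy sequences only; these are not genomic data from the paper.
repetitive = "ACGT" * 10           # a short tandem repeat
ratio = distinct_kmer_ratio(repetitive, 4)
```

In use, one would compute the measure once per reference genome, collect one score per genome for a given aligner, and pass the two lists to `pearson`. A coefficient near +1 or -1 would suggest that the measure tracks that aligner's performance.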