Academic Article

Extraction of Long k-mers Using Spaced Seeds

Document Type

Periodical

Author

Source

IEEE/ACM Transactions on Computational Biology and Bioinformatics (IEEE/ACM Trans. Comput. Biol. and Bioinf.). 19(6):3444-3455 Jan, 2022

Subject

Bioengineering
Computing and Processing
Bioinformatics
Solids
Sequential analysis
Tools
Genomics
Data structures
Task analysis
k-mers
k-mer counting
spaced seeds

Language

ISSN

1545-5963
1557-9964
2374-0043

Abstract

The extraction of $k$-mers from reads is an important task in many bioinformatics applications, such as all DNA sequence analysis methods based on de Bruijn graphs. These methods tend to be more accurate when the used $k$-mers are unique in the analyzed DNA, and thus the use of longer $k$-mers is preferred. When the read lengths of short read sequencing technologies increase, the error rate will become the determining factor for the largest possible value of $k$. Here we propose LoMeX which uses spaced seeds to extract long $k$-mers accurately even in the presence of sequencing errors. Our experiments show that LoMeX can extract long $k$-mers from current Illumina reads with a similar or higher recall than a standard $k$-mer counting tool. Furthermore, our experiments on simulated data show that when the read length further increases enabling even longer $k$-mers, the performance of standard $k$-mer counters declines, whereas LoMeX still extracts long $k$-mers successfully.
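As a rough illustration of the idea the abstract describes, the sketch below shows why a spaced seed can tolerate a sequencing error: two $k$-mers that differ only at a "don't care" position of the seed map to the same spaced $k$-mer. This is not LoMeX's actual algorithm. The function names, the seed pattern `1101011`, and the toy reads are all hypothetical and chosen only for demonstration.

```python
from collections import Counter


def kmers(read, k):
    """Yield every contiguous substring of length k in the read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def apply_seed(kmer, seed):
    """Keep characters at '1' positions of the seed; mask '0' positions with '-'."""
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length")
    return "".join(c if s == "1" else "-" for c, s in zip(kmer, seed))


def count_spaced_kmers(reads, seed):
    """Count spaced k-mers (k = len(seed)) over all reads."""
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, len(seed)):
            counts[apply_seed(kmer, seed)] += 1
    return counts


# Hypothetical seed: '1' = position must match, '0' = position is ignored.
seed = "1101011"
# Toy reads; the second differs from the first at index 2 (a simulated error).
reads = ["ACGTACGA", "ACTTACGA"]

counts = count_spaced_kmers(reads, seed)
for spaced_kmer, n in sorted(counts.items()):
    print(spaced_kmer, n)
```

In this toy case the first window of both reads collapses to the same spaced $k$-mer because the mismatch falls on an ignored position, while the second window does not, because there the mismatch falls on a position the seed requires to match. A plain $k$-mer counter would treat all four windows as distinct.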
