Academic Paper

IterCluster: a barcode clustering algorithm for long fragment read analysis
Document Type
Academic Journal
Source
PeerJ. March 24, 2020, Vol. 8 e8431
Subject
Chromium (Metal)
Algorithms
Genomes
Genomics
Genetic research
Cloud computing
Technology
Human genome
Novels
Biological sciences
Algorithm
Language
English
ISSN
2167-8359
Abstract
Recent advances in long fragment read (LFR) technologies, also known as linked-read or read-cloud technologies, such as single tube long fragment reads (stLFR), 10X Genomics Chromium reads, and TruSeq synthetic long-reads, have enabled efficient haplotyping and genome assembly. However, in stLFR and 10X Genomics Chromium data, the reads sharing a barcode cover each long fragment only sparsely, and most barcodes tag multiple long fragments from different regions of the genome, which makes assembly based on long-range information inefficient. Methods that address these shortcomings are therefore vital for capitalizing on the additional information these technologies provide. We designed IterCluster, a novel, alignment-free clustering algorithm that groups barcodes from the same target region of a genome, using k-mer frequency-based features and a Markov Cluster (MCL) approach, to gather enough reads from that region to ensure sufficient sequencing depth. IterCluster was validated on BGI stLFR and 10X Genomics Chromium read datasets, and achieved higher precision and recall on the BGI stLFR data than on the 10X Genomics Chromium data. In addition, we demonstrated that IterCluster improves de novo assembly results when using a divide-and-conquer strategy on a human genome data set (scaffold/contig N50 = 13.2 kbp/7.1 kbp vs. 17.1 kbp/11.9 kbp before and after IterCluster, respectively). IterCluster provides a new way of determining LFR barcode enrichment and a novel approach to de novo assembly using LFR data. IterCluster is open source and available at https://github.com/JianCong-WENG/IterCluster.
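The general idea in the abstract (compare barcodes by k-mer content without alignment, then group them with Markov clustering) can be illustrated with a toy sketch. This is not IterCluster's implementation: the Jaccard similarity over k-mer sets is an assumed stand-in for the paper's k-mer frequency-based features, the MCL loop is a bare-bones version with a fixed iteration count, and all function names, parameters, and the toy reads are hypothetical.

```python
from itertools import combinations


def kmer_set(reads, k):
    """Return the set of all k-mers found in a barcode's reads."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (an assumed stand-in feature)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def normalize_columns(m):
    """Scale each column of a square matrix in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def mcl(adj, inflation=2.0, iterations=30, eps=1e-6):
    """Minimal Markov clustering on a weighted adjacency matrix with self-loops."""
    n = len(adj)
    m = normalize_columns([row[:] for row in adj])
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][t] * m[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize the columns.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    clusters = set()
    for i in range(n):
        if m[i][i] > eps:  # row i belongs to an attractor node
            clusters.add(frozenset(j for j in range(n) if m[i][j] > eps))
    return clusters


def cluster_barcodes(reads_by_barcode, k=4, min_sim=0.1):
    """Group barcodes whose reads share k-mers; returns sorted lists of names."""
    names = sorted(reads_by_barcode)
    kmers = [kmer_set(reads_by_barcode[name], k) for name in names]
    n = len(names)
    # Self-loops of weight 1; edges below min_sim are dropped.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        sim = jaccard(kmers[i], kmers[j])
        if sim >= min_sim:
            adj[i][j] = adj[j][i] = sim
    return sorted(sorted(names[i] for i in c) for c in mcl(adj))


# Toy data: bc1/bc2 come from one made-up region, bc3/bc4 from another.
toy = {
    "bc1": ["AAAAAACCCCCC"],
    "bc2": ["AAAAACCCCCG"],
    "bc3": ["GGGGGGTTTTTT"],
    "bc4": ["GGGGGTTTTT"],
}
clusters = cluster_barcodes(toy)
```

On this toy input, bc1 groups with bc2 and bc3 with bc4, because the two pairs share no 4-mers. Real linked-read data would need larger k, handling of reverse complements and sequencing errors, and a sparse matrix representation, none of which this sketch attempts.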
Author(s): Jiancong Weng (1,2,*), Tian Chen (2,*), Yinlong Xie (2), Xun Xu (3), Gengyun Zhang (1), Brock A. Peters (3), Radoje Drmanac (3)
Introduction
The short read length of next-generation [...]