Academic Article

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads
Document Type
Report
Source
Nature Biotechnology. March 2021, Vol. 39 Issue 3, p302, 7 p.
Subject
United Kingdom
Language
English
ISSN
1087-0156
Abstract
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing¹,² with continuous long-read or high-fidelity³ sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms. Assembly of haplotype-resolved human genomes is achieved by combining short and long reads.
Author(s): David Porubsky¹, Peter Ebert², Peter A. Audano¹, Mitchell R. Vollger¹, William T. Harvey¹, Pierre Marijon², Jana Ebler [...]