Academic Article

precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions
Document Type
Source
Subject
DNA
variant
short-read sequencing
long-read sequencing
benchmark
Natural Sciences
Biological Sciences
Bioinformatics and Systems Biology
Computer and Information Science
Bioinformatics (Computational Biology)
Medical and Health Sciences
Basic Medicine
Medical Genetics
Language
English
Abstract
The precisionFDA Truth Challenge V2 aimed to assess the state of the art in variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods outperformed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.