Academic Article

A pan-cancer landscape of somatic mutations in non-unique regions of the human genome
Document Type
Report
Source
Nature Biotechnology. December 2021, Vol. 39 Issue 12, p1589, 8 p.
Subject
Belgium
United Kingdom
Language
English
ISSN
1087-0156
Abstract
A substantial fraction of the human genome displays high sequence similarity with at least one other genomic sequence, posing a challenge for the identification of somatic mutations from short-read sequencing data. Here we annotate genomic variants in 2,658 cancers from the Pan-Cancer Analysis of Whole Genomes (PCAWG) cohort with links to similar sites across the human genome. We train a machine learning model to use signals distributed over multiple genomic sites to call somatic events in non-unique regions and validate the data against linked-read sequencing in an independent dataset. Using this approach, we uncover previously hidden mutations in ~1,700 coding sequences and in thousands of regulatory elements, including in known cancer genes, immunoglobulins and highly mutated gene families. Mutations in non-unique regions are consistent with mutations in unique regions in terms of mutation burden and substitution profiles. The analysis provides a systematic summary of the mutation events in non-unique regions at a genome-wide scale across multiple human cancers. Cancer mutations in non-unique sequences are identified by machine learning on short-read data.
Author(s): Maxime Tarabichi [sup.1] [sup.2], Jonas Demeulemeester [sup.1] [sup.3], Annelien Verfaillie [sup.1], Adrienne M. Flanagan [sup.4] [sup.5], Peter Van Loo [sup.1], Tomasz Konopka [sup.1] [sup.6] [...]