Academic article

A Genocentric Approach to Discovery of Mendelian Disorders
Document Type
Author abstract
Report
Source
American Journal of Human Genetics. Nov 7, 2019, Vol. 105 Issue 5, 974
Subject
Usage
Diagnosis
Genetic aspects
Developmental disabilities -- Diagnosis
Developmental disabilities -- Genetic aspects
DNA sequencing -- Usage
Child development deviations -- Diagnosis
Child development deviations -- Genetic aspects
Nucleotide sequencing -- Usage
Language
English
ISSN
0002-9297
Keywords
genotype-first; whole-exome sequencing; clan genomics; Mendelian disease; big data; Hadoop; data lake; developmental disorder; HARLEE; ultra-rare
Abstract
The advent of inexpensive, clinical exome sequencing (ES) has led to the accumulation of genetic data from thousands of samples from individuals affected with a wide range of diseases, but for whom the underlying genetic and molecular etiology of their clinical phenotype remains unknown. In many cases, detailed phenotypes are unavailable or poorly recorded and there is little family history to guide study. To accelerate discovery, we integrated ES data from 18,696 individuals referred for suspected Mendelian disease, together with relatives, in an Apache Hadoop data lake (Hadoop Architecture Lake of Exomes [HARLEE]) and implemented a genocentric analysis that rapidly identified 154 genes harboring variants suspected to cause Mendelian disorders. The approach did not rely on case-specific phenotypic classifications but was driven by optimization of gene- and variant-level filter parameters utilizing historical Mendelian disease-gene association discovery data. Variants in 19 of the 154 candidate genes were subsequently reported as causative of a Mendelian trait and additional data support the association of all other candidate genes with disease endpoints.