Academic Article

Identification of missing variants by combining multiple analytic pipelines

Document Type

article

Author

Yingxue Ren; Joseph S. Reddy; Cyril Pottier; Vivekananda Sarangi; Shulan Tian; Jason P. Sinnwell; Shannon K. McDonnell; Joanna M. Biernacka; Minerva M. Carrasquillo; Owen A. Ross; Nilüfer Ertekin-Taner; Rosa Rademakers; Matthew Hudson; Liudmila Sergeevna Mainzer; Yan W. Asmann

Source

BMC Bioinformatics, Vol 19, Iss 1, Pp 1-12 (2018)

Subject

Missing variants
Combining multiple bioinformatics pipelines
Rare variants
Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5

Language

English

ISSN

1471-2105

Abstract

Background: After decades of identifying risk factors using array-based genome-wide association studies (GWAS), genetic research of complex diseases has shifted to sequencing-based rare-variant discovery. This requires large sample sizes for statistical power and has raised questions about whether current variant calling practices are adequate for large cohorts. It is well known that there are discrepancies between variants called by different pipelines, and that using a single pipeline always misses true variants exclusively identifiable by other pipelines. Nonetheless, it is common practice today to call variants with one pipeline because of computational cost, and to assume that false negative calls are a small percentage of the total. Results: We analyzed 10,000 exomes from the Alzheimer’s Disease Sequencing Project (ADSP) using multiple analytic pipelines consisting of different read aligners and variant calling strategies. We compared variants identified by using two aligners in 50, 100, 200, 500, 1000, and 1952 samples, and compared variants identified by adding single-sample genotyping to the default multi-sample joint genotyping in 50, 100, 500, 2000, 5000, and 10,000 samples. We found that using a single pipeline missed increasing numbers of high-quality variants as sample size grew. By combining two read aligners and two variant calling strategies, we rescued 30% of pass-QC variants at a sample size of 2000, and 56% at 10,000 samples. The rescued variants had higher proportions of low frequency (minor allele frequency [MAF] 1–5%) and rare (MAF

