Academic Article

SummaryAUC: a tool for evaluating the performance of polygenic risk prediction models in validation datasets with only summary level statistics.
Document Type
article
Source
Bioinformatics. 35(20)
Subject
Mathematical Sciences
Biological Sciences
Statistics
Genetics
Human Genome
Prevention
Aetiology
2.5 Research design and methodologies (aetiology)
Good Health and Well Being
Genome-Wide Association Study
Genotype
Humans
Multifactorial Inheritance
Polymorphism, Single Nucleotide
Schizophrenia
Molecular Genetics of Schizophrenia Consortium
Information and Computing Sciences
Bioinformatics
Biological sciences
Information and computing sciences
Mathematical sciences
Language
Abstract
MOTIVATION: Polygenic risk score (PRS) methods based on genome-wide association studies (GWAS) have the potential to predict the risk of developing complex diseases and are expected to become more accurate with larger training datasets and innovative statistical methods. The area under the ROC curve (AUC) is often used to evaluate the performance of PRSs, but computing it requires individual-level genotypic and phenotypic data in an independent GWAS validation dataset. We are motivated to develop methods for approximating the AUC of PRSs based on the summary level data of the validation dataset, which will greatly facilitate the development of PRS models for complex diseases. RESULTS: We develop statistical methods and an R package, SummaryAUC, for approximating the AUC of a PRS and its variance when only the summary level data of the validation dataset are available. SummaryAUC can be applied to PRSs with SNPs either genotyped or imputed in the validation dataset. We examined the performance of SummaryAUC using a large-scale GWAS of schizophrenia. SummaryAUC provides accurate approximations to AUCs and their variances. The bias of AUC is typically