Academic article

Fast computation of the eigensystem of genomic similarity matrices

Document Type

article

Author

Georg Hahn; Sharon M. Lutz; Julian Hecker; Dmitry Prokopenko; Michael H. Cho; Edwin K. Silverman; Scott T. Weiss; Christoph Lange

Source

BMC Bioinformatics, Vol 25, Iss 1, Pp 1-20 (2024)

Subject

Covariance matrix
Fast SVD
Genomic relationship matrix
Jaccard matrix
Principal components
Weighted Jaccard matrix
Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5

Language

English

ISSN

1471-2105

Abstract

The computation of a similarity measure for genomic data is a standard tool in computational genetics. The principal components of such matrices are routinely used to correct for biases due to confounding by population stratification, for instance in linear regressions. However, the calculation of both a similarity matrix and its singular value decomposition (SVD) is computationally intensive. The contribution of this article is threefold. First, we demonstrate that the calculation of three matrices (called the covariance matrix, the weighted Jaccard matrix, and the genomic relationship matrix) can be reformulated in a unified way which allows for the application of a randomized SVD algorithm, which is faster than the traditional computation. The fast SVD algorithm we present is adapted from an existing randomized SVD algorithm and ensures that all computations are carried out in sparse matrix algebra. The algorithm only assumes that row-wise and column-wise subtraction and multiplication of a vector with a sparse matrix are available, operations that are efficiently implemented in common sparse matrix packages. An exception is the so-called Jaccard matrix, which does not have a structure applicable for the fast SVD algorithm. Second, an approximate Jaccard matrix is introduced to which the fast SVD computation is applicable. Third, we establish guaranteed theoretical bounds on the accuracy (in $L_2$ norm and angle) between the principal components of the Jaccard matrix and the ones of our proposed approximation, thus putting the proposed Jaccard approximation on a solid mathematical foundation, and derive the theoretical runtime of our algorithm. We illustrate that the approximation error is low in practice and empirically verify the theoretical runtime scalings on both simulated data and data of the 1000 Genomes Project.
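
The idea of running a randomized SVD on a centered similarity matrix while staying in sparse algebra can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a sparse genotype matrix `X` with loci in rows and individuals in columns, a covariance-type similarity matrix obtained by removing column means, and a generic randomized range finder with power iterations. The function name `randomized_eigs_centered` and its parameters are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def randomized_eigs_centered(X, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k eigenpairs of A.T @ A / (m - 1), where
    A = X - ones * mu^T is X with its column means removed.

    A is never formed explicitly; only products with the sparse X and
    rank-one corrections are used (assumed setup, see lead-in).
    """
    m, n = X.shape
    rng = np.random.default_rng(seed)
    mu = np.asarray(X.mean(axis=0)).ravel()  # column means, length n

    def A_dot(V):
        # A @ V = X @ V - ones * (mu^T V)
        return X @ V - np.outer(np.ones(m), mu @ V)

    def At_dot(U):
        # A.T @ U = X.T @ U - mu * (ones^T U)
        return X.T @ U - np.outer(mu, U.sum(axis=0))

    r = min(k + oversample, n, m)  # sketch size
    Q, _ = np.linalg.qr(A_dot(rng.standard_normal((n, r))))
    for _ in range(n_iter):  # power iterations with re-orthogonalization
        Q, _ = np.linalg.qr(At_dot(Q))
        Q, _ = np.linalg.qr(A_dot(Q))

    B = At_dot(Q).T  # small r x n matrix, equal to Q.T @ A
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return s[:k] ** 2 / (m - 1), Vt[:k].T


# Usage on a small random sparse matrix (synthetic data, for illustration only)
X_demo = sp.random(500, 40, density=0.05, random_state=0, format="csr")
eigvals, eigvecs = randomized_eigs_centered(X_demo, k=5)
```

The centering is applied as a rank-one correction after each sparse product, so the dense centered matrix is never stored; accuracy of the leading eigenpairs depends on the oversampling and the number of power iterations chosen.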
