Academic Article

Unsupervised outlier detection applied to SARS-CoV-2 nucleotide sequences can identify sequences of common variants and other variants of interest

Document Type

article

Author

Georg Hahn; Sanghun Lee; Dmitry Prokopenko; Jonathan Abraham; Tanya Novak; Julian Hecker; Michael Cho; Surender Khurana; Lindsey R. Baden; Adrienne G. Randolph; Scott T. Weiss; Christoph Lange

Source

BMC Bioinformatics, Vol 23, Iss 1, Pp 1-18 (2022)

Subject

SARS-CoV-2
Nucleotide sequences
Outlier detection
Variants of interest
Machine learning
Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5

Language

English

ISSN

1471-2105

Abstract

As of June 2022, the GISAID database contains more than 11 million SARS-CoV-2 genomes, including several thousand nucleotide sequences for the most common variants such as delta or omicron. These SARS-CoV-2 strains have been collected from patients around the world since the beginning of the pandemic. We start by assessing the similarity of all pairs of nucleotide sequences using the Jaccard index and principal component analysis. As shown previously in the literature, an unsupervised cluster analysis applied to the SARS-CoV-2 genomes groups the sequences according to characteristics such as their strain or their clade. Importantly, we observe that nucleotide sequences of common variants are often outliers in clusters of sequences stemming from variants identified earlier in the pandemic. Motivated by this finding, we apply outlier detection to nucleotide sequences. We demonstrate that nucleotide sequences of common variants (such as alpha, delta, or omicron) can be identified solely on the basis of a statistical outlier criterion. We argue that outlier detection might be a useful surveillance tool for identifying emerging variants in real time as the pandemic progresses.
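The pipeline named in the abstract (pairwise Jaccard similarity, principal component analysis, then an outlier criterion) can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: each sequence is encoded as a binary vector marking the positions where it differs from a reference genome, the outlier rule is "distance from the centroid exceeds the mean plus n_std standard deviations", and the function names jaccard_matrix, principal_components, and flag_outliers are hypothetical. The toy data are synthetic.

```python
import numpy as np


def jaccard_matrix(X):
    """Pairwise Jaccard index between the rows of a binary 0/1 matrix."""
    X = np.asarray(X, dtype=float)
    inter = X @ X.T                      # size of each pairwise intersection
    sizes = X.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    # Two all-zero rows have an empty union; treat them as identical (1.0).
    safe_union = np.where(union > 0, union, 1.0)
    return np.where(union > 0, inter / safe_union, 1.0)


def principal_components(S, k=2):
    """Scores of the rows of S on the first k principal components."""
    centered = S - S.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * s[:k]


def flag_outliers(scores, n_std=2.0):
    """Assumed rule: flag rows unusually far from the centroid of the scores."""
    dist = np.linalg.norm(scores - scores.mean(axis=0), axis=1)
    return dist > dist.mean() + n_std * dist.std()


# Synthetic toy data (not real genomes): 10 similar sequences sharing
# mutations at positions 0-4 plus one private mutation each, and one
# sequence carrying an entirely different set of mutations.
X = np.zeros((11, 30), dtype=int)
for i in range(10):
    X[i, :5] = 1
    X[i, 5 + i] = 1
X[10, 20:] = 1

S = jaccard_matrix(X)
scores = principal_components(S, k=2)
flags = flag_outliers(scores)
```

On this toy input only the last row, whose mutation set is disjoint from all others, is flagged. With real data the encoding, the number of components, and the threshold would all need to be chosen and validated; a mean-and-standard-deviation rule is sensitive to masking when many outliers are present, so a more robust criterion may be preferable.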
