Academic Paper

DiMA: Sequence Diversity Dynamics Analyser for Viruses
Document Type
Working Paper
Source
Subject
Quantitative Biology - Genomics
Quantitative Biology - Quantitative Methods
Language
Abstract
Sequence diversity is one of the major challenges in the design of diagnostic, prophylactic and therapeutic interventions against viruses. Herein, we present DiMA, a tool designed to facilitate the dissection of sequence diversity dynamics for viruses. As a base, DiMA provides a quantitative overview of sequence diversity using Shannon's entropy, applied via a user-defined k-mer sliding window to an input alignment file. Its key distinguishing feature is that it interrogates diversity dynamics by dissecting each k-mer position into diversity motifs, defined by the incidence of distinct sequences. At a given position, the index is the predominant sequence, while all other sequences are (total) variants of the index, sub-classified into the major (most common) variant, minor variants (occurring more than once and at a frequency lower than the major), and unique (singleton) variants. Moreover, DiMA allows for metadata enrichment of the motifs. DiMA is big-data ready and provides an interactive output, depicting multiple facets of sequence diversity, with download options. It enables comparative genome/proteome diversity dynamics analyses, within and between sequences of viral species. The web server is publicly available at https://dima.bezmialem.edu.tr.
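The entropy and motif definitions in the abstract can be illustrated with a minimal Python sketch. This is not DiMA's implementation: the function names `kmer_motifs` and `scan_alignment`, the 0-based positions, the entropy in bits, the tie-breaking by first appearance, and the treatment of gap characters as ordinary symbols are all assumptions made for illustration.

```python
import math
from collections import Counter


def kmer_motifs(aligned_seqs, k, position):
    """Summarise the k-mers that start at a 0-based alignment position.

    Hypothetical helper, not part of DiMA. Assumes all sequences are
    aligned to the same length and treats every character (including
    gaps) as an ordinary symbol.
    """
    kmers = [seq[position:position + k] for seq in aligned_seqs]
    counts = Counter(kmers)
    total = len(kmers)

    # Shannon entropy (in bits) of the k-mer distribution at this position.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    # Rank distinct k-mers by incidence; ties keep first-appearance order
    # (an assumption, the abstract does not say how ties are resolved).
    ranked = counts.most_common()
    index_seq = ranked[0][0]
    variants = ranked[1:]

    major = variants[0][0] if variants else None
    minor = [kmer for kmer, c in variants[1:] if c > 1]
    unique = [kmer for kmer, c in variants[1:] if c == 1]

    return {
        "position": position,
        "entropy": entropy,
        "index": index_seq,
        "major": major,
        "minor": minor,
        "unique": unique,
    }


def scan_alignment(aligned_seqs, k):
    """Slide a window of width k across the alignment, one position at a time."""
    length = len(aligned_seqs[0])
    return [kmer_motifs(aligned_seqs, k, pos) for pos in range(length - k + 1)]


# Toy alignment: 11 sequences of length 5 (made-up data for illustration).
toy_alignment = ["ACDEF"] * 5 + ["ACEEF"] * 3 + ["AGDEF"] * 2 + ["TCDEF"]
first_window = kmer_motifs(toy_alignment, k=3, position=0)
```

In the toy alignment, the first 3-mer window holds four distinct sequences with incidences 5, 3, 2 and 1, so it yields one index, one major variant, one minor variant and one unique variant.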
Comment: 17 pages, 2 figures, 48 references