Academic Paper

A phylogenetic approach for weighting genetic sequences

Document Type

article

Author

De Maio, Nicola; Alekseyenko, Alexander V; Coleman-Smith, William J; Pardi, Fabio; Suchard, Marc A; Tamuri, Asif U; Truszkowski, Jakub; Goldman, Nick

Source

BMC Bioinformatics. 22(1)

Subject

Biological Sciences
Bioinformatics and Computational Biology
Evolutionary Biology
Genetics
Generic health relevance
Algorithms
Computational Biology
Phylogeny
Sequence Alignment
Phylogenetics
Sequence weights
Alignment
Protein profile
Conservation scores
Mathematical Sciences
Information and Computing Sciences
Bioinformatics

Language

Abstract

Background: Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are 'novel' compared to the others in the same dataset, and low weights to sequences that are over-represented.

Results: We formalise this principle by rigorously defining the evolutionary 'novelty' of a sequence within an alignment. This results in new sequence weights that we call 'phylogenetic novelty scores'. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column (important, for example, in protein family profiling). We give computationally efficient algorithms for calculating our scores and, using simulations, show that they are versatile and can improve the accuracy of character frequency estimation compared to existing sequence weighting schemes.

Conclusions: Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.

Online Access

Open Access (eScholarship)
