학술논문

Gene Tree Construction and Correction Using SuperTree and Reconciliation
Document Type
Periodical
Source
IEEE/ACM Transactions on Computational Biology and Bioinformatics IEEE/ACM Trans. Comput. Biol. and Bioinf. Computational Biology and Bioinformatics, IEEE/ACM Transactions on. 15(5):1560-1570 Jan, 2018
Subject
Bioengineering
Computing and Processing
Vegetation
Phylogeny
Heuristic algorithms
Labeling
Topology
History
Electronic mail
Algorithm design and analysis
bioinformatics
phylogeny
reconciliation
supertree
Language
ISSN
1545-5963
1557-9964
2374-0043
Abstract
The supertree problem asking for a tree displaying a set of consistent input trees has been largely considered for the reconstruction of species trees. Here, we rather explore this framework for the sake of reconstructing a gene tree from a set of input gene trees on partial data. In this perspective, the phylogenetic tree for the species containing the genes of interest can be used to choose among the many possible compatible “supergenetrees”, the most natural criteria being to minimize a reconciliation cost. We develop a variety of algorithmic solutions for the construction and correction of gene trees using the supertree framework. A dynamic programming supertree algorithm for constructing or correcting gene trees, exponential in the number of input trees, is first developed for the less constrained version of the problem. It is then adapted to gene trees with nodes labeled as duplication or speciation, the additional constraint being to preserve the orthology and paralogy relations between genes. Then, a quadratic time algorithm is developed for efficiently correcting an initial gene tree while preserving a set of “trusted” subtrees, as well as the relative phylogenetic distance between them, in both cases of labeled or unlabeled input trees. By applying these algorithms to the set of Ensembl gene trees, we show that this new correction framework is particularly useful to correct weakly-supported duplication nodes. The C++ source code for the algorithms and simulations described in the paper are available at https://github.com/UdeM-LBIT/SuGeT.