Academic Paper

TreeGrafter: phylogenetic tree-based annotation of proteins with Gene Ontology terms and other annotations
Document Type
Working Paper
Source
Subject
Quantitative Biology - Quantitative Methods
Quantitative Biology - Genomics
Language
Abstract
Summary: TreeGrafter is a new software tool for annotating protein sequences using annotated phylogenetic trees. Currently, the tool provides annotations to Gene Ontology terms, and PANTHER protein class, family and subfamily. The approach is generalizable to any annotations that have been made to internal nodes of a reference phylogenetic tree. TreeGrafter takes each input query protein sequence, finds the best matching homologous family in a library of pre-calculated, pre-annotated gene trees, and then grafts it to the best location in the tree. It then annotates the sequence by propagating annotations from its ancestral nodes in the reference tree. We show that TreeGrafter outperforms subfamily HMM scoring for correctly assigning subfamily membership, and that it produces highly specific annotations of GO terms based on annotated reference phylogenetic trees. This method will be further integrated into InterProScan, enabling an even broader user community. Availability: TreeGrafter is freely available on the web at https://github.com/haimingt/TreeGrafting.
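The propagation step described in the abstract can be illustrated with a minimal Python sketch. This is not TreeGrafter's code: the `Node` class, the `inherited_annotations` function, the node names, and the `TERM_A`/`TERM_B` labels are all hypothetical, and the sketch assumes that a grafted query simply inherits the union of annotations found on its graft point and every ancestor up to the root. Family matching and the grafting step itself are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node in a hypothetical annotated reference tree."""
    name: str
    parent: Optional["Node"] = None
    annotations: set = field(default_factory=set)


def inherited_annotations(graft_point: Node) -> set:
    """Collect annotations from the graft point and all of its ancestors."""
    collected = set()
    node = graft_point
    while node is not None:
        collected |= node.annotations  # union in this node's annotations
        node = node.parent             # walk one step toward the root
    return collected


# Toy tree (placeholder labels, not real annotations):
# root -> subfamily_A -> internal_1
root = Node("root", annotations={"TERM_A"})
subfamily = Node("subfamily_A", parent=root, annotations={"TERM_B"})
internal = Node("internal_1", parent=subfamily)

# A query grafted at internal_1 inherits terms from both ancestors.
result = inherited_annotations(internal)
print(sorted(result))
```

In this toy tree, a query grafted at `internal_1` picks up both placeholder terms, whereas one grafted directly under `root` would pick up only `TERM_A`, which illustrates why the graft location determines how specific the resulting annotation is.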