Academic Paper

Improving spliced alignment for identification of ortholog groups and multiple CDS alignment

Document Type

Working Paper

Author

Aguilar, Jean-David; Jammali, Safa; Kuitche, Esaie; Ouangraoua, Aïda

Subject

Computer Science - Data Structures and Algorithms
Quantitative Biology - Genomics

Abstract

The Spliced Alignment Problem (SAP), which consists of finding an optimal semi-global alignment of a spliced RNA sequence on an unspliced genomic sequence, has been widely studied for the prediction and annotation of gene structures in genomes. Here, we revisit it for the purpose of identifying CDS ortholog groups within a set of CDS from homologous genes and for computing multiple CDS alignments. We introduce a new constrained version of the spliced alignment problem, together with an algorithm that exploits full information on the exon-intron structure of the input RNA and gene sequences in order to compute high-coverage, accurate alignments. We show how pairwise spliced alignments between the CDS and the gene sequences of a gene family can be used directly to cluster the set of CDS of the gene family into a set of ortholog groups. We also introduce an extension of the spliced alignment problem, called the Multiple Spliced Alignment Problem (MSAP), that consists of simultaneously aligning several RNA sequences on several genes from the same gene family. We develop a heuristic algorithmic solution for the problem. We show how to exploit multiple spliced alignments for the clustering of homologous CDS into ortholog and close paralog groups, and for the construction of multiple CDS alignments. An implementation of the method in Python is available on request from SFA@USherbrooke.ca.
Keywords: Spliced alignment, CDS ortholog groups, Multiple CDS alignment, Gene structure, Gene family.
Comment: 22 pages, 7 figures

Online Access

Open Access (arXiv)
