Academic Paper
GenPipes: an open-source framework for distributed and scalable genomic analyses.
Document Type
Article
Author
Bourgey, Mathieu; Dali, Rola; Eveleigh, Robert; Chen, Kuang Chung; Letourneau, Louis; Fillon, Joel; Michaud, Marc; Caron, Maxime; Sandoval, Johanna; Lefebvre, Francois; Leveque, Gary; Mercier, Eloi; Bujold, David; Marquis, Pascale; Van, Patrick Tran; Anderson de Lima Morais, David; Tremblay, Julien; Shao, Xiaojian; Henrion, Edouard; Gonzalez, Emmanuel
Source
Subject
*PYTHON programming language
*BIOINFORMATICS software
*OPEN source software
*WORKFLOW management systems
*COMPUTER workstation clusters
*NUCLEOTIDE sequence
*ELECTRONIC data processing
Language
ISSN
2047-217X
Abstract
Background: With the decreasing cost of sequencing and the rapid developments in genomics technologies and protocols, the need for validated bioinformatics software that enables efficient large-scale data processing is growing.
Findings: Here we present GenPipes, a flexible Python-based framework that facilitates the development and deployment of multi-step workflows optimized for high-performance computing clusters and the cloud. GenPipes already implements 12 validated and scalable pipelines for various genomics applications, including RNA sequencing, chromatin immunoprecipitation sequencing, DNA sequencing, methylation sequencing, Hi-C, capture Hi-C, metagenomics, and Pacific Biosciences long-read assembly. The software is available under a GPLv3 open source license and is continuously updated to follow recent advances in genomics and bioinformatics. The framework has already been configured on several servers, and a Docker image is also available to facilitate additional installations.
Conclusions: GenPipes offers genomics researchers a simple method to analyze different types of data, customizable to their needs and resources, as well as the flexibility to create their own workflows. [ABSTRACT FROM AUTHOR]