Academic Paper

Cloudflow - A framework for MapReduce pipeline development in Biomedical Research
Document Type
Conference
Source
2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 172-177, May 2015
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Photonics and Electrooptics
Power, Energy and Industry Applications
Pipelines
Information filters
Genomics
Pipeline processing
Bioinformatics
Language
Abstract
The data-driven parallelization framework Hadoop MapReduce enables scalable analysis of large data sets. Because developing MapReduce programs can be time-intensive and challenging, the use of Hadoop in biomedical research is still limited. Here we present Cloudflow, a high-level framework that hides the implementation details of Hadoop and provides a set of building blocks for creating biomedical pipelines in a more intuitive way. We demonstrate the benefit of Cloudflow on three different genetic use cases. We also show how the framework can be combined with the Hadoop workflow system Cloudgene and the cloud orchestration platform CloudMan to provide Hadoop pipelines as a service to everyone. The framework is open source and freely available at https://github.com/genepi/cloudflow.