학술논문

RNA-seq data science: From raw data to effective interpretation
Document Type
Working Paper
Source
Subject
Quantitative Biology - Genomics
Language
Abstract
RNA-sequencing (RNA-seq) has become an exemplar technology in modern biology and clinical applications over the past decade. It has gained immense popularity in the recent years driven by continuous efforts of the bioinformatics community to develop accurate and scalable computational tools. RNA-seq is a method of analyzing the RNA content of a sample using the modern sequencing platforms. It generates enormous amounts of transcriptomic data in the form of nucleotide sequences, known as reads. RNA-seq analysis enables the probing of genes and corresponding transcripts which is essential for answering important biological questions, such as detecting novel exons, transcripts, gene expressions, and studying alternative splicing structure. However, obtaining meaningful biological signals from raw data using computational methods is challenging due to the limitations of modern sequencing technologies. The need to leverage these technological challenges have pushed the rapid development of many novel computational tools which have evolved and diversified in accordance with technological advancements, leading to the current myriad population of RNA-seq tools. Our review provides a systemic overview of RNA-seq technology and 235 available RNA-seq tools across various domains published from 2008 to 2020, discussing the interdisciplinary nature of bioinformatics involved in RNA sequencing, analysis, and software development.