Academic Article

Removing unwanted variation from large-scale RNA sequencing data with PRPS

Document Type

Original Paper

Author

Molania, Ramyar; Foroutan, Momeneh; Gagnon-Bartsch, Johann A.; Gandolfo, Luke C.; Jain, Aryan; Sinha, Abhishek; Olshansky, Gavriel; Dobrovic, Alexander; Papenfuss, Anthony T.; Speed, Terence P.

Source

Nature Biotechnology: The Science and Business of Biotechnology. 41(1):82-95

Language

English

ISSN

1087-0156
1546-1696

Abstract

Accurate identification and effective removal of unwanted variation is essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
Cancer RNA-seq data are normalized with respect to library size, tumor purity and batch effects.
