학술논문

Machine learning multi-omics analysis reveals cancer driver dysregulation in pan-cancer cell lines compared to primary tumors
Document Type
article
Source
Communications Biology. 5(1)
Subject
Cancer
Digestive Diseases
Rare Diseases
Biotechnology
Human Genome
Genetics
2.1 Biological and endogenous factors
Aetiology
Good Health and Well Being
Humans
Proteomics
Multiomics
Neoplasms
Machine Learning
Cell Line
Language
Abstract
Cancer cell lines have been widely used for decades to study biological processes driving cancer development, and to identify biomarkers of response to therapeutic agents. Advances in genomic sequencing have made possible large-scale genomic characterizations of collections of cancer cell lines and primary tumors, such as the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA). These studies allow for the first time a comprehensive evaluation of the comparability of cancer cell lines and primary tumors on the genomic and proteomic level. Here we employ bulk mRNA and micro-RNA sequencing data from thousands of samples in CCLE and TCGA, and proteomic data from partner studies in the MD Anderson Cell Line Project (MCLP) and The Cancer Proteome Atlas (TCPA), to characterize the extent to which cancer cell lines recapitulate tumors. We identify dysregulation of a long non-coding RNA and microRNA regulatory network in cancer cell lines, associated with differential expression between cell lines and primary tumors in four key cancer driver pathways: KRAS signaling, NFKB signaling, IL2/STAT5 signaling and TP53 signaling. Our results emphasize the necessity for careful interpretation of cancer cell line experiments, particularly with respect to therapeutic treatments targeting these important cancer pathways.