Academic article

Community Assessment of the Predictability of Cancer Protein and Phosphoprotein Levels from Genomics and Transcriptomics
Document Type
article
Author
Yang, Mi; Petralia, Francesca; Li, Zhi; Li, Hongyang; Ma, Weiping; Song, Xiaoyu; Kim, Sunkyu; Lee, Heewon; Yu, Han; Lee, Bora; Bae, Seohui; Heo, Eunji; Kaczmarczyk, Jan; Stępniak, Piotr; Warchoł, Michał; Yu, Thomas; Calinawan, Anna P; Boutros, Paul C; Payne, Samuel H; Reva, Boris; NCI-CPTAC-DREAM Consortium; Aderinwale, Tunde; Afyounian, Ebrahim; Agrawal, Piyush; Ali, Mehreen; Amadoz, Alicia; Azuaje, Francisco; Bachman, John; Bhalla, Sherry; Carbonell-Caballero, José; Chakraborty, Priyanka; Chaudhary, Kumardeep; Choi, Yonghwa; Choi, Yoonjung; Çubuk, Cankut; Dhanda, Sandeep Kumar; Dopazo, Joaquín; Elo, Laura L; Fóthi, Ábel; Gevaert, Olivier; Granberg, Kirsi; Greiner, Russell; Hidalgo, Marta R; Jayaswal, Vivek; Jeon, Hwisang; Jeon, Minji; Kalmady, Sunil V; Kambara, Yasuhiro; Kang, Jaewoo; Kang, Keunsoo; Kaoma, Tony; Kaur, Harpreet; Kazan, Hilal; Kesar, Devishi; Kesseli, Juha; Kim, Daehan; Kim, Keonwoo; Kim, Sang-Yoon; Kumar, Sajal; Liu, Yunpeng; Luethy, Roland; Mahajan, Swapnil; Mahmoudian, Mehrad; Muller, Arnaud; Nazarov, Petr V; Nguyen, Hien; Nykter, Matti; Okuda, Shujiro; Park, Sungsoo; Raghava, Gajendra Pal Singh; Rajapakse, Jagath C; Rantapero, Tommi; Ryu, Hobin; Salavert, Francisco; Saraei, Sohrab; Sharma, Ruby; Siitonen, Ari; Sokolov, Artem; Subramanian, Kartik; Suni, Veronika; Suomi, Tomi; Tranchevent, Léon-Charles; Usmani, Salman Sadullah; Välikangas, Tommi; Vega, Roberto; Zhong, Hua; Boja, Emily; Rodriguez, Henry; Stolovitzky, Gustavo; Guan, Yuanfang; Wang, Pei; Fenyö, David; Saez-Rodriguez, Julio
Source
Cell Systems. 11(2)
Subject
Biological Sciences
Bioinformatics and Computational Biology
Biotechnology
Cancer
Genetics
Human Genome
Aetiology
2.1 Biological and endogenous factors
Crowdsourcing
Female
Genomics
Humans
Machine Learning
Male
Neoplasms
Phosphoproteins
Proteins
Proteomics
Transcriptome
NCI-CPTAC-DREAM Consortium
cancer
crowdsourcing
genomics
machine learning
protein regulation
proteogenomics
proteomics
Biochemistry and Cell Biology
Language
Abstract
Cancer is driven by genomic alterations, but the processes causing this disease are largely performed by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze developments of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteogenomic challenge. We asked for methods to predict protein and phosphorylation levels from genomic and transcriptomic data in cancer patients. The best performance was achieved by an ensemble of models, including as predictors transcript level of the corresponding genes, interaction between genes, conservation across tumor types, and phosphosite proximity for phosphorylation prediction. Proteins from metabolic pathways and complexes were the best and worst predicted, respectively. The performance of even the best-performing model was modest, suggesting that many proteins are strongly regulated through translational control and degradation. Our results set a reference for the limitations of computational inference in proteogenomics. A record of this paper's transparent peer review process is included in the Supplemental Information.