Academic Paper

An expanded evaluation of protein function prediction methods shows an improvement in accuracy
Document Type
Working Paper
Author
Jiang, Yuxiang; Oron, Tal Ronnen; Clark, Wyatt T; Bankapur, Asma R; D'Andrea, Daniel; Lepore, Rosalba; Funk, Christopher S; Kahanda, Indika; Verspoor, Karin M; Ben-Hur, Asa; Koo, Emily; Penfold-Brown, Duncan; Shasha, Dennis; Youngs, Noah; Bonneau, Richard; Lin, Alexandra; Sahraeian, Sayed ME; Martelli, Pier Luigi; Profiti, Giuseppe; Casadio, Rita; Cao, Renzhi; Zhong, Zhaolong; Cheng, Jianlin; Altenhoff, Adrian; Skunca, Nives; Dessimoz, Christophe; Dogan, Tunca; Hakala, Kai; Kaewphan, Suwisa; Mehryary, Farrokh; Salakoski, Tapio; Ginter, Filip; Fang, Hai; Smithers, Ben; Oates, Matt; Gough, Julian; Törönen, Petri; Koskinen, Patrik; Holm, Liisa; Chen, Ching-Tai; Hsu, Wen-Lian; Bryson, Kevin; Cozzetto, Domenico; Minneci, Federico; Jones, David T; Chapman, Samuel; C., Dukka B K.; Khan, Ishita K; Kihara, Daisuke; Ofer, Dan; Rappoport, Nadav; Stern, Amos; Cibrian-Uhalte, Elena; Denny, Paul; Foulger, Rebecca E; Hieta, Reija; Legge, Duncan; Lovering, Ruth C; Magrane, Michele; Melidoni, Anna N; Mutowo-Meullenet, Prudence; Pichler, Klemens; Shypitsyna, Aleksandra; Li, Biao; Zakeri, Pooya; ElShal, Sarah; Tranchevent, Léon-Charles; Das, Sayoni; Dawson, Natalie L; Lee, David; Lees, Jonathan G; Sillitoe, Ian; Bhat, Prajwal; Nepusz, Tamás; Romero, Alfonso E; Sasidharan, Rajkumar; Yang, Haixuan; Paccanaro, Alberto; Gillis, Jesse; Sedeño-Cortés, Adriana E; Pavlidis, Paul; Feng, Shou; Cejuela, Juan M; Goldberg, Tatyana; Hamp, Tobias; Richter, Lothar; Salamov, Asaf; Gabaldon, Toni; Marcet-Houben, Marina; Supek, Fran; Gong, Qingtian; Ning, Wei; Zhou, Yuanpeng; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Toppo, Stefano; Ferrari, Carlo; Giollo, Manuel; Piovesan, Damiano; Tosatto, Silvio; del Pozo, Angela; Fernández, José M; Maietta, Paolo; Valencia, Alfonso; Tress, Michael L; Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Rehman, Hafeez Ur; Re, Matteo; Mesiti, Marco; Valentini, Giorgio; Bargsten, Joachim W; van Dijk, Aalt DJ; Gemovic, Branislava; Glisic, Sanja; Perovic, Vladmir; Veljkovic, Veljko; Veljkovic, Nevena; Almeida-e-Silva, Danillo C; Vencio, Ricardo ZN; Sharan, Malvika; Vogel, Jörg; Kansakar, Lakesh; Zhang, Shanshan; Vucetic, Slobodan; Wang, Zheng; Sternberg, Michael JE; Wass, Mark N; Huntley, Rachael P; Martin, Maria J; O'Donovan, Claire; Robinson, Peter N; Moreau, Yves; Tramontano, Anna; Babbitt, Patricia C; Brenner, Steven E; Linial, Michal; Orengo, Christine A; Rost, Burkhard; Greene, Casey S; Mooney, Sean D; Friedberg, Iddo; Radivojac, Predrag
Source
Subject
Quantitative Biology - Quantitative Methods
Language
Abstract
Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to a discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our ability to understand the molecular underpinnings of life is the assignment of function to biological macromolecules, especially proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, accurately assessing methods for protein function prediction and tracking progress in the field remain challenging.
Methodology: We have conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. One hundred twenty-six methods from 56 research groups were evaluated for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3,681 proteins from 18 species. CAFA2 featured significantly expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis also compared the best methods participating in CAFA1 to those of CAFA2.
Conclusions: The top-performing methods in CAFA2 outperformed the best methods from CAFA1, demonstrating that computational function prediction is improving. This increased accuracy can be attributed to the combined effect of the growing number of experimental annotations and improved methods for function prediction.
Comment: Submitted to Genome Biology