Academic paper

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens
Document Type
article
Author
Zhou, Naihui; Jiang, Yuxiang; Bergquist, Timothy R; Lee, Alexandra J; Kacsoh, Balint Z; Crocker, Alex W; Lewis, Kimberley A; Georghiou, George; Nguyen, Huy N; Hamid, Md Nafiz; Davis, Larry; Dogan, Tunca; Atalay, Volkan; Rifaioglu, Ahmet S; Dalkıran, Alperen; Cetin Atalay, Rengul; Zhang, Chengxin; Hurto, Rebecca L; Freddolino, Peter L; Zhang, Yang; Bhat, Prajwal; Supek, Fran; Fernández, José M; Gemovic, Branislava; Perovic, Vladimir R; Davidović, Radoslav S; Sumonja, Neven; Veljkovic, Nevena; Asgari, Ehsaneddin; Mofrad, Mohammad RK; Profiti, Giuseppe; Savojardo, Castrense; Martelli, Pier Luigi; Casadio, Rita; Boecker, Florian; Schoof, Heiko; Kahanda, Indika; Thurlby, Natalie; McHardy, Alice C; Renaux, Alexandre; Saidi, Rabie; Gough, Julian; Freitas, Alex A; Antczak, Magdalena; Fabris, Fabio; Wass, Mark N; Hou, Jie; Cheng, Jianlin; Wang, Zheng; Romero, Alfonso E; Paccanaro, Alberto; Yang, Haixuan; Goldberg, Tatyana; Zhao, Chenguang; Holm, Liisa; Törönen, Petri; Medlar, Alan J; Zosa, Elaine; Borukhov, Itamar; Novikov, Ilya; Wilkins, Angela; Lichtarge, Olivier; Chi, Po-Han; Tseng, Wei-Cheng; Linial, Michal; Rose, Peter W; Dessimoz, Christophe; Vidulin, Vedrana; Dzeroski, Saso; Sillitoe, Ian; Das, Sayoni; Lees, Jonathan Gill; Jones, David T; Wan, Cen; Cozzetto, Domenico; Fa, Rui; Torres, Mateo; Warwick Vesztrocy, Alex; Rodriguez, Jose Manuel; Tress, Michael L; Frasca, Marco; Notaro, Marco; Grossi, Giuliano; Petrini, Alessandro; Re, Matteo; Valentini, Giorgio; Mesiti, Marco; Roche, Daniel B; Reeb, Jonas; Ritchie, David W; Aridhi, Sabeur; Alborzi, Seyed Ziaeddin; Devignes, Marie-Dominique; Koo, Da Chen Emily; Bonneau, Richard; Gligorijević, Vladimir; Barot, Meet; Fang, Hai; Toppo, Stefano; Lavezzo, Enrico
Source
Genome Biology. 20(1)
Subject
Human Genome
Networking and Information Technology R&D (NITRD)
Genetics
Generic health relevance
Animals
Biofilms
Candida albicans
Drosophila melanogaster
Genome, Bacterial
Genome, Fungal
Humans
Locomotion
Memory, Long-Term
Molecular Sequence Annotation
Pseudomonas aeruginosa
Protein function prediction
Long-term memory
Biofilm
Critical assessment
Community challenge
Environmental Sciences
Biological Sciences
Information and Computing Sciences
Bioinformatics
Language
Abstract
Background
The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.

Results
Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in the volume of data analyzed and in the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in the Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster that we suspected of being involved in long-term memory.

Conclusion
We conclude that while predictions of molecular function and biological process annotations have slightly improved over time, predictions of cellular component annotations have not. Term-centric prediction of experimental annotations remains equally challenging: although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.