Academic Article

MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters
Document Type
article
Author
Terlouw, Barbara R; Blin, Kai; Navarro-Muñoz, Jorge C; Avalon, Nicole E; Chevrette, Marc G; Egbert, Susan; Lee, Sanghoon; Meijer, David; Recchia, Michael JJ; Reitz, Zachary L; van Santen, Jeffrey A; Selem-Mojica, Nelly; Tørring, Thomas; Zaroubi, Liana; Alanjary, Mohammad; Aleti, Gajender; Aguilar, César; Al-Salihi, Suhad AA; Augustijn, Hannah E; Avelar-Rivas, J Abraham; Avitia-Domínguez, Luis A; Barona-Gómez, Francisco; Bernaldo-Agüero, Jordan; Bielinski, Vincent A; Biermann, Friederike; Booth, Thomas J; Bravo, Victor J Carrion; Castelo-Branco, Raquel; Chagas, Fernanda O; Cruz-Morales, Pablo; Du, Chao; Duncan, Katherine R; Gavriilidou, Athina; Gayrard, Damien; Gutiérrez-García, Karina; Haslinger, Kristina; Helfrich, Eric JN; van der Hooft, Justin JJ; Jati, Afif P; Kalkreuter, Edward; Kalyvas, Nikolaos; Bin Kang, Kyo; Kautsar, Satria; Kim, Wonyong; Kunjapur, Aditya M; Li, Yong-Xin; Lin, Geng-Min; Loureiro, Catarina; Louwen, Joris JR; Louwen, Nico LL; Lund, George; Parra, Jonathan; Philmus, Benjamin; Pourmohsenin, Bita; Pronk, Lotte JU; Rego, Adriana; Rex, Devasahayam Arokia Balaya; Robinson, Serina; Rosas-Becerra, L Rodrigo; Roxborough, Eve T; Schorn, Michelle A; Scobie, Darren J; Singh, Kumar Saurabh; Sokolova, Nika; Tang, Xiaoyu; Udwary, Daniel; Vigneshwari, Aruna; Vind, Kristiina; Vromans, Sophie PJM; Waschulin, Valentin; Williams, Sam E; Winter, Jaclyn M; Witte, Thomas E; Xie, Huali; Yang, Dong; Yu, Jingwei; Zdouc, Mitja; Zhong, Zheng; Collemare, Jérôme; Linington, Roger G; Weber, Tilmann; Medema, Marnix H
Source
Nucleic Acids Research. 51(D1)
Subject
Biological Sciences
Bioinformatics and Computational Biology
Genetics
Biotechnology
Generic health relevance
Genomics
Genome
Multigene Family
Biosynthetic Pathways
Environmental Sciences
Information and Computing Sciences
Developmental Biology
Biological sciences
Chemical sciences
Environmental sciences
Language
Abstract
With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.