Academic Journal

Global biogeography of N2-fixing microbes: nifH amplicon database and analytics workflow.
Document Type
Article
Source
Earth System Science Data Discussions. 6/12/2024, p1-39. 39p.
Subject
*DATABASES
*OCEAN temperature
*BIOGEOGRAPHY
*WORKFLOW
*MULTIENZYME complexes
*KNOWLEDGE gap theory
Language
ISSN
1866-3591
Abstract
Marine nitrogen (N) fixation is a globally significant biogeochemical process carried out by a specialized group of prokaryotes (diazotrophs), yet our understanding of their ecology is constantly evolving. Although marine dinitrogen (N2)-fixation is often ascribed to cyanobacterial diazotrophs, indirect evidence suggests that non-cyanobacterial diazotrophs (NCDs) might also be important. One widely used approach for understanding diazotroph diversity and biogeography is polymerase chain reaction (PCR)-amplification of a portion of the nifH gene, which encodes a structural component of the N2-fixing enzyme complex, nitrogenase. An array of bioinformatic tools exists to process nifH amplicon data; however, the lack of standardized practices has hindered cross-study comparisons. This has led to a missed opportunity to more thoroughly assess diazotroph biogeography, diversity, and their potential contributions to the marine N cycle. To address these knowledge gaps, a bioinformatic workflow was designed that standardizes the processing of nifH amplicon datasets originating from high-throughput sequencing (HTS). Multiple datasets are efficiently and consistently processed with a specialized DADA2 pipeline to identify amplicon sequence variants (ASVs). A series of customizable post-pipeline stages then detect and discard spurious nifH sequences and annotate the subsequent quality-filtered nifH ASVs using multiple reference databases and classification approaches. This newly developed workflow was used to reprocess nearly all publicly available nifH amplicon HTS datasets from marine studies, and to generate a comprehensive nifH ASV database containing 7909 ASVs aggregated from 21 studies that represent the diazotrophic populations in the global ocean. For each sample, the database includes physical and chemical metadata obtained from the Simons Collaborative Marine Atlas Project (CMAP).
Here we demonstrate the utility of this database for revealing global biogeographical patterns of prominent diazotroph groups and highlight the influence of sea surface temperature. The workflow and nifH ASV database provide a robust framework for studying marine N2 fixation and diazotrophic diversity captured by nifH amplicon HTS. Future datasets that target understudied ocean regions can be added easily, and users can tune the parameters and the set of included studies to suit their specific focus. The workflow and database are available on GitHub (https://github.com/jdmagasin/nifH-ASV-workflow ; Morando et al., 2024) and Figshare (https://doi.org/10.6084/m9.figshare.23795943.v1 ; Morando et al., 2024), respectively. [ABSTRACT FROM AUTHOR]