The study also identified technical factors complicating microbiome research. Methods like DNA extraction and primer selection can introduce biases into microbiome studies. Existing reference databases, like the SILVA ribosomal RNA gene database, are primarily based on Western microbiomes, potentially underestimating diversity from regions like Sub-Saharan Africa and South-East Asia.
To address these gaps, the researchers analyzed publicly available sequencing data from the Sequence Read Archive (SRA), focusing on 245,627 microbiome samples. After filtering and ensuring consistency across sequencing platforms, the final dataset included 168,464 samples from 68 countries. These samples collectively contributed 5.57 terabytes of sequencing data processed through a uniform pipeline.
The researchers used the Divisive Amplicon Denoising Algorithm 2 (DADA2) to process the data and made taxonomic assignments using the SILVA reference database. The final dataset was carefully curated, removing samples with insufficient reads or rare taxa, resulting in a high-quality subset of 150,721 samples. The team also inferred metadata such as the country of origin, DNA extraction methods, and amplicon choice, which helped in quantifying global microbiome variation.
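The curation step described above can be illustrated with a minimal sketch. It reads "insufficient reads or rare taxa" as two separate filters (drop low-depth samples, then drop taxa seen in very few samples), which is an assumption. The function name `filter_samples` and the thresholds `min_reads` and `min_prevalence` are hypothetical and not taken from the study.

```python
def filter_samples(samples, min_reads=1000, min_prevalence=2):
    """Toy curation filter (thresholds are illustrative assumptions).

    samples: dict mapping sample ID -> dict mapping taxon -> read count.
    Returns a new dict with low-depth samples and rare taxa removed.
    """
    # Step 1: keep only samples whose total read count meets the minimum depth.
    kept = {
        sample_id: counts
        for sample_id, counts in samples.items()
        if sum(counts.values()) >= min_reads
    }

    # Step 2: count in how many of the kept samples each taxon is observed.
    prevalence = {}
    for counts in kept.values():
        for taxon, n in counts.items():
            if n > 0:
                prevalence[taxon] = prevalence.get(taxon, 0) + 1

    # Step 3: drop taxa observed in fewer than min_prevalence samples.
    return {
        sample_id: {
            taxon: n
            for taxon, n in counts.items()
            if prevalence.get(taxon, 0) >= min_prevalence
        }
        for sample_id, counts in kept.items()
    }


# Hypothetical toy data: three samples, one of them too shallow to keep.
toy = {
    "s1": {"Bacteroides": 900, "Prevotella": 200, "RareGenus": 5},
    "s2": {"Bacteroides": 300, "Prevotella": 1500},
    "s3": {"Bacteroides": 40},
}
curated = filter_samples(toy)
```

In this toy case `s3` is dropped for low depth, and `RareGenus` is dropped because it appears in only one of the remaining samples.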
The study revealed stark differences in microbiome composition across regions. Europe and North America dominated the dataset, contributing 60.5% of the samples, while regions like Central and Southern Asia (3.4%) and Sub-Saharan Africa (3.7%) were underrepresented. Interestingly, Latin America exhibited the highest alpha diversity, followed by regions in Sub-Saharan Africa. Faith’s Phylogenetic Diversity (PD) analysis showed that combining samples from underrepresented regions with those from Europe and North America led to a significant increase in evolutionary diversity, by as much as 68.6%. This suggests that vast unexplored microbial lineages exist, particularly in underrepresented regions.
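To make the Faith's PD comparison concrete, here is a minimal sketch of how a phylogenetic diversity gain can be computed. It assumes the rooted convention, in which PD is the sum of branch lengths on the paths connecting the observed taxa to the root. The tree, taxon names, and branch lengths are invented for illustration, and `faith_pd` is a hypothetical helper, not the study's code.

```python
def faith_pd(tree, present_tips):
    """Sum branch lengths on the paths from each present tip to the root.

    tree: dict mapping node -> (parent, branch_length); the root has no entry.
    Each branch is counted once, even if several tips share it.
    """
    visited = set()
    total = 0.0
    for tip in present_tips:
        node = tip
        # Walk toward the root, stopping at branches already counted.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Hypothetical toy tree: A and B are close relatives, C is a distant lineage.
toy_tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("root", 2.0),
}

pd_well_sampled = faith_pd(toy_tree, ["A", "B"])    # 2.5
pd_combined = faith_pd(toy_tree, ["A", "B", "C"])   # 4.5
relative_gain = (pd_combined - pd_well_sampled) / pd_well_sampled  # 0.8
```

Adding a single distant lineage raises PD by 80% in this toy tree, which mirrors the kind of gain the study reports when samples from underrepresented regions are pooled with those from Europe and North America.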
Taxonomic diversity, measured by the Shannon diversity index (which reflects both the number of taxa and how evenly they are distributed), also varied widely, with Latin America again ranking highest. Rarefaction analysis indicated that many genus-level taxa are still being discovered, especially in regions like Central and Southern Asia and Sub-Saharan Africa.
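For readers unfamiliar with the Shannon index, the following sketch shows the standard formula, H = -Σ p_i ln(p_i), where p_i is the proportion of reads assigned to taxon i. The function name `shannon_index` and the counts are illustrative assumptions, not data from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # proportion of reads assigned to this taxon
            h -= p * math.log(p)
    return h


# Hypothetical samples with the same number of taxa but different evenness.
even_sample = [25, 25, 25, 25]   # four equally abundant taxa
skewed_sample = [97, 1, 1, 1]    # one dominant taxon

h_even = shannon_index(even_sample)      # equals ln(4), about 1.386
h_skewed = shannon_index(skewed_sample)  # lower, despite the same taxon count
```

Both samples contain four taxa, yet the evenly distributed one scores higher, which is why the index is usually described as a diversity measure rather than a pure richness count.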
Technical factors, such as DNA extraction methods and the choice of sequencing amplicons, also influenced microbiome composition. For example, taxa like Enterobacter and Akkermansia varied in abundance depending on the region of the 16S rRNA gene amplified during sequencing. These technical biases contributed to differential microbiome results and suggested that variations in methodology could lead to substantial differences in observed microbiome compositions.
The study concluded that significant microbiome differences exist across geographic regions, with Western-centric databases underrepresenting diversity from regions like Latin America and Sub-Saharan Africa. Researchers found higher levels of Bacteroides in Europe and North America, and elevated levels of Prevotella in Sub-Saharan Africa and Latin America, highlighting how geography influences gut microbiome composition. The study also revealed that primer biases, such as those affecting the detection of methanogenic archaea like Methanobrevibacter, further complicate the analysis.
These findings emphasize the need for a more balanced, global approach to microbiome research. The large-scale dataset provides a valuable resource for future microbiome studies and underscores the importance of including underrepresented populations to better understand global health disparities. Through this effort, researchers hope to uncover the full extent of microbiome diversity, which may lead to better personalized treatments and interventions for a wide range of health conditions.