Several studies have compared metagenome inference performance in different human body sites; however, none specifically reported on the vaginal microbiome. Findings from other body sites cannot easily be generalized to the vaginal microbiome due to unique features of vaginal microbial ecology, and investigators seeking to use metagenome inference in vaginal microbiome research are "flying blind" with respect to potential bias these methods may introduce into analyses. We compared the performance of PICRUSt2 and Tax4Fun2 using paired 16S rRNA gene amplicon sequencing and whole-metagenome sequencing data from vaginal samples from 72 pregnant individuals enrolled in the Pregnancy, Infection, and Nutrition (PIN) cohort. Participants were selected from those with known birth outcomes and adequate 16S rRNA gene amplicon sequencing data in a case-control design. Cases experienced early preterm birth (<32 weeks of gestation), and controls experienced term birth (37 to 41 weeks of gestation). PICRUSt2 and Tax4Fun2 performed modestly overall (median Spearman correlation coefficients between observed and predicted KEGG ortholog [KO] relative abundances of 0.20 and 0.22, respectively). Both methods performed best among Lactobacillus crispatus-dominated vaginal microbiotas (median Spearman correlation coefficients of 0.24 and 0.25, respectively) and worst among Lactobacillus iners-dominated microbiotas (median Spearman correlation coefficients of 0.06 and 0.11, respectively). The same pattern was observed when evaluating correlations between univariable hypothesis test P values generated with observed and predicted metagenome data. Differential metagenome inference performance across vaginal microbiota community types can be considered differential measurement error, which often causes differential misclassification. As such, metagenome inference will introduce hard-to-predict bias (toward or away from the null) in vaginal microbiome research. 
IMPORTANCE Compared to taxonomic composition, the functional potential within a bacterial community is more relevant to establishing mechanistic understandings and causal relationships between the microbiome and health outcomes. Metagenome inference attempts to bridge the gap between 16S rRNA gene amplicon sequencing and whole-metagenome sequencing by predicting a microbiome's gene content based on its taxonomic composition and annotated genome sequences of its members. Metagenome inference methods have been evaluated primarily among gut samples, where they appear to perform fairly well. Here, we show that metagenome inference performance is markedly worse for the vaginal microbiome and that performance varies across common vaginal microbiome community types. Because these community types are associated with sexual and reproductive outcomes, differential metagenome inference performance will bias vaginal microbiome studies, obscuring relationships of interest. Results from such studies should be interpreted with substantial caution and the understanding that they may over- or underestimate associations with metagenome content.
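The agreement metric reported in the abstract (a median Spearman correlation between observed and predicted KO relative abundances) can be illustrated with a minimal sketch. It assumes one coefficient is computed per KO across samples, which the abstract does not state explicitly. The KO labels, the abundance values, and all function names are hypothetical and exist only for illustration.

```python
from statistics import median


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: KO label -> relative abundance in four samples.
observed = {
    "KO_a": [0.10, 0.20, 0.30, 0.40],
    "KO_b": [0.40, 0.30, 0.20, 0.10],
    "KO_c": [0.20, 0.10, 0.40, 0.30],
}
predicted = {
    "KO_a": [0.15, 0.25, 0.35, 0.45],
    "KO_b": [0.10, 0.20, 0.30, 0.40],
    "KO_c": [0.10, 0.20, 0.30, 0.40],
}

per_ko_rho = {ko: spearman(observed[ko], predicted[ko]) for ko in observed}
median_rho = median(per_ko_rho.values())
print(per_ko_rho, median_rho)
```

Summarizing with the median keeps a few very well or very poorly predicted KOs from dominating the overall figure. Repeating the calculation within each community type would give the kind of stratified comparison the abstract describes.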
Experiment 1
Reviewed: marked as reviewed by ChiomaBlessing on 2024-1-30
Subjects
- Location of subjects
- United States of America
- Host species (species from which the microbiome was sampled)
- Homo sapiens
- Body site (anatomical site from which microbial samples were extracted, according to the Uber Anatomy Ontology)
- Vagina
- Condition (experimental condition or phenotype studied, according to the Experimental Factor Ontology)
- Premature birth
- Group 0 name (the control, or unexposed, group in case-control studies)
- term birth (control)
- Group 1 name (the case, or exposed, group in case-control studies)
- preterm birth (PTB)
- Group 1 definition (diagnostic criteria used to define the condition or phenotype represented in the case group)
- Cases were participants who experienced early preterm birth at <32 weeks of gestation.
- Group 0 sample size (number of subjects in the control group)
- 37
- Group 1 sample size (number of subjects in the case group)
- 35
- Antibiotics exclusion (number of days without antibiotic use, if applicable, and any other antibiotic-related criteria used to exclude participants)
- None.
Lab analysis
- Sequencing type
- 16S
- 16S variable region (one or more hypervariable regions of the bacterial 16S gene)
- V1-V3
- Sequencing platform (manufacturer and experimental platform used to quantify microbial abundance)
- Illumina
Statistical Analysis
- Data transformation (transformation applied to microbial abundance measurements before differential abundance testing, if any)
- relative abundances
- Statistical test
- Mann-Whitney (Wilcoxon)
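
The two statistical-analysis fields above (relative abundances, then a Mann-Whitney test) can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the count tables are invented, the function names are hypothetical, and the P value uses the normal approximation with a tie correction and no continuity correction. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
from math import erfc, sqrt


def relative_abundance(counts):
    """Convert one sample's raw counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate two-sided P value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u1, p_value


# Hypothetical per-sample counts for two taxa (columns) in each group.
case_counts = [[30, 70], [25, 75], [40, 60]]
control_counts = [[10, 90], [5, 95], [20, 80], [15, 85]]

# Relative abundance of the first taxon in every sample.
case_rel = [relative_abundance(s)[0] for s in case_counts]
control_rel = [relative_abundance(s)[0] for s in control_counts]

u_stat, p_value = mann_whitney_u(case_rel, control_rel)
print(u_stat, p_value)
```

With groups this small an exact test would be preferable to the normal approximation; the approximation is used here only to keep the sketch dependency-free.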