Revision as of 18:26, 1 December 2023 by Lwaldron

Citing BugSigDB

If you use BugSigDB in published research, please cite:

Geistlinger L, Mirzayi C, Zohra F, Azhar R, Elsafoury S, Grieve C, Wokaty J, Gamboa-Tuz SD, Sengupta P, Hecht I, Ravikrishnan A, Gonçalves RS, Franzosa E, Raman K, Carey V, Dowd JB, Jones HE, Davis S, Segata N, Huttenhower C, Waldron L: BugSigDB captures patterns of differential abundance across a broad range of host-associated microbial signatures. Nat. Biotechnol. 2023.

For a short summary of the work, see the associated Research Briefing BugSigDB — a database for identifying unusual abundance patterns in human microbiome studies or Press Release CUNY SPH researchers unveil comprehensive database of published microbial signatures.

What is BugSigDB?

BugSigDB is an editable Semantic MediaWiki that provides manually curated, expert-reviewed, syntax-standardized reporting of the published literature on microbial signatures. Curators use data entry forms, controlled vocabulary, and ontologies to standardize taxa to the NCBI taxonomy and to standardize key information about the studies, experiments, and analytical methods that generated these signatures. BugSigDB standardizes published microbiome literature, including:

  • geography, health outcomes, host body sites, and experimental, epidemiological, and statistical methods using controlled vocabulary,
  • results on microbial diversity,
  • microbial signatures standardized to the NCBI taxonomy, and
  • identification of all published signatures in which a given microbe has been reported.

Video tutorials

To get started, please see our collection of video tutorials, which includes:

  • a summary of what BugSigDB is and why it might be useful for you,
  • an overview of BugSigDB functionality, including how to use the wiki, how to search for studies, and how to export BugSigDB data for external analysis,
  • a step-by-step walk-through of how to add a new study through the curation interface of BugSigDB, and
  • a demo of how to work with BugSigDB data in R using the bugsigdbr R/Bioconductor package.

Why and how would I contribute?

You too can enter studies and signatures into BugSigDB, and doing so provides several benefits:

  1. the ability to compare your signatures to previously published signatures directly on BugSigDB, through the bugsigdbr R/Bioconductor package, or through bulk text export and analysis in the software of your choice. Shortly after you enter or edit a signature, it becomes exportable and analyzable; several example analyses are available.
  2. standardized reporting of study results with a citable URL. Posting your experimental results here is straightforward, vastly more usable than a supplemental table, and allows you to compare your signatures to the rest of BugSigDB.
  3. bulk copy-and-paste entry of signatures given as comma-separated NCBI taxonomy IDs.
  4. automatic hyperlinking to the NCBI taxonomy, and taxonomy-aware organization and analysis.
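Item 3 above refers to pasting a comma-separated list of NCBI taxonomy IDs. As a minimal sketch, assuming the curation form accepts a plain string such as "816,1301", the hypothetical Python helper below (not part of BugSigDB or bugsigdbr) normalizes a list of IDs into that form, rejecting anything that is not a positive integer and dropping duplicates.

```python
def to_bulk_entry(taxids):
    """Join NCBI taxonomy IDs into one comma-separated string for pasting.

    Hypothetical helper: accepts ints or digit strings, strips whitespace,
    rejects anything that is not a positive integer, and drops duplicates
    while keeping first-seen order.
    """
    cleaned = []
    seen = set()
    for raw in taxids:
        text = str(raw).strip()
        if not text.isdigit() or int(text) == 0:
            raise ValueError(f"not an NCBI taxonomy ID: {raw!r}")
        taxid = int(text)
        if taxid not in seen:
            seen.add(taxid)
            cleaned.append(str(taxid))
    return ",".join(cleaned)


# 816 is the NCBI taxonomy ID for Bacteroides, 1301 for Streptococcus.
print(to_bulk_entry([816, " 1301 ", "816"]))  # prints 816,1301
```

Checking the list locally before pasting catches slips such as a taxon name typed where an ID belongs; the site's curation form remains the authority on what input it accepts.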

It is our goal to incorporate all published host-associated microbiome differential abundance signatures. To start contributing, see the information for curators and create a BugSigDB account.

The BugSigDB Team

Read testimonials from BugSigDB curators to learn why new scientists might want to get involved!

Principal Investigators

CUNY Graduate School of Public Health and Health Policy

  • Dr. Levi Waldron
  • Dr. Heidi Jones

Harvard Medical School

  • Dr. Ludwig Geistlinger

Colorado University Medical Center

  • Dr. Sean Davis

Harvard T.H. Chan School of Public Health

  • Dr. Curtis Huttenhower

University of Trento

  • Dr. Nicola Segata


The following current and past CUNY SPH staff have contributed extensively by reviewing the work of curators, creating data validation rules, and curating studies.


CUNY SPH

The following curators have contributed significant content as part of their master's program fieldwork.

Fall 2022

  • Mary Bearkland

Summer 2022

  • Ifeanyi Kalu
  • Jessica Shudy
  • Sharmila Chunduri
  • Umadevi Yokeeswaran
  • Jacquelyn Shevin

Winter 2022

  • Joyessa Dey


  • Frans Cuevas
  • Martha Martin
  • Clare Grieve
  • Samara Khan
  • Tangirul Islam
  • Kweku Amoo
  • Philippe Michael Lutete
  • Mst Afroza Parvin
  • Victoria Goulbourne
  • Zyaijah Bailey
  • Cynthia Anderson
  • William Lam
  • Nadine Ulysse
  • Lucille Mellor
  • Utsav Patel
  • Marianthi Thomatos
  • Phyu Han
  • Lora Kasselman
  • Fatima Zohra
  • Shaima El Safoury
  • Yaseen Javaid


  • Lana Park
  • Madhubani Dey
  • Valentina Pineda
  • Yu Wang
  • Manuela Hoyos
  • My Nguyen
  • Titas Sil

Wiki Development

The Semantic MediaWiki site was developed by WikiWorks MediaWiki Consulting, led by developer Ike Hecht.

Get in touch

Report issues and request functionality or missing vocabulary at You can also talk to site administrators at

We can train student interns and other volunteers and guide focused projects on an area of the literature that interests you. Contact Levi Waldron to get involved.


BugSigDB: A Comprehensive Database of Published Microbial Signatures is funded primarily by the National Cancer Institute of the National Institutes of Health under grant 5R01CA230551, "Exploiting public metagenomic data to uncover cancer-microbiome relationships", awarded to Levi Waldron.


Data licensing

BugSigDB data can be downloaded in bulk directly from the site, from hourly snapshots on GitHub, or from semi-annual releases on Zenodo, or accessed with additional functionality through the bugsigdbr R/Bioconductor client.
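Once a bulk export is downloaded, a common first step is tallying how many signatures report each taxon. The Python sketch below is hypothetical and not part of bugsigdbr or any official BugSigDB tooling. It assumes a CSV export in which each row is one signature, the taxonomy column is named "NCBI Taxonomy IDs", entries are comma-separated, an entry may be a pipe-delimited lineage (root first), and lines starting with "#" are comments. The column name and all of these layout details are assumptions to check against the file you actually download. It reads from a string so it runs without network access.

```python
import csv
import io
from collections import Counter

# Hypothetical column name; check the header of the export you download.
TAXID_COLUMN = "NCBI Taxonomy IDs"


def count_taxa(csv_text, column=TAXID_COLUMN):
    """Count how many signatures (rows) mention each NCBI taxonomy ID.

    Assumed layout: comma-separated entries per row; if an entry is a
    pipe-delimited lineage, only its last (most specific) ID is kept.
    Lines starting with "#" are skipped as comments.
    """
    lines = (line for line in io.StringIO(csv_text) if not line.startswith("#"))
    counts = Counter()
    for row in csv.DictReader(lines):
        field = row.get(column) or ""
        taxids = {
            entry.strip().split("|")[-1]
            for entry in field.split(",")
            if entry.strip()
        }
        counts.update(taxids)  # a set, so each taxon counts once per signature
    return counts


# Made-up sample rows in the assumed layout, for illustration only.
sample = """# hypothetical header comment
Study,Signature,NCBI Taxonomy IDs
A,Sig 1,"816,1301"
A,Sig 2,"2|976|200643|171549|815|816"
B,Sig 3,1301
"""

counts = count_taxa(sample)
print(counts["816"])  # prints 2
```

To run it on a real download, pass the file's text, for example `count_taxa(open(path, newline="").read())`. For R users, bugsigdbr is the supported route and offers more functionality; this sketch only illustrates the shape of a taxon-level tally.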

These data are made available under the Open Data Commons Attribution License (ODC-BY 1.0). You are free:

  • To share: To copy, distribute and use the database.
  • To create: To produce works from the database.
  • To adapt: To modify, transform and build upon the database.

As long as you:

  • Attribute: You must attribute any public use of the database, or works produced from the database, in the manner specified in the license. For any use or redistribution of the database, or works produced from it, you must make clear to others the license of the database and keep intact any notices on the original database.


This summary is not a license. It is simply a handy reference for understanding ODC-BY 1.0, a human-readable expression of some of its key terms. This summary has no legal value, and its contents do not appear in the actual license. Read the full ODC-BY 1.0 license text for the exact terms that apply.

Software licensing

The software powering this site and downstream analyses of the data is licensed under various FSF-approved open-source licenses; see the individual software packages for details.