Curation Policy

From BugSigDB

Microbial Signature Curation Project Policy




  • Microbiome project: a comprehensive database of microbial signatures that enables systematic interpretation of microbiome studies in terms of microbial physiology and similarity to previously published results.
  • Search for microbiome literature using NCBI

Add a new study

  • Add a Study (mark the page complete, then save the page)
  • Add a new Experiment

Create an Experiment

  • Associated with high and low bacterial abundance
  • When one study has more than one contrast, duplicate the experiment.

Location of Subjects

  • Country from which study subjects were recruited

Host Species

  • Species from which microbiome was sampled (if applicable)
  • Use the drop down to pick Homo sapiens for human studies, Mus musculus for mouse studies, or Rattus norvegicus for rat studies.

Body site

  • Check the environmental ontology (via the link provided) to report body sites properly.
  • If the drop down doesn’t contain the body site, please use the link to find the appropriate ontology term.


Condition

  • Check the experimental ontology (via the link provided) to report conditions properly.
  • If the drop down doesn’t contain the condition, please use the link to find the appropriate ontology term.


  • The contrast between two groups (unexposed vs. exposed)
  • Group 0 name: Corresponds to the control (unexposed) group for case-control and other studies
  • Group 1 name: Corresponds to the case (exposed) group for case-control studies
  • Some studies compare groups with low vs. high severity of a disease. In that case, use group 0 for low severity and group 1 for high severity.
  • Do not swap the sample sizes of the exposed group (cases) and the unexposed group (controls).

Exclusion Criteria

  • Antibiotics
  • Include the time frame given for antibiotic exclusion e.g. 2 months, 3 weeks etc.

Lab Analysis

  • Add Sequencing type
  • Add Sequencing platform

Sequencing type

  • Two main types: 16S and WMS (Whole Metagenome Sequencing)
  • For 16S, enter the variable regions as a lower and an upper bound, e.g. for V3-V4, V3 is the lower bound and V4 is the upper bound.
  • If only one variable region is given, e.g. V4, fill in only the lower bound.
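The lower/upper bound rule can be sketched in a few lines of Python. This is only an illustration of the rule above, not part of the curation form: the function name `parse_variable_region` is hypothetical, and it assumes regions are written like "V3-V4" or "V4".

```python
import re
from typing import Optional, Tuple


def parse_variable_region(region: str) -> Tuple[int, Optional[int]]:
    """Split a 16S variable-region label into (lower, upper) bounds.

    A single region such as "V4" fills only the lower bound,
    so the upper bound is returned as None.
    """
    # Assumed label format: "V<digit>" optionally followed by "-V<digit>".
    match = re.fullmatch(r"V([1-9])(?:\s*-\s*V?([1-9]))?", region.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised variable region: {region!r}")
    lower = int(match.group(1))
    upper = int(match.group(2)) if match.group(2) else None
    if upper is not None and upper < lower:
        raise ValueError(f"upper bound is below lower bound: {region!r}")
    return lower, upper
```

For example, `parse_variable_region("V3-V4")` gives lower bound 3 and upper bound 4, while `parse_variable_region("V4")` gives lower bound 4 and no upper bound.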

Sequencing Platform

  • Manufacturer and experimental platform used for quantifying microbial abundance.
  • Use the drop down options to select from.

Statistical Analysis

Statistical test

  • Statistical test or computational tool used for differential abundance testing.
  • Use the drop down options to select from. Leave blank if the study doesn’t specify.

Significance threshold

  • p-value, q-value, or FDR threshold should be reported numerically.
  • LEfSe doesn’t do multiple testing correction. However, it is possible to take the p-values from LEfSe and adjust them for multiple testing in R.
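To illustrate what adjusting p-values for multiple testing involves, here is a minimal Python sketch of the Benjamini-Hochberg procedure, one common FDR correction. The choice of this method and the function name `benjamini_hochberg` are assumptions made for the example; the policy itself only mentions adjusting in R, where `p.adjust(p, method = "BH")` should give the corresponding result.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

The adjusted values can then be compared against the significance threshold reported for the experiment.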

MHT correction

  • MHT stands for multiple hypothesis testing.
  • Select ‘Yes’, ‘No’, or ‘Blank’. (Open question: what is the main difference between ‘No’ and ‘Blank’?)

LDA score

  • Threshold for the linear discriminant analysis (LDA) score for studies using the popular LEfSe tool.
  • Only applies to studies using a LEfSe analysis for their statistical test.
  • Numeric value only

Matched On

  • Matching typically occurs only in case-control studies.
  • Matching occurs when the investigators select controls similar to cases and match them on certain characteristics such as race, age, health status, gender, etc.
  • It will often be phrased along the lines of “Cases were matched to controls on race, age, and gender” or “Age, gender, and race-matched study population”.
  • Ctrl+F for the word “match” is generally an easy way to look for matching.
  • Use the drop down to select the matched on terms. Don’t use abbreviated terms in the field.

Confounders Controlled for

  • Confounders are mainly controlled for in one of two ways:
  1. Stratification. The authors stratify on a variable (often race, age, gender, or medication status) and report differential abundance results for each stratum (group) in separate tables, figures, or sentences.
      • For instance, they report different results for men and women.
      • Stratification can be done with nearly any type of analysis.
  2. Model adjustment. The authors include the confounder as a variable in a regression model.
      • This will be stated in the methods, which will say that a regression model (negative binomial, generalized linear, etc.) was used and that the model was adjusted for a list of variables.
      • That list of variables is the confounders controlled for.
  • You cannot adjust for confounders using bivariate tests, including t-tests, Fisher’s exact test, Kruskal-Wallis, Mann-Whitney, LEfSe, and chi-square.
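The rules above can be expressed as a small consistency check. The sketch below is illustrative only: the function `confounders_controlled_for` and its parameters are hypothetical, and the set of bivariate tests simply repeats the list given in this policy.

```python
# Bivariate tests named in this policy; model adjustment is not possible with them.
BIVARIATE_TESTS = {
    "t-test",
    "fisher's exact test",
    "kruskal-wallis",
    "mann-whitney",
    "lefse",
    "chi-square",
}


def confounders_controlled_for(statistical_test, stratified_on=(), adjusted_for=()):
    """Return the sorted confounders to record for one experiment.

    stratified_on: variables the results were stratified on.
    adjusted_for: variables included as covariates in a regression model.
    """
    is_bivariate = statistical_test.strip().lower() in BIVARIATE_TESTS
    if is_bivariate and adjusted_for:
        raise ValueError(
            f"{statistical_test} is a bivariate test and cannot adjust for confounders"
        )
    # Stratification is allowed with nearly any type of analysis.
    return sorted(set(stratified_on) | set(adjusted_for))
```

A curator could use a check like this to catch entries where confounders are listed for a test that cannot adjust for them.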

Alpha Diversity

  • Record diversity only for Group 1 (exposed group).
  • If no alpha diversity test was conducted, leave the sections blank.
  • If an alpha diversity test was conducted with statistically significant results, put “increased” if Group 1 (exposed group) has higher diversity, and “decreased” if Group 1 (exposed group) has lower diversity.
  • If there is an alpha diversity test without statistically significant results between the two groups, put “unchanged” for that test.
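The four possible outcomes for an alpha diversity field can be summarised as a short decision function. This is a sketch of the rules above; the function `alpha_diversity_entry` and its parameter names are hypothetical.

```python
from typing import Optional


def alpha_diversity_entry(
    tested: bool,
    significant: bool = False,
    group1_higher: Optional[bool] = None,
) -> Optional[str]:
    """Return the value to enter for one alpha diversity measure.

    None means the field is left blank.
    """
    if not tested:
        return None  # no alpha diversity test was conducted
    if not significant:
        return "unchanged"  # tested, but no significant difference
    if group1_higher is None:
        raise ValueError("direction is required for a significant result")
    # Direction is always stated from the point of view of Group 1 (exposed).
    return "increased" if group1_higher else "decreased"
```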

Add a new Signature

  • Add a new signature (on the experiment page, click on “Add a new signature”).
  • On the “Add a new signature” page, enter the microbiome details.


  • The figure/table number where the signatures are found

Abundance in Group 1

  • Whether the abundance has increased or decreased in the Group 1 (exposed group)
  • A separate signature should be created for the increased and decreased group.
  • For example: Signature 1 is for increased abundance in exposed group. Signature 2 is for decreased in exposed group.
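As an illustration of splitting one set of results into separate signatures, the Python sketch below groups taxa by their direction of change in Group 1. The function `split_signatures` is hypothetical, and the directions in the usage note are invented for the example rather than taken from any study.

```python
def split_signatures(results):
    """Group (taxon, direction) pairs into one signature per direction.

    direction must be "increased" or "decreased", relative to Group 1.
    Directions with no taxa are omitted from the result.
    """
    signatures = {"increased": [], "decreased": []}
    for taxon, direction in results:
        if direction not in signatures:
            raise ValueError(f"unknown direction for {taxon}: {direction!r}")
        signatures[direction].append(taxon)
    return {direction: taxa for direction, taxa in signatures.items() if taxa}
```

Given the pairs `("Fusobacterium nucleatum", "increased")`, `("Clostridium", "decreased")`, and `("Staphylococcus aureus", "increased")`, this yields one signature with the two increased taxa and a second signature with the decreased taxon.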


  • Curate all taxa regardless of their taxonomic level.
  • When you start typing the name of a taxon, autocomplete will show it if it already exists in BugSigDB. If autocomplete does not show what you are looking for, search for it in the NCBI taxonomy browser, then enter the integer NCBI taxid.
  • Enter only the highest taxonomic rank mentioned in the paper (e.g. for genus level enter Clostridium, for species level enter Staphylococcus aureus or Lactobacillus zeae).
  • You might sometimes have more than one presented comparison for the same contrasting group. If the results largely agree, curate them as one; if the results are very different, curate them separately.
  • If signatures are reported using two different statistical tests and they are overlapping, use the test that reports a larger number of signatures. Curate as two different experiments when results are very different between the tests.

Sometimes papers report an unclassified species, but provide classification up to the genus level. Such unclassified species should be excluded from the curation as opposed to curating the taxon up to the genus level. In general, differential abundance of a certain species does not necessarily imply differential abundance of the corresponding genus. (Note that this equally applies to the case of unclassified higher taxonomic levels such as the case of an unclassified genus where classification is provided up to the family level).

  • If a taxon is reported as something like this Prevotella KQ959344_s, and you cannot find this in the NCBI taxonomy browser, report it exactly the way it’s reported in the paper since it could be referring to a certain strain of Prevotella.


  1. If two different figures/tables differ only in which taxonomic level they provide (e.g. one figure for species, one supplemental table for genera, but these resulted from the same contrast), these can be combined into a single signature.
  2. If taxa are not already found in BugSigDB, autocomplete won't show them. Any time autocomplete doesn’t work, you have to search the NCBI taxonomy browser and enter the integer ID if you can find it there. It is helpful to start typing the genus and species, then cut and paste the name into the NCBI taxonomy browser if autocomplete didn’t show it. The NCBI taxonomy is too large for autocomplete, so the form only autocompletes things already in the wiki and otherwise expects either integer NCBI taxids or uninterpreted text.
  3. Where you see “Unknown” and a genus level, referring to an unknown species from a known genus, this is often referred to by adding “sp.” to the genus name. For example instead of “Unknown Subdoligranulum” there is “Subdoligranulum sp.” in the taxonomy browser (ID 2053618) and this ID can be entered into the signature.
  4. For species that are only classified to levels of the taxonomy higher than genus (for example, only their family or order is known), try adding “bacterium” to the family, order, etc. in your NCBI taxonomy browser search. E.g. see Clostridiales bacterium (Taxonomy ID: 1898207, unclassified species from the Eubacteriales order). Eubacteriales is a heterotypic synonym for Clostridiales.
  5. Sometimes genus names are abbreviated in the signature, e.g. “F. nucleatum” instead of “Fusobacterium nucleatum”. This is supposed to be done only after the first mention of the genus/species binary name. If the full name does not appear elsewhere, the easiest way is to go to the NCBI taxonomy and look up the species.
  6. The NCBI taxonomy names sometimes include square brackets, even if they are not part of the published signature. For example, [Ruminococcus] torques. At the bottom of this NCBI page it clarifies that “Square brackets ([ ]) around a genus indicates that the name awaits appropriate action by the research community to be transferred to another genus.” This is fine: the NCBI Taxonomy ID will remain and the genus will be automatically corrected in the future just by using the NCBI taxonomy. Note that BugSigDB stores NCBI Taxonomy IDs, even though it shows current names to the user.
  7. Sometimes searching for a name in the NCBI taxonomy will take you instead to a synonym, which is fine. For example, try searching for “Bacteroides symbiosus”; it takes you to the page for [Clostridium] symbiosum, which lists Bacteroides symbiosus as a “homotypic synonym”. The important thing here is that you have the correct Taxonomy ID, which refers both to the current correct name and to this homotypic synonym. There are other kinds of synonyms too; Google provides good definitions (usually the first one is from Wiktionary).
  8. Genus and higher-level taxonomic names (everything but species) are always capitalized. It can be convenient to start entering these names uncapitalized, so that the introduction of capitalization tells you the name was successfully autocompleted. This is important because it’s possible to enter a valid name without autocomplete that is never matched to the NCBI taxonomy in BugSigDB and is instead treated as plain text, not as part of the taxonomy.

Bulk signature entry

You can enter a signature in bulk if it is represented by comma-separated integer NCBI taxonomy IDs. Copy and paste the comma-separated list of IDs into the signature taxid form, then make sure the last value is entered correctly (you may have to press space to see the integer autocomplete, or delete a space if one was pasted in).
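A clean comma-separated list can be prepared before pasting, which avoids stray spaces around the last value. The helper below is a hypothetical sketch and assumes the form accepts IDs separated by commas with no spaces in between.

```python
def bulk_taxid_string(taxids):
    """Join NCBI taxonomy IDs into a comma-separated string for pasting."""
    cleaned = []
    for taxid in taxids:
        text = str(taxid).strip()  # drop stray spaces picked up when copying
        if not text.isdigit():
            raise ValueError(f"not an integer NCBI taxid: {taxid!r}")
        cleaned.append(text)
    return ",".join(cleaned)
```

For the two IDs mentioned in this policy, `bulk_taxid_string([2053618, " 1898207 "])` produces `2053618,1898207`.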