Suggested Reference Databases

Depending on which marker region your sequences come from (e.g. ITS, 16S, LSU), you will need an appropriate reference database to classify them.

For Fungi or all Eukaryotes, the UNITE database is preferred. Use one of the UNITE releases provided in the General FASTA format as the reference database for CONSTAX. For the latest release (10.05.2021), 32GB of RAM should be sufficient to train on Fungi only, and 40GB to train on all Eukaryotes.
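To check that a downloaded file really is in the General FASTA format, it can help to look at one header line. The sketch below assumes the header layout `name|accession|SH code|refs-or-reps|taxonomy`, with taxonomy given as `;`-separated entries such as `k__Fungi`. That layout, the function name `parse_unite_header`, and the example header are illustrative assumptions, not part of CONSTAX; confirm against the release you download.

```python
def parse_unite_header(header: str) -> dict:
    """Split a UNITE General FASTA header into fields (assumed layout)."""
    # Assumed layout: name|accession|SH code|refs-or-reps|taxonomy
    name, accession, sh_code, ref_type, taxonomy = header.lstrip(">").split("|")
    ranks = {}
    for entry in taxonomy.split(";"):
        # Each entry looks like "k__Fungi": rank prefix, "__", taxon name
        prefix, _, value = entry.partition("__")
        ranks[prefix] = value
    return {
        "name": name,
        "accession": accession,
        "sh": sh_code,
        "type": ref_type,
        "ranks": ranks,
    }


# Illustrative header, not copied from a specific release
example = ">Abrothallus_subhalei|MT153946|SH1227328.08FU|refs|k__Fungi;p__Ascomycota;g__Abrothallus"
print(parse_unite_header(example)["ranks"])
```

If the unpacking raises a `ValueError`, the file is probably not in the expected General FASTA layout.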

For Bacteria and Archaea, we recommend the SILVA reference database. Decompress the SILVA_XXX_SSURef_tax_silva.fasta.gz file with gunzip and use the resulting FASTA file.
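On systems without the `gunzip` command, Python's standard `gzip` module does the same job. This is a minimal sketch; the helper name `gunzip` is hypothetical, and it assumes the compressed file ends in `.gz` and that the output should sit next to it with that suffix removed.

```python
import gzip
import shutil
from pathlib import Path
from typing import Union


def gunzip(src: Union[str, Path]) -> Path:
    """Decompress a .gz file next to itself, dropping the .gz suffix."""
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src}")
    dst = src.with_suffix("")  # removes only the final ".gz"
    # Stream the data so large files such as SILVA are not loaded into memory at once
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


# Usage (replace XXX with the SILVA release number you downloaded):
# gunzip("SILVA_XXX_SSURef_tax_silva.fasta.gz")
```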

Note

SILVA taxonomy is not assigned by Linnean ranks (Kingdom, Phylum, etc.), so placeholder ranks 1-n are used instead. Also, because RDP training on the full SILVA database requires 128GB of RAM, a server or cluster is needed to train the classifier. If you have a computer with 32GB of RAM, you may be able to train using the UNITE database. If you cannot train locally for UNITE, the RDP files can be downloaded from here. Decompress the genus_wordConditionalProbList.txt.gz file with gunzip after downloading.
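The placeholder ranks can be pictured with a short sketch. It assumes a SILVA FASTA header of the form `>ID lineage`, where the lineage is a `;`-separated path from the domain downwards, and numbers each level by position. The function name and the `Rank_1`, `Rank_2`, ... labels are hypothetical and may not match the names CONSTAX uses internally.

```python
def silva_placeholder_ranks(header: str) -> tuple:
    """Assign positional placeholder ranks to a SILVA-style lineage (assumed format)."""
    # Assumed header: ">ID lineage", with the ID separated from the lineage by a space
    seq_id, _, lineage = header.lstrip(">").partition(" ")
    taxa = [t for t in lineage.split(";") if t]
    # No Linnean rank names are available, so rank i is simply the i-th lineage level
    ranks = {f"Rank_{i}": taxon for i, taxon in enumerate(taxa, start=1)}
    return seq_id, ranks


print(silva_placeholder_ranks(">X.1.100 Bacteria;Proteobacteria;Gammaproteobacteria"))
```

Because ranks are positional, two lineages of different depth will have different numbers of ranks, which is why the count is written as 1-n rather than a fixed list.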