Downloading the UNITE databaseΒΆ
This tutorial is about how to obtain a reference database for classification of fungi or eukaryotes in general. These will be downloaded from UNITE.
For classification of fungi, we have had tested with the RepS 44343 General Release FASTA.
The eukaryote database with 96423 RepS sequences provides better information about the kingdom
classification of the sequence, but requires slightly more RAM (~40GB). Using the --high_level_taxonomy
option can provide a similar result
but with reduced RAM requirements.
curl https://files.plutof.ut.ee/public/orig/E7/28/E728E2CAB797C90A01CD271118F574B8B7D0DAEAB7E81193EB89A2AC769A0896.gz > sh_general_release_04.02.2020.tar.gz
tar -xzvf sh_general_release_04.02.2020.tar.gz
Use the FASTA called sh_general_release_fungi_35077_RepS_04.02.2020.fasta
within the expanded directory for
your fungal reference database, specified with -d
or --db
in your constax
command.
For the --high_level_db
option, the eukaryotes database found here https://plutof.ut.ee/#/doi/10.15156/BIO/1280127.
can be used. This will help to remove non-fungal OTUs from your dataset, or can be used as the main database (-d, --db
)
for projects amplifying other eukaryotes.