Run CONSTAX on HPCCΒΆ

To run CONSTAX on the high performance cluster computer or HPCC available at Michigan State University, you can set the paths just using --msu_hpcc flag to your constax.sh file

The code will look like as below

_images/msu_hpcc.png
#!/bin/bash --login

#SBATCH --time=10:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=32G
#SBATCH --job-name constax_fungi
#SBACTH -A shade-cole-bonito

cd ${SLURM_SUBMIT_DIR}

conda activate py3

constax \
--num_threads $SLURM_CPUS_PER_TASK \
--mem $SLURM_MEM_PER_NODE \
--db /mnt/home/benucci/DATABASES/sh_general_release_fungi_35077_RepS_04.02.2020.fasta \
--train \
--trainfile /mnt/home/benucci/CONSTAX_v2/tutorial/training_files_fungi/ \
--input /mnt/home/benucci/CONSTAX_v2/tutorial/ITS1_soil_500_otu.fasta \
--isolates /mnt/home/benucci/CONSTAX_v2/tutorial/isolates.fasta \
--isolates_query_coverage=97 \
--isolates_percent_identity=97 \
--high_level_db /mnt/home/benucci/DATABASES/sh_general_release_fungi_35077_RepS_04.02.2020.fasta \
--high_level_query_coverage=85 \
--high_level_percent_identity=60 \
--tax /mnt/home/benucci/CONSTAX_v2/tutorial/taxonomy_assignments_fungi07/ \
--output /mnt/home/benucci/CONSTAX_v2/tutorial/taxonomy_assignments_fungi07/ \
--conf 0.7 \
--blast \
--msu_hpcc \
--make_plot

conda deactivate

scontrol show job $SLURM_JOB_ID

Note

As you can see this time constax.sh does not contain the --train option,

since the reference database has been already trained it is not required any additional training. This will improve the speed and therefore the running time will be less. The resources you need to compute just the classification are much less that those needed for training. You can then set the num_threads option to a lower number as well as the amount of RAM --mem.

Additionally no --isolates is provided in this run of CONSTAX and the --hpcc_msu is specified at the end of the script.

To access some other representative OTU sequences files please follow THIS link. These are the available files.

_images/otu_files.png