Introduction#
Getting Singularity or Apptainer#
As usual, we get it through environment modules. On some systems Singularity or Apptainer may already be on the system PATH, or installed in an unusual location such as somewhere on CVMFS.
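Because the command may be called either apptainer or singularity depending on the site, the check can also be scripted. The following is only a minimal sketch; the function name find_container_runtime is made up for illustration and is not part of either tool.

```shell
# Print the first container runtime found on PATH (apptainer first, then singularity).
# Returns non-zero if neither is available, e.g. before a module has been loaded.
find_container_runtime() {
    for candidate in apptainer singularity; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if runtime=$(find_container_runtime); then
    echo "Found container runtime: $runtime"
else
    echo "No runtime on PATH yet; look for a module to load"
fi
```

If nothing is found, the manual search with which and module spider is the next step.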
which singularity
module spider apptainer
module spider singularity
Assuming we have found one of the above, load it with module load singularity (or whichever module we found). Then try executing it.
singularity version
Trying the favourite lolcow container#
The lolcow container is the example everyone uses to teach container tools such as Singularity. Find it in a container registry: Docker Hub, Red Hat Quay.io and the Sylabs Library are the usual places.
Use singularity pull to download the image.
Try the three ways of starting the image: run, exec, and shell.
Fallback: use the image from /home/shared/ on Magic Castle.
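The steps above can be sketched as follows. This is a hedged example, not the official solution: docker://godlovedc/lolcow is assumed here as one commonly used public lolcow image, and run_cmd is a hypothetical helper that only prints each command unless DRY_RUN=0 is set, so the block is safe to paste before the module is loaded.

```shell
# Assumption: one commonly used public lolcow image; substitute the one you found.
IMAGE_URI="docker://godlovedc/lolcow"
# File name that singularity pull writes by default for an untagged image.
SIF="lolcow_latest.sif"

# Print the command instead of running it unless DRY_RUN=0 is set.
run_cmd() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$@"
    fi
}

run_cmd singularity pull "$IMAGE_URI"        # download and convert to a SIF file
run_cmd singularity run "$SIF"               # run: executes the image's default runscript
run_cmd singularity exec "$SIF" cowsay moo   # exec: runs one named command inside the image
run_cmd singularity shell "$SIF"             # shell: opens an interactive shell inside the image
```

Setting DRY_RUN=0 in the environment before pasting makes the same lines actually execute.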
Genomics example 1: BWA Indexing#
Let's create a working directory chr20 and download the chr20 reference sequence into it.
mkdir chr20
cd chr20
echo Download chr20 reference
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr20.fa.gz
gunzip chr20.fa.gz
ls -al
du -h chr20.fa
We want to use the BWA-MEM2 code to index the genome. Can we find a Singularity image for it somewhere? The BioContainers and StaPH-B repositories on Quay.io are good places to start.
singularity pull docker://quay.io/biocontainers/bwa-mem2:2.2.1--he513fc3_0
ls -lrt
# should get something like bwa-mem2_2.2.1--he513fc3_0.sif
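The file name written by singularity pull follows a simple pattern, judging by the images used in this tutorial: the last component of the URI, with the colon before the tag replaced by an underscore, plus .sif. A minimal sketch of that rule is given below; sif_name is a hypothetical helper, the "latest" default for untagged images is an assumption, and digest references (@sha256:...) are not handled.

```shell
# Derive the default SIF file name for a docker:// URI.
sif_name() {
    name=${1##*/}                      # drop registry and namespace, keep "image:tag"
    case $name in
        *:*) ;;                        # a tag is already present
        *)   name="$name:latest" ;;    # assumed default tag
    esac
    echo "$(echo "$name" | tr ':' '_').sif"
}

sif_name docker://quay.io/biocontainers/bwa-mem2:2.2.1--he513fc3_0
```

This is handy in scripts that pull an image and then need to refer to the resulting file.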
Now we have the image and can “exec” the code from inside the container. Note that we need to bind-mount the directory holding our data to the path we use inside the container: below, $PWD (the current directory) is mounted as /ref.
singularity exec --bind $PWD:/ref bwa-mem2_2.2.1--he513fc3_0.sif bwa-mem2 index /ref/chr20.fa
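To confirm that indexing finished, one can check for the index files next to the FASTA. The sketch below assumes the five suffixes that bwa-mem2 2.x is understood to write (.0123, .amb, .ann, .bwt.2bit.64 and .pac); adjust the list if your version differs. check_bwa_mem2_index is a hypothetical helper name.

```shell
# Report any expected bwa-mem2 index file that is missing or empty.
# Returns 0 when all are present, 1 otherwise.
check_bwa_mem2_index() {
    ref=$1
    missing=0
    for ext in 0123 amb ann bwt.2bit.64 pac; do
        if [ ! -s "$ref.$ext" ]; then
            echo "missing or empty: $ref.$ext"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_bwa_mem2_index chr20.fa && echo "index looks complete"
```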
# should see the resulting files with ls -la if successful
Genomics example 2: Following the Google DeepVariant tutorial#
https://github.com/google/deepvariant/blob/r1.6/docs/deepvariant-quick-start.md
Conveniently, it provides both Docker and Singularity instructions. We do not have enough GPUs to run it interactively, so we will need to submit it as a batch job.
First, let's download the chr20 data for DeepVariant as per the tutorial.
INPUT_DIR="${PWD}/quickstart-testdata"
DATA_HTTP_DIR="https://storage.googleapis.com/deepvariant/quickstart-testdata"
mkdir -p ${INPUT_DIR}
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/NA12878_S1.chr20.10_10p1mb.bam
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/NA12878_S1.chr20.10_10p1mb.bam.bai
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/test_nist.b37_chr20_100kbp_at_10mb.bed
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/test_nist.b37_chr20_100kbp_at_10mb.vcf.gz
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/test_nist.b37_chr20_100kbp_at_10mb.vcf.gz.tbi
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/ucsc.hg19.chr20.unittest.fasta
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/ucsc.hg19.chr20.unittest.fasta.fai
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/ucsc.hg19.chr20.unittest.fasta.gz
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/ucsc.hg19.chr20.unittest.fasta.gz.fai
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/ucsc.hg19.chr20.unittest.fasta.gz.gzi
Then we need the Singularity image. It is too large to download, so on Magic Castle let's make symbolic links to the shared copies instead. The pull commands below are only echoed, not run.
echo singularity pull docker://google/deepvariant:latest-gpu
echo singularity pull docker://google/deepvariant:latest
ln -s /home/shared/sing/deepvariant_latest-gpu.sif ./deepvariant_latest-gpu.sif
ln -s /home/shared/sing/deepvariant_latest.sif ./deepvariant_latest.sif
Let's use vi to save the following job script:
#!/bin/bash
#SBATCH --gpus=1
#SBATCH --partition=stamps-b,livi-b,mcordgpu-b,agro-b --reservation=ws_gpu
#SBATCH --cpus-per-task=12 --mem-per-cpu=3gb
#https://github.com/google/deepvariant/blob/r1.6/docs/deepvariant-quick-start.md
INPUT_DIR="${PWD}/quickstart-testdata"
OUTPUT_DIR="${PWD}/quickstart-output"
mkdir -p "${OUTPUT_DIR}"
# Do not pull the image here; the symbolic link created earlier is used instead.
#singularity pull docker://google/deepvariant:"${BIN_VERSION}"
module load singularity
# Run DeepVariant.
singularity run --nv -B /usr/lib/locale/:/usr/lib/locale/ \
  deepvariant_latest-gpu.sif \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref="${INPUT_DIR}"/ucsc.hg19.chr20.unittest.fasta \
  --reads="${INPUT_DIR}"/NA12878_S1.chr20.10_10p1mb.bam \
  --regions "chr20:10,000,000-10,010,000" \
  --output_vcf="${OUTPUT_DIR}"/output.vcf.gz \
  --output_gvcf="${OUTPUT_DIR}"/output.g.vcf.gz \
  --intermediate_results_dir "${OUTPUT_DIR}/intermediate_results_dir" \
  --num_shards=12
# --model_type=WGS # Replace this string with exactly one of the following: [WGS,WES,PACBIO,ONT_R104,HYBRID_PACBIO_ILLUMINA]
# --num_shards=12 # How many cores the `make_examples` step uses. Change it to the number of CPU cores you have.
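Rather than keeping the number 12 in sync by hand between --cpus-per-task and --num_shards, the shard count can follow the Slurm allocation. This is a minimal sketch, assuming the job requests cores with --cpus-per-task as the script above does, in which case Slurm exports SLURM_CPUS_PER_TASK inside the job. num_shards is a hypothetical helper name.

```shell
# Use the number of CPUs Slurm allocated to the task; fall back to 1 outside a job.
num_shards() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

echo "--num_shards=$(num_shards)"
```

In the job script this would replace the literal value, for example --num_shards="$(num_shards)".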
Then submit the script with the sbatch command. Check whether $OUTPUT_DIR contains the expected variant files (output.vcf.gz and output.g.vcf.gz).
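Submitting and checking could look like the sketch below. The file name deepvariant.sh is hypothetical (use whatever name the job script was saved under), and count_variants is an illustrative helper that counts the non-header lines of a gzipped VCF.

```shell
# Count variant records (lines not starting with "#") in a gzipped VCF.
count_variants() {
    gzip -dc "$1" | grep -vc '^#'
}

# Hypothetical usage:
#   sbatch deepvariant.sh
#   squeue -u "$USER"        # repeat until the job has left the queue
#   count_variants quickstart-output/output.vcf.gz
```

A non-zero count suggests that DeepVariant wrote variant calls for the requested region.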
The tutorial on GitHub suggests running another container to sanity-check the results. It gives no Singularity instructions for this step, but we can convert the Docker ones.
singularity pull docker://jmcdani20/hap.py:v0.3.12
INPUT_DIR="${PWD}/quickstart-testdata"
OUTPUT_DIR="${PWD}/quickstart-output"
singularity exec -B /usr/lib/locale/:/usr/lib/locale/ \
  -B "${INPUT_DIR}":"/input" \
  -B "${OUTPUT_DIR}:/output" \
  hap.py_v0.3.12.sif \
  /opt/hap.py/bin/hap.py \
  /input/test_nist.b37_chr20_100kbp_at_10mb.vcf.gz \
  /output/output.vcf.gz \
  -f "/input/test_nist.b37_chr20_100kbp_at_10mb.bed" \
  -r "/input/ucsc.hg19.chr20.unittest.fasta" \
  -o "/output/happy.output" \
  --engine=vcfeval \
  --pass-only \
  -l chr20:10000000-10010000 \
  --threads=12
cat quickstart-output/happy.output.summary.csv
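The summary CSV is wide, so it can help to print only the headline metrics. The sketch below looks columns up by header name; the names Type, Filter, METRIC.Recall, METRIC.Precision and METRIC.F1_Score are assumed from hap.py's usual summary layout and should be checked against the first line of your own file. happy_metrics is a hypothetical helper name.

```shell
# Print Type, Filter, Recall, Precision and F1 from a hap.py summary CSV.
# Columns are located by header name, so their order in the file does not matter.
happy_metrics() {
    awk -F',' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        {
            print $(col["Type"]), $(col["Filter"]), $(col["METRIC.Recall"]), \
                  $(col["METRIC.Precision"]), $(col["METRIC.F1_Score"])
        }
    ' "$1"
}

# Example: happy_metrics quickstart-output/happy.output.summary.csv
```

If one of the assumed header names is absent, awk prints the whole line in that position, which makes the mismatch easy to spot.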