Introduction#
Getting Singularity or Apptainer#
As usual, we get it through environment modules. Some systems may already have Singularity or Apptainer on the system PATH, or in an unusual location such as somewhere on CVMFS.
which singularity
which apptainer
module spider apptainer
module spider singularity
Assuming one of the above turned something up, load it with module load singularity (or whichever module was found). Then try executing it.
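When a script has to work on systems that offer either tool, the search can be automated. The following is a minimal sketch, not a site-specific recipe: the function name find_container_runtime is hypothetical, and it only looks at what is already on PATH (so any module load must happen first).

```shell
# Print the name of the first container runtime found on PATH.
# Returns non-zero if neither apptainer nor singularity is available.
find_container_runtime() {
    local candidate
    for candidate in apptainer singularity; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if runtime=$(find_container_runtime); then
    echo "Using $runtime"
    "$runtime" version
else
    echo "No container runtime on PATH; try 'module spider apptainer' or 'module spider singularity'." >&2
fi
```

By hand, the equivalent check is the single command that follows.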
singularity version
Trying the favourite lolcow container#
The lolcow container is the example everyone uses to teach container tools such as Singularity. Find it in a container repository; Docker Hub, Red Hat Quay.io, and the Sylabs Library are the usual places.
Use singularity pull to download the image.
Try the three ways of entering the image: run, exec, and shell.
Fallback: use the image from /home/shared/ on Magic Castle.
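As a hedged sketch of what the exercise might look like, the snippet below prints the pull, run, exec, and shell commands for one commonly used lolcow image. The image URI docker://godlovedc/lolcow, the output name lolcow.sif, and the helper names run_cmd and lolcow_demo are assumptions made for illustration; substitute whatever you found in the repository. By default nothing is executed (DRY_RUN=1), because shell is interactive and is better typed by hand.

```shell
# Print (DRY_RUN=1, the default) or execute (DRY_RUN=0) a command.
run_cmd() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

RUNTIME="${RUNTIME:-singularity}"                     # or apptainer
IMAGE_URI="${IMAGE_URI:-docker://godlovedc/lolcow}"   # assumed image location
SIF="${SIF:-lolcow.sif}"                              # assumed local file name

lolcow_demo() {
    run_cmd "$RUNTIME" pull "$SIF" "$IMAGE_URI"   # download and convert to a SIF file
    run_cmd "$RUNTIME" run "$SIF"                 # run: execute the image's default runscript
    run_cmd "$RUNTIME" exec "$SIF" cowsay moo     # exec: run one named command inside the image
    run_cmd "$RUNTIME" shell "$SIF"               # shell: open an interactive shell in the image
}

lolcow_demo
```

Setting DRY_RUN=0 executes the commands instead of printing them; the last one then drops you into a shell inside the container until you type exit.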
Genomics example 1: BWA Indexing#
Let's create a working directory, chr20, and download a reference sequence into it.
mkdir chr20
cd chr20
echo Download chr20 reference
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr20.fa.gz
gunzip chr20.fa.gz
ls -al
du -h chr20.fa
We want to use the BWA-MEM2 code to index the genome. Can we find a Singularity image for it somewhere? The BioContainers and StaPH-B repositories on Quay.io are good places to start.
#apptainer pull docker://quay.io/biocontainers/bwa-mem2:2.3--he70b90d_0
#ls -lrt
# should get something like bwa-mem2_2.3--he70b90d_0.sif
#
ln -s /home/shared/sing/bwa-mem2_2.3--he70b90d_0.sif ./bwa-mem2_2.3--he70b90d_0.sif
Now we have the image and can “exec” the code from inside the container. Note that we need to bind-mount the directory holding our data onto a path inside the container. Here $PWD, the current directory, is mounted as /ref.
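To make the path mapping concrete, here is a small sketch that translates a host path into the path the container sees for a given HOST_DIR:CONTAINER_DIR bind specification. The function name to_container_path is hypothetical, and the sketch ignores an optional third field such as :ro as well as host paths containing a colon.

```shell
# Translate a host path into the path visible inside the container,
# given a bind specification of the form HOST_DIR:CONTAINER_DIR.
to_container_path() {
    local bind_spec="$1"
    local host_path="$2"
    local host_dir="${bind_spec%%:*}"       # text before the first colon
    local container_dir="${bind_spec#*:}"   # text after the first colon
    local rel
    case "$host_path" in
        "$host_dir"/*)
            rel=${host_path#"$host_dir"/}
            printf '%s\n' "$container_dir/$rel"
            ;;
        "$host_dir")
            printf '%s\n' "$container_dir"
            ;;
        *)
            echo "$host_path is not under $host_dir, so the container cannot see it" >&2
            return 1
            ;;
    esac
}

to_container_path "$PWD:/ref" "$PWD/chr20.fa"   # prints /ref/chr20.fa
```

This is why the command that follows refers to the reference as /ref/chr20.fa rather than by its host path.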
apptainer exec --bind "$PWD":/ref bwa-mem2_2.3--he70b90d_0.sif bwa-mem2 index /ref/chr20.fa
Genomics example 2: Following the Google DeepVariant tutorial#
https://github.com/google/deepvariant/blob/r1.6/docs/deepvariant-quick-start.md
Interestingly, it provides both Docker and Singularity instructions. We do not have enough GPUs, so we need to use a batch job!
First, let's download the chr20 data for DeepVariant as per the tutorial.
cd ~/scratch
INPUT_DIR="${PWD}/quickstart-testdata"
DATA_HTTP_DIR="https://storage.googleapis.com/deepvariant/quickstart-testdata"
mkdir -p ${INPUT_DIR}
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/NA12878_S1.chr20.10_10p1mb.bam
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/NA12878_S1.chr20.10_10p1mb.bam.bai
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/test_nist.b37_chr20_100kbp_at_10mb.bed
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/test_nist.b37_chr20_100kbp_at_10mb.vcf.gz
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/test_nist.b37_chr20_100kbp_at_10mb.vcf.gz.tbi
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/ucsc.hg19.chr20.unittest.fasta
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/ucsc.hg19.chr20.unittest.fasta.fai
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/ucsc.hg19.chr20.unittest.fasta.gz
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/ucsc.hg19.chr20.unittest.fasta.gz.fai
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/ucsc.hg19.chr20.unittest.fasta.gz.gzi
Then we need the Singularity image. It is too large to download! Let's make a symbolic link instead. On Magic Castle:
echo singularity pull docker://google/deepvariant:latest-gpu
echo singularity pull docker://google/deepvariant:latest
ln -s /home/shared/sing/deepvariant_latest-gpu.sif ./deepvariant_latest-gpu.sif
ln -s /home/shared/sing/deepvariant_latest.sif ./deepvariant_latest.sif
Let's use vi and save the following job script:
#!/bin/bash
#SBATCH --gpus=1 --partition=gpu-node --mem=0
#https://github.com/google/deepvariant/blob/r1.6/docs/deepvariant-quick-start.md
INPUT_DIR="${PWD}/quickstart-testdata"
OUTPUT_DIR="${PWD}/quickstart-output"
mkdir -p "${OUTPUT_DIR}"
# Do not pull the image here; the symlinked copy is used instead.
#apptainer pull docker://google/deepvariant:"${BIN_VERSION}"
#module load singularity
module load apptainer/1.4.5
# Run DeepVariant.
apptainer run --nv -B /usr/lib/locale/:/usr/lib/locale/ \
deepvariant_latest-gpu.sif \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \
--ref="${INPUT_DIR}"/ucsc.hg19.chr20.unittest.fasta \
--reads="${INPUT_DIR}"/NA12878_S1.chr20.10_10p1mb.bam \
--regions "chr20:10,000,000-10,010,000" \
--output_vcf="${OUTPUT_DIR}"/output.vcf.gz \
--output_gvcf="${OUTPUT_DIR}"/output.g.vcf.gz \
--intermediate_results_dir "${OUTPUT_DIR}/intermediate_results_dir" \
--num_shards=2
# --model_type=WGS # **Replace this string with exactly one of the following [WGS,WES,PACBIO,ONT_R104,HYBRID_PACBIO_ILLUMINA]**
# --num_shards=12 # **How many cores the `make_examples` step uses. Change it to the number of CPU cores you have.**
Then submit the script with the sbatch command. Check whether $OUTPUT_DIR contains the expected “variants” (output.vcf.gz and output.g.vcf.gz).
The tutorial on GitHub suggests running another container to sanity-check the results. It gives no instructions for Singularity, but we can convert the Docker command.
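The output check can be scripted. The sketch below assumes the output file names used in the job script above (output.vcf.gz and output.g.vcf.gz) and a hypothetical helper name, check_deepvariant_outputs. It only tests that the files exist and are non-empty; it does not validate their contents.

```shell
# Report whether the two DeepVariant output files exist and are non-empty.
# Returns non-zero if either one is missing or empty.
check_deepvariant_outputs() {
    local out_dir="$1"
    local status=0
    local f
    for f in output.vcf.gz output.g.vcf.gz; do
        if [ -s "$out_dir/$f" ]; then
            echo "OK       $out_dir/$f"
        else
            echo "MISSING  $out_dir/$f"
            status=1
        fi
    done
    return "$status"
}

check_deepvariant_outputs "${OUTPUT_DIR:-$PWD/quickstart-output}" ||
    echo "Job may still be running or may have failed; check squeue and the slurm-*.out log." >&2
```

Run it from the directory the job was submitted from, once squeue no longer lists the job.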
module load apptainer/1.4.5
apptainer pull docker://jmcdani20/hap.py:v0.3.12
INPUT_DIR="${PWD}/quickstart-testdata"
OUTPUT_DIR="${PWD}/quickstart-output"
apptainer exec \
-B /usr/lib/locale/:/usr/lib/locale/ \
-B "${INPUT_DIR}":/input \
-B "${OUTPUT_DIR}":/output \
hap.py_v0.3.12.sif \
/opt/hap.py/bin/hap.py \
/input/test_nist.b37_chr20_100kbp_at_10mb.vcf.gz \
/output/output.vcf.gz \
-f /input/test_nist.b37_chr20_100kbp_at_10mb.bed \
-r /input/ucsc.hg19.chr20.unittest.fasta \
-o /output/happy.output \
--engine=vcfeval \
--pass-only \
-l chr20:10000000-10010000 \
--threads=12
cat quickstart-output/happy.output.summary.csv
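The raw CSV is wide, so it can help to print only the headline metrics. The sketch below looks columns up by header name rather than by position. The column names Type, Filter, METRIC.Recall, METRIC.Precision, and METRIC.F1_Score are assumptions about the hap.py summary format, and summarize_happy is a hypothetical helper name; if a column is missing, the function reports it and stops.

```shell
# Print Type, Filter, recall, precision and F1 from a hap.py summary CSV,
# locating each column by its header name (names are assumed, see above).
summarize_happy() {
    awk -F, '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            n = split("Type Filter METRIC.Recall METRIC.Precision METRIC.F1_Score", need, " ")
            for (j = 1; j <= n; j++) {
                if (!(need[j] in col)) {
                    print "missing column: " need[j] > "/dev/stderr"
                    exit 1
                }
            }
            next
        }
        {
            printf "%s %s recall=%s precision=%s f1=%s\n",
                $col["Type"], $col["Filter"],
                $col["METRIC.Recall"], $col["METRIC.Precision"], $col["METRIC.F1_Score"]
        }
    ' "$1"
}

summary="${OUTPUT_DIR:-quickstart-output}/happy.output.summary.csv"
if [ -f "$summary" ]; then
    summarize_happy "$summary"
else
    echo "No summary found at $summary" >&2
fi
```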