This lesson is still being designed and assembled (Pre-Alpha version)

Customizing workflows

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • Key question (FIXME)

Objectives
  • customize a workflow at any of the many levels

By the end of this episode, learners should be able to customize a workflow at any of the many levels:

  1. Change the input object
  2. Change the default values at the workflow level
  3. Add default values to existing inputs at the workflow level
  4. Change default values at the workflow step level
  5. Add hard-coded values (via default or valueFrom) at the workflow step level
  6. Change hard-coded values at the workflow step level
  7. Change default values in the CommandLineTool (CLT) description
  8. Change hard-coded values in the CLT description
  9. Change the container (add helper script)
  10. Change the tool source itself
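
Level 1, changing the input object, needs no edits to the workflow at all: the input object is a separate YAML file supplied when the workflow is run. The following is a minimal sketch, assuming a workflow whose inputs are named bam, chromosome and reference as in the exercises of this episode. The file name and both file paths are hypothetical.

```yaml
# inputs.yml (hypothetical file name): values for the workflow's inputs
bam:
  class: File
  path: sample1.bam      # hypothetical path to the alignment file
chromosome: chr1
reference:
  class: File
  path: hg38.fa          # hypothetical path to the reference genome
```

With cwltool, such a file is passed after the workflow description, for example `cwltool workflow.cwl inputs.yml` (both file names are placeholders). Values given in the input object take precedence over defaults declared in the workflow.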

Your colleague has given you a workflow that runs GATK HaplotypeCaller, and you must change it at various points to fit your needs.

Exercise 3:

In this workflow, add a default value for the reference in the inputs section.

    cwlVersion: v1.0
    class: Workflow
    inputs:
      bam: File
      chromosome: string
      reference: File

Solution:

    cwlVersion: v1.0
    class: Workflow
    inputs:
      bam: File
      chromosome: string
      reference:
        type: File
        default:
          class: File
          location: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

Exercise 5:

Default values in a workflow can be set at both the workflow input level and the step level. Add default values for the reference and chromosome inputs in the steps section of the workflow. Note that StepInputExpressionRequirement must be declared in the requirements section whenever a step input uses valueFrom; a plain default does not require it, but the solution below declares it anyway.

    cwlVersion: v1.0
    class: Workflow
    inputs:
      bam: File
      chromosome: string
      reference: File
    outputs:
      HaplotypeCaller_VCF:
        type: File
        outputSource: GATK_HaplotypeCaller/vcf
    steps:
      GATK_HaplotypeCaller:
        run: GATK_HaplotypeCaller.cwl
        in:
          input_bam: bam
          chromosome: chromosome
          reference: reference
        out: [vcf]

Solution:

    cwlVersion: v1.0
    class: Workflow
    requirements:
      StepInputExpressionRequirement: {}
    inputs:
      bam: File
      chromosome: string
      reference: File
    outputs:
      HaplotypeCaller_VCF:
        type: File
        outputSource: GATK_HaplotypeCaller/vcf
    steps:
      GATK_HaplotypeCaller:
        run: GATK_HaplotypeCaller.cwl
        in:
          input_bam: bam
          chromosome:
            default: chr1
          reference:
            default:
              class: File
              location: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
        out: [vcf]
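
The list of levels also mentions valueFrom as a way to hard code a value at the step level. As a rough sketch of how that might look for the same workflow, the chromosome step input below is fixed to chr1 with a constant valueFrom string. The workflow-level chromosome input is dropped here as an assumption, since nothing would consume it.

```yaml
cwlVersion: v1.0
class: Workflow
requirements:
  StepInputExpressionRequirement: {}   # needed because a step input uses valueFrom
inputs:
  bam: File
  reference: File
outputs:
  HaplotypeCaller_VCF:
    type: File
    outputSource: GATK_HaplotypeCaller/vcf
steps:
  GATK_HaplotypeCaller:
    run: GATK_HaplotypeCaller.cwl
    in:
      input_bam: bam
      chromosome:
        valueFrom: chr1    # constant string, so the step always receives chr1
      reference: reference
    out: [vcf]
```

Unlike a default, a value fixed this way cannot be changed from the input object; changing it means editing the workflow itself.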

Exercise 9:

Using Docker images is a good way of creating reproducible workflows. When specifying DockerRequirement in the hints section of a workflow, you can take the image from a registry or from a URL. dockerPull names an image to retrieve with docker pull, typically from a registry such as Docker Hub. dockerLoad gives an HTTP URL from which an image is downloaded and then loaded with docker load.

In this workflow, use dockerPull to add a Docker image called “broadinstitute/gatk4”.

    cwlVersion: v1.0
    class: Workflow
    inputs:
      bam: File
      chromosome: string
      sample: string
      reference: File
    outputs:
      HaplotypeCaller_VCF:
        type: File
        outputSource: GATK_HaplotypeCaller/vcf
    steps:
      GATK_HaplotypeCaller:
        run: GATK_HaplotypeCaller.cwl
        in:
          input_bam: bam
          intervals: chromosome
          reference_fasta: reference
        out: [vcf]

Solution:

    cwlVersion: v1.0
    class: Workflow
    hints:
      DockerRequirement:
        dockerPull: broadinstitute/gatk4
    inputs:
      bam: File
      chromosome: string
      sample: string
      reference: File
    outputs:
      HaplotypeCaller_VCF:
        type: File
        outputSource: GATK_HaplotypeCaller/vcf
    steps:
      GATK_HaplotypeCaller:
        run: GATK_HaplotypeCaller.cwl
        in:
          input_bam: bam
          intervals: chromosome
          reference_fasta: reference
        out: [vcf]

Now every step of the workflow will run in a container from the same Docker image, unless a step or the tool description itself declares its own DockerRequirement.
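
A workflow-level hint can also be overridden for a single step, because steps accept their own hints section. The sketch below keeps broadinstitute/gatk4 as the workflow-wide image and gives one step a different image. The image name example/custom-gatk is hypothetical, and the sketch assumes GATK_HaplotypeCaller.cwl does not declare a DockerRequirement of its own, which would take precedence over the step-level hint.

```yaml
cwlVersion: v1.0
class: Workflow
hints:
  DockerRequirement:
    dockerPull: broadinstitute/gatk4        # inherited by steps without their own hint
inputs:
  bam: File
  chromosome: string
  sample: string
  reference: File
outputs:
  HaplotypeCaller_VCF:
    type: File
    outputSource: GATK_HaplotypeCaller/vcf
steps:
  GATK_HaplotypeCaller:
    run: GATK_HaplotypeCaller.cwl
    hints:
      DockerRequirement:
        dockerPull: example/custom-gatk     # hypothetical image, used by this step only
    in:
      input_bam: bam
      intervals: chromosome
      reference_fasta: reference
    out: [vcf]
```

This is one way to change the container for a single step, for example to use an image that bundles a helper script, while leaving the rest of the workflow untouched.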

Key Points

  • First key point. Brief Answer to questions. (FIXME)