Scattering
Overview
Teaching: 0 min
Exercises: 0 minQuestions
Key question (FIXME)
Objectives
explain what is meant by the scatter pattern in workflow design, and how it differs from the similar concept of parallel execution
identify when the scatter pattern appears in a workflow description
(FIXME: does the above cover the intended meaning of the following two points from the lesson development sprint?); running the same program on each file; running the same program the same way except for one parameter
By the end of this episode, learners should be able to implement scattering of steps in a workflow.
TODO Add pictures of cross product / matrix manipulation
Exercise 1
What if you had two arrays, one a file array of bams and an array of chromosomes? How would you run all chromosomes on each bam?
Solution:
steps: GATK_HaplotypeCaller: run: GATK_HaplotypeCaller.cwl scatter: [intervals, input_bam] scatterMethod: flat_crossproduct
{. solution} {. challenge}
Exercise 2
How does this change the inputs and outputs for the workflow?
Solution:
cwlVersion: v1.0 class: Workflow requirements: ScatterFeatureRequirement: {} inputs: bam: File chromosomes: string[] outputs: HaplotypeCaller_VCFs: type: type: array items: type: array items: File outputSource: GATK_HaplotypeCaller/vcf steps: GATK_HaplotypeCaller: run: GATK_HaplotypeCaller.cwl scatter: [intervals, input_bam] scatterMethod: flat_crossproduct in: input_bam: bam intervals: chromosomes out: [vcf]
{. solution}
When scattering on multiple inputs,
you need to explicitly say how the scatter should occur.
There are 3 scatter methods in CWL:
dot_product
, flat_crossproduct
and nested_crossproduct
.
dot_product
is the default method,
which takes each element of the array and runs on each nth item of the array.
flat_crossproduct
and nested_crossproduct
will take both inputs and
run on every combination of both arrays.
The difference between flat and nested is in the output type.
Flat will create a single array output whereas
nested will create a nested array output.
Key Points
First key point. Brief Answer to questions. (FIXME)