Workflow parameterization
Overview
Teaching: 20 min
Exercises: 5 min
Questions
How can I change the data a workflow uses?
How can I parameterise a workflow?
How can I add my parameters to a file?
Objectives
Use pipeline parameters to change the input to a workflow.
Add a pipeline parameter to a Nextflow script.
Understand how to create and use a parameter file.
In the first episode we ran the Nextflow script, wc.nf, from the command line and it counted the number of lines in the file data/yeast/reads/ref1_1.fq.gz. To change the input to the script we can make use of pipeline parameters.
Pipeline parameters
The Nextflow wc.nf script defines a pipeline parameter params.input.
Pipeline parameters enable you to change the input to the workflow at
runtime, via the command line or a configuration file, so they are not
hard-coded into the script.
Pipeline parameters are declared in the workflow by prepending the prefix params, separated by the dot character, to a variable name e.g., params.input.
Their value can be specified on the command line by prefixing the parameter name with a double dash, e.g., --input.
In the script wc.nf the pipeline parameter params.input was specified with a value of "data/yeast/reads/ref1_1.fq.gz".
To process a different file, e.g. data/yeast/reads/ref2_2.fq.gz, in the wc.nf script we would run:
$ nextflow run wc.nf --input 'data/yeast/reads/ref2_2.fq.gz'
N E X T F L O W ~ version 21.04.0
Launching `wc.nf` [gigantic_woese] - revision: 8acb5cb9b0
executor > local (1)
[26/3cf986] process > NUM_LINES (1) [100%] 1 of 1 ✔
ref2_2.fq.gz 81720
We can also use wildcards to specify multiple input files (this will be covered in the channels episode).
In the example below we use the * to match any sequence of characters between ref2_ and .fq.gz.
Note: If you use wildcard characters on the command line you must enclose the value in quotes, otherwise the shell expands the pattern before Nextflow receives it.
$ nextflow run wc.nf --input 'data/yeast/reads/ref2_*.fq.gz'
This runs the process NUM_LINES twice, once for each file it matches.
N E X T F L O W ~ version 21.04.0
Launching `wc.nf` [tender_lumiere] - revision: 8acb5cb9b0
executor > local (2)
[cc/b6f793] process > NUM_LINES (1) [100%] 2 of 2 ✔
ref2_2.fq.gz 81720
ref2_1.fq.gz 81720
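The reason for the quotes can be seen without running Nextflow at all. The shell sketch below is only an illustration under assumptions: the two empty files are hypothetical stand-ins created in a temporary directory, and show_args is a made-up helper that reports how many arguments it receives. Double quotes are used here so that $tmpdir is still expanded; like single quotes, they stop the shell from expanding the *.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in files; these are not the real lesson data.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/reads"
touch "$tmpdir/reads/ref2_1.fq.gz" "$tmpdir/reads/ref2_2.fq.gz"

# Made-up helper: report the number of arguments it was given.
show_args() {
  echo "$# argument(s)"
}

# Unquoted: the shell expands the pattern into one argument per matching file.
unquoted=$(show_args "$tmpdir"/reads/ref2_*.fq.gz)

# Quoted: the pattern is passed through as a single, literal string.
quoted=$(show_args "$tmpdir/reads/ref2_*.fq.gz")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

With the unquoted form, --input would be followed by several separate file names rather than one pattern, so the parameter would not hold the pattern you intended.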
Change a pipeline’s input using a parameter
Re-run the Nextflow script wc.nf by changing the pipeline input to all files in the directory data/yeast/reads/ that begin with ref and end with .fq.gz:
Solution
$ nextflow run wc.nf --input 'data/yeast/reads/ref*.fq.gz'
The string specified on the command line will override the default value of the parameter in the script. The output will look like this:
N E X T F L O W ~ version 20.10.0
Launching `wc.nf` [soggy_miescher] - revision: c54a707593
executor > local (6)
[d3/9ca185] process > NUM_LINES (2) [100%] 6 of 6 ✔
ref3_2.fq.gz 52592
ref2_2.fq.gz 81720
ref1_1.fq.gz 58708
ref1_2.fq.gz 58708
ref3_1.fq.gz 52592
ref2_1.fq.gz 81720
The pipeline executes the NUM_LINES process 6 times, once for each file matching the pattern data/yeast/reads/ref*.fq.gz. Since the tasks are executed in parallel, there is no guarantee of which output is reported first. When you run this script, you may see the process output in a different order.
Adding a parameter to a script
To add a pipeline parameter to a script prepend the prefix params, separated by a dot character (.), to a variable name e.g., params.input.
Let’s make a copy of the wc.nf script as wc-params.nf and add a new input parameter.
$ cp wc.nf wc-params.nf
To add a parameter sleep with the default value 2 to wc-params.nf we add the line:
params.sleep = 2
Note: You should always add a sensible default value to the pipeline parameter.
We can use this parameter to add another step to our NUM_LINES process.
script:
"""
sleep ${params.sleep}
printf '${read} '
gunzip -c ${read} | wc -l
"""
This step, sleep ${params.sleep}, will add a delay for the amount of time specified in the params.sleep variable, by default 2 seconds.
To access the value inside the script block we use ${variable_name} syntax e.g. ${params.sleep}.
We can now change the sleep parameter from the command line, for example:
$ nextflow run wc-params.nf --sleep 10
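To make the substitution concrete, the shell sketch below mimics what the NUM_LINES script block would look like once ${params.sleep} and ${read} have been replaced by their values. It is an illustration under assumptions, not output captured from Nextflow: example.fq.gz is a hypothetical four-line file created on the spot, the sleep value of 1 is chosen only to keep the run short, and rendered_task is a made-up wrapper name.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a staged input file: four lines, gzip-compressed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\nc\nd\n' | gzip -c > example.fq.gz

# The script block after substitution, assuming params.sleep = 1 and
# read = example.fq.gz (both values are illustrative).
rendered_task() {
  sleep 1
  printf 'example.fq.gz '
  gunzip -c example.fq.gz | wc -l
}

rendered_task
```

This prints the file name followed by its line count, in the same layout as the NUM_LINES output shown earlier. The exact spacing before the count can differ between systems.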
Add a pipeline parameter
If you haven’t already, make a copy of wc.nf as wc-params.nf.
$ cp wc.nf wc-params.nf
Add the param sleep with a default value of 2 below the params.input line. Add the line sleep ${params.sleep} in the process NUM_LINES above the line printf '${read} '.
Run the new script wc-params.nf changing the sleep input time.
What input file would it run and why?
How would you get it to process all .fq.gz files in the data/yeast/reads directory as well as changing the sleep input to 1 second?
Solution
params.sleep=2
script:
"""
sleep ${params.sleep}
printf '${read} '
gunzip -c ${read} | wc -l
"""
$ nextflow run wc-params.nf --sleep 1
This would use 1 as the value of the sleep parameter instead of the default value (which is 2) and run the pipeline. The input file would be data/yeast/reads/ref1_1.fq.gz as this is the default. To run all input files we could add the param --input 'data/yeast/reads/*.fq.gz':
$ nextflow run wc-params.nf --sleep 1 --input 'data/yeast/reads/*.fq.gz'
Parameter File
If we have many parameters to pass to a script it is best to create a parameters file.
Parameters are stored in JSON or YAML format. JSON and YAML are data serialization languages: ways of storing data objects and structures, such as the params object, in a file.
The -params-file option is used to pass the parameters file to the script.
For example the file wc-params.json contains the parameters sleep and input in JSON format.
{
"sleep": 5,
"input": "data/yeast/reads/etoh60_1*.fq.gz"
}
To run the wc-params.nf script using these parameters we add the option -params-file and pass the file wc-params.json:
$ nextflow run wc-params.nf -params-file wc-params.json
N E X T F L O W ~ version 21.04.0
Launching `wc-params.nf` [nostalgic_northcutt] - revision: 2f86c9ac7e
executor > local (2)
[b4/747eaa] process > NUM_LINES (1) [100%] 2 of 2 ✔
etoh60_1_2.fq.gz 87348
etoh60_1_1.fq.gz 87348
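The same two parameters can also be written in YAML. The shell sketch below writes such a file; the file name wc-params.yaml is a hypothetical choice, and it is assumed that Nextflow recognises the format from the .yaml extension.

```shell
#!/usr/bin/env bash
# Write a YAML parameters file; wc-params.yaml is a hypothetical file name.
cat > wc-params.yaml <<'EOF'
sleep: 5
input: "data/yeast/reads/etoh60_1*.fq.gz"
EOF

# Show the file that was written.
cat wc-params.yaml

# It would then be passed in the same way as the JSON file:
#   nextflow run wc-params.nf -params-file wc-params.yaml
```

The heredoc delimiter is quoted ('EOF') so that the shell writes the * and the quotes to the file unchanged.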
Create and use a Parameter file.
Create a parameter file params.json for the Nextflow file wc-params.nf, and run the Nextflow script using the created parameter file, specifying:
- sleep as 10
- input as data/yeast/reads/ref3_1.fq.gz
Solution
{
"sleep": 10,
"input": "data/yeast/reads/ref3_1.fq.gz"
}
$ nextflow run wc-params.nf -params-file params.json
N E X T F L O W ~ version 21.04.0
Launching `wc-params.nf` [small_wiles] - revision: f5ef7b7a01
executor > local (1)
[f3/4fa480] process > NUM_LINES (1) [100%] 1 of 1 ✔
ref3_1.fq.gz 52592
Key Points
Pipeline parameters are specified by prepending the prefix params to a variable name, separated by a dot character.
To specify a pipeline parameter on the command line for a Nextflow run use --variable_name syntax.
You can add parameters to a JSON or YAML formatted file and pass them to the script using the option -params-file.