This lesson is still being designed and assembled (Pre-Alpha version)

Nextflow scripting

Overview

Teaching: 30 min
Exercises: 5 min
Questions
  • What language are Nextflow scripts written in?

  • How do I store values in a Nextflow script?

  • How do I write comments in a Nextflow script?

  • How can I store and retrieve multiple values?

  • How are strings evaluated in Nextflow?

  • How can I create simple re-useable code blocks?

Objectives
  • Understand what language Nextflow scripts are written in.

  • Define variables in a script.

  • Create lists of simple values.

  • Comment Nextflow scripts.

  • Explain what a list is.

  • Explain what string interpolation is.

  • Understand what a closure is.

Nextflow is a Domain Specific Language (DSL) implemented on top of the Groovy programming language, which in turn is a super-set of the Java programming language. This means that Nextflow can run any Groovy and Java code. It is not necessary to learn Groovy to use Nextflow DSL but it can be useful in edge cases where you need more functionality than the DSL provides.

Nextflow console

Nextflow has a console graphical interface. The console is a REPL (read-eval-print loop) environment that allows a user to quickly test part of a script or pieces of Nextflow code in an interactive manner.

It is a handy tool that allows a user to evaluate fragments of Nextflow/Groovy code or fast prototype a complete pipeline script. More information can be found here

We can use the command nextflow console to launch the interactive console to test out out Groovy code.

nextflow console

Console global scope

It is worth noting that the global script context is maintained across script executions. This means that variables declared in the global script scope are not lost when the script run is complete, and they can be accessed in further executions of the same or another piece of code.

Language Basics

Printing values

To print something is as easy as using the println method (println is a compression of “print line”) and passing the text to print in quotes. The text is referred to as a string as in a string of characters.

println("Hello, World!")
Hello, World!

Parenthesis for function invocations are optional. Therefore also the following is a valid syntax.

println "Hello, World!"
Hello, World!

Methods

println is a example of a Groovy method. A method is just a block of code which only runs when it is called. You can pass data, known as parameters, into a method using the method name followed by brackets (). Methods are used to perform certain actions, and they are also known as functions. Methods enable us to reuse code: define the code once, and use it many times.

Comments

When we write any code it is useful to document it using comments. In Nextflow comments use the same syntax as in the C-family programming languages. This can be confusing for people familiar with the # syntax for commenting in other languages.

// This is a single line comment. Everything after the // is ignored.

/*
   Comments can also
   span multiple
   lines.
 */

Variables

In any programming language, you need to use variables to store different types of information. A variable is a pointer to a space in the computer’s memory that stores the value associated with it.

Variables are assigned using = and can have any value. Groovy is dynamically-typed which means the variable’s data type is based on its value. For example, setting x = 1 means x is an integer number, but if it is later set to x = "hello" then it becomes a String.

Variable scope

When we create a variable using the x = 1 syntax we can access, (scope), it anywhere (globally) in the script. A variable declared in this fashion is sometimes called a public variable.

We can also define variables with a data type e.g. String x="Hello" or with the def keyword def x=1. This effects the accessibility (scope) of the variable. This is called lexical scoping (sometimes known as static scoping) that sets the scope of a variable so that it may only be accessed from within the block of code in which it is defined. A variable declared in this fashion is sometimes called a private variable.

Types of Data

Groovy knows various types of data. four common ones are:

A more complete list can be found here

In the example below, variable my_var has an integer value of 1:

//int − This is used to represent whole numbers.
my_var = 1

To create a variable with a floating point value, we can execute:

//float − This is used to represent floating point numbers.
my_var = 3.1499392

To create a Boolean value we assign the value true or false.
*Note: Do not enclose a Boolean value in quotes or they will be interpreted as a string.

//Boolean − This represents a Boolean value which can either be true or false.
my_var = false

And to create a string, we add single or double quotes around some text.

For example:

//String - These are text literals which are represented in the form of chain of characters
my_var = "chr1"

Multi-line strings

A block of text that span multiple lines can be defined by delimiting it with triple single ''' or double quotes """:

text = """
    This is a multi-line string
    using triple quotes.
    """

To display the value of a variable to the screen in Groovy, we can use the println method passing the variable name are a parameter.

x = 1
println(x)
1

Slashy strings

Strings can also be defined using the forward slash / character as delimiter. They are known as slashy strings and are useful for defining regular expressions and patterns, as there is no need to escape backslashes e.g \n specifies a new line. As with double quote strings they allow to interpolate variables prefixed with a $ character.

Try the following to see the difference:

x = /ATP1B2\TP53\WRAP53/
println(x)
ATP1B2\TP53\WRAP53
y = 'ATP1B2\TP53\WRAP53'
println(y)

Produces an error as the \ is a special characters that we need to escape.

// use \ to escape
y = 'ATP1B2\\TP53\\WRAP53'
println(y)
ATP1B2\TP53\WRAP53

String interpolation

To use a variable inside a single or multi-line double quoted string "" prefix the variable name with a $ to show it should be interpolated:

chr = "1"
println("processing chromosome $chr")
processing chromosome 1

Note: Variable names inside single quoted strings do not support String interpolation.

chr = "1"
println('processing chromosome $chr')
processing chromosome $chr

Lists

To store multiple values in a variable we can use a List. A List (also known as array) object can be defined by placing the list items in square brackets and separating items by commas ,:

kmers = [11,21,27,31]

You can access a given item in the list with square-bracket notation []. These positions are numbered starting at 0, so the first element has an index of 0.

kmers = [11,21,27,31]
println(kmers[0])

11

We can use negative numbers as indices in Groovy. They count from the end of the list rather than the front: the index -1 gives us the last element in the list, -2 the second to last, and so on. Because of this, kmers[3] and kmers[-1] point to the same element in our example list.

kmers = [11,21,27,31]
//Lists can also be indexed with negative indexes
println(kmers[3])
println(kmers[-1])
31
31

Lists can also be indexed using a range. A range is a quick way of declaring a list of consecutive sequential numbers. To define a range use <num1>..<num2> notation.

kmers = [11,21,27,31]
// The first three elements using a range.
println(kmer[0..2])
[11, 21, 27]

String interpolation of list elements

To use an expression like kmer[0..2] inside a double quoted String "" we use the ${expression} syntax, similar to Bash shell scripts.

For example, the expression below without the {}””

kmers = [11,21,27,31]
println("The first three elements in the Lists are. $kmers[0..2]")

would output.

The first three elements in the Lists are. [11, 21, 27, 31][0..2]

We need to enclose the kmers[0..2] expression inside ${} as below to get the correct output.

kmers = [11,21,27,31]
println("The first three elements in the Lists are. ${kmers[0..2]}")
The first three elements in the Lists are. [11, 21, 27]

List methods

Lists have a number of useful methods that can perform operations on their contents. See more here. When using a method on a type of object you need prefix the method with the variable name.

For example, in order to get the length of the list use the list size method:

mylist = [0,1,2]

println(mylist.size())

//inside a string need we need to use the ${} syntax
println("list size is:  ${mylist.size()}")
3
list size is:  3

We can use the getAt method to retrieve items in a list.

mylist = [0,1,2]
println mylist.getAt(1)
println mylist.getAt(-1)
println mylist.getAt([1..2])
1
2
[1, 2]

Listed below are a few more common list methods and their output on a simple example.

mylist = [1,2,3]
println mylist
println mylist + [1]
println mylist - [1]
println mylist * 2
println mylist.reverse()
println mylist.collect{ it+3 }
println mylist.unique().size()
println mylist.count(1)
println mylist.min()
println mylist.max()
println mylist.sum()
println mylist.sort()
println mylist.find{it%2 == 0}
println mylist.findAll{it%2 == 0}
[1, 2, 3]
[1, 2, 3, 1]
[2, 3]
[1, 2, 3, 1, 2, 3]
[3, 2, 1]
[4, 5, 6]
3
1
1
3
6
[1, 2, 3]
2
[2]

Create List and retrieve value

Create a list object list with the values 1 to 10. Access the fifth element in the list using with square-bracket notation or using the getAt method and print the results

Solution

list = [1,2,3,4,5,6,7,8,9,10]
//or
list = 1..10
println("${list[4]}")
//or
println("${list.getAt(4)}")

The fifth element is 5. Remember that the array index starts at 0.

Maps

It can difficult to remember the index of a value in a list, so we can use Groovy Maps (also known as associative arrays) that have an arbitrary type of key instead of an integer value. The syntax is very similar to Lists. To specify the key use a colon before the value [key:value]. Multiple values are separated by a comma. Note: the key value is not enclosed in quotes.

roi = [ chromosome : "chr17", start: 7640755, end: 7718054, genes: ['ATP1B2','TP53','WRAP53']]

Maps can be accessed in a conventional square-bracket syntax or as if the key was a property of the map or using the dot notation. Note: When retrieving a value the key value is enclosed in quotes.

//Use of the square brackets.
println(roi['chromosome'])

//Use a dot notation            
println(roi.start)

//Use of getAt method                      
println(roi.getAt('genes'))          

To add data or to modify a map, the syntax is similar to adding values to list:

//Use of the square brackets
roi['chromosome'] = '17'

//Use a dot notation        
roi.chromosome = 'chr17'  

//Use of put method              
roi.put('genome', 'hg38')  

More information about maps can be found in the Groovy API.

Closures

Closures are the swiss army knife of Nextflow/Groovy programming. In a nutshell a closure is a block of code that can be passed as an argument to a function. This can be useful to create a re-usable function.

We can assign a closure to a variable in same way as a value using the =.

square = { it * it }

The curly brackets {} around the expression it * it tells the script interpreter to treat this expression as code. it is an implicit variable that is provided in closures. It’s available when the closure doesn’t have an explicitly declared parameter and represents the value that is passed to the function when it is invoked.

We can pass the function square as an argument to other functions or methods. Some built-in functions take a function like this as an argument. One example is the collect method on lists that iterates through each element of the list transforming it into a new value using the closure:

square = { it * it }
x = [ 1, 2, 3, 4 ]
y = x.collect(square)
println y
[ 1, 4, 9, 16 ]

A closure can also be defined in an anonymous manner, meaning that it is not given a name, and is defined in the place where it needs to be used.

x = [ 1, 2, 3, 4 ]
y = x.collect({ it * it })
println("x is $x")
println("y is $y")
x is [1, 2, 3, 4]
y is [1, 4, 9, 16]

Closure parameters

By default, closures take a single parameter called it. To define a different name use the ` variable ->` syntax.

For example:

square = { num -> num * num }

In the above example the variable num is assigned as the closure input parameter instead of it.

Write a closure

Write a closure to add the prefix chr to each element of the list x=[1,2,3,4,5,6]

Solution

prefix = { "chr${it}"}
x = [ 1,2,3,4,5,6 ].collect(prefix)
println x
[chr1, chr2, chr3, chr4, chr5, chr6]

Multiple map parameters

It’s also possible to define closures with multiple, custom-named parameters using the -> syntax. This separate the custom-named parameters by a comma before the -> operator.

For example:

tp53 = [chromosome:"chr17", start:7661779, end:7687538, genome:'GRCh38', gene:"TP53"]
//perform subtraction of end and start coordinates
region_length = { start, end -> end-start }
tp53.length = region_length(tp53.start, tp53.end)
println(tp53)

Would add the region length to the map tp53, calculated as end - start.

[chromosome:chr17, start:7661779, end:7687538, genome:GRCh38, gene:TP53, length:25759]

For another example, the method each() when applied to a map can take a closure with two arguments, to which it passes the key-value pair for each entry in the map object:

//closure with two parameters
printMap = { a, b -> println "$a with value $b" }

//map object
my_map = [ chromosome : "chr17", start : 1, end : 83257441 ]

//each iterates through each element
my_map.each(printMap)
chromosome with value chr17
start with value 1
end with value 83257441

Learn more about closures in the Groovy documentation.

Additional Material

Conditional Execution

If statement

One of the most important features of any programming language is the ability to execute different code under different conditions. The simplest way to do this is to use the if construct.

The if statement uses the syntax common to other programming languages such Java, C, JavaScript, etc.

if( < boolean expression > ) {
    // true branch
}
else {
    // false branch
}

The else branch is optional. Curly brackets are optional when the branch defines just a single statement.

x = 12
if( x > 10 )
    println "$x is greater than 10"

null, empty strings and empty collections are evaluated to false. Therefore a statement like:

list = [1,2,3]
if( list != null && list.size() > 0 ) {
  println list
}
else {
  println 'The list is empty'
}

Can be written as:

if( list )
    println list
else
    println 'The list is empty'

In some cases can be useful to replace if statement with a ternary expression, also known as a conditional expression. For example:

println list ? list : 'The list is empty'

The previous statement can be further simplified using the Elvis operator ?: as shown below:

println list ?: 'The list is empty'

For statement

The classical for loop syntax is supported as shown here:

for (int i = 0; i <3; i++) {
   println("Hello World $i")
}

Iteration over list objects is also possible using the syntax below:

list = ['a','b','c']

for( String elem : list ) {
  println elem
}

Functions

It is possible to define a custom function into a script, as shown here:

int fib(int n) {
    return n < 2 ? 1 : fib(n-1) + fib(n-2)
}

println (fib(10)) // prints 89
def fact( n ) {
  n > 1 ? n * fact(n-1) : 1
}

println (fact(5)) // prints 120

More resources

The complete Groovy language documentation is available at this link.

A great resource to master Apache Groovy syntax is Groovy in Action.

Key Points

  • Nextflow is a Domain Specific Language (DSL) implemented on top of the Groovy programming language.

  • To define a variable, assign a value to it e.g., a = 1 .

  • Comments use the same syntax as in the C-family programming languages: // or multiline /* */.

  • Multiple values can be stored in lists [value1, value2, value3, …] or maps [chromosome: 1, start :1].

  • Lists are indexed and sliced with square brackets (e.g., list[0] and list[2..9])

  • String interpolation (variable interpolation, variable substitution, or variable expansion) is the process of evaluating a string literal containing one or more placeholders, yielding a result in which the placeholders are replaced with their corresponding values.

  • A closure is an expression (block of code) encased in {} e.g. { it * it }.