This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to Bioinformatics workflows with Nextflow and nf-core

Nextflow is workflow management software for writing scalable and reproducible scientific workflows. It integrates with various software package and environment management systems, such as Docker, Singularity, and Conda. It allows existing pipelines written in common scripting languages, such as R and Python, to be coupled together. It implements a Domain Specific Language (DSL) that simplifies the implementation and running of workflows on cloud or high-performance computing (HPC) infrastructures.

This lesson also introduces nf-core: a community-driven platform that provides peer-reviewed, best-practice analysis pipelines written in Nextflow.

This lesson motivates the use of Nextflow and nf-core as development tools for building and sharing reproducible data science workflows.

Lesson objectives

  1. The learner will understand the fundamental components of a Nextflow script, including channels, processes and operators.
  2. The learner will write a multi-step workflow script to align, quantify, and perform quality control (QC) on RNA-Seq data using the Nextflow DSL.
  3. The learner will be able to write a Nextflow configuration file to alter the computational resources allocated to a process.
  4. The learner will use nf-core to run a community-curated pipeline on an RNA-Seq dataset.
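
To give a rough sense of the components named in the first objective, here is a minimal sketch of a Nextflow script that uses a channel, an operator, and a process. It is an illustration only and is not taken from the lesson materials: the process name COUNT_CHARS, the greeting values, and the file name main.nf are all hypothetical, and the sketch assumes a recent Nextflow release with DSL2 syntax.

```nextflow
// main.nf (hypothetical file name): a minimal illustrative sketch
nextflow.enable.dsl = 2

// Process: wraps a shell command that runs once per input item.
// COUNT_CHARS is a made-up name used only for this example.
process COUNT_CHARS {
    input:
    val greeting

    output:
    stdout

    script:
    """
    printf '%s' '${greeting}' | wc -c
    """
}

workflow {
    // Channel: a queue of values that feeds data into processes.
    greetings_ch = Channel.of('hello', 'bonjour', 'hola')

    // Operator: map transforms each item emitted by the channel.
    upper_ch = greetings_ch.map { greeting -> greeting.toUpperCase() }

    // Invoke the process on the channel and print what it emits.
    COUNT_CHARS(upper_ch).view()
}
```

Assuming Nextflow is installed, a script like this would typically be launched with `nextflow run main.nf`. Each item in the channel becomes a separate task, so the order in which results are printed is not guaranteed.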

Prerequisites

This is an intermediate lesson and assumes familiarity with the core materials covered in the Software Carpentry lessons. In particular, learners need to be familiar with the material covered in The Unix Shell, and either Plotting and Programming in Python or R for Reproducible Scientific Analysis.

Schedule

Setup
    Download files required for the lesson
00:00  1. Getting Started with Nextflow
    What is a workflow and what are workflow management systems?
    Why should I use a workflow management system?
    What is Nextflow?
    What are the main features of Nextflow?
    What are the main components of a Nextflow script?
    How do I run a Nextflow script?
00:40  2. Nextflow scripting
    What language are Nextflow scripts written in?
    How do I store values in a Nextflow script?
    How do I write comments in a Nextflow script?
    How can I store and retrieve multiple values?
    How are strings evaluated in Nextflow?
    How can I create simple reusable code blocks?
01:15  3. Workflow parameterization
    How can I change the data a workflow uses?
    How can I parameterise a workflow?
    How can I add my parameters to a file?
01:40  4. Channels
    How do I get data into Nextflow?
    How do I handle different types of input, e.g. files and parameters?
    How do I create a Nextflow channel?
    How can I use pattern matching to select input files?
    How do I change the way inputs are handled?
02:20  5. Processes
    How do I run tasks/processes in Nextflow?
    How do I get data, files, and values into a process?
03:05  6. Processes Part 2
    How do I get data, files, and values out of processes?
    How do I handle grouped input and output?
    How can I control when a process is executed?
    How do I control resources, such as number of CPUs and memory, available to processes?
    How do I save output/results from a process?
03:45  7. Workflow
    How do I connect channels and processes to create a workflow?
    How do I invoke a process inside a workflow?
04:30  8. Operators
    How do I perform operations, such as filtering, on channels?
    What are the different kinds of operations I can perform on channels?
    How do I combine operations?
    How can I use a CSV file to process data into a channel?
05:10  9. Nextflow configuration
    What is the difference between the workflow implementation and the workflow configuration?
    How do I configure a Nextflow workflow?
    How do I assign different resources to different processes?
    How do I separate and provide configuration for different computational systems?
    How do I change configuration settings from the default settings provided by the workflow?
05:55  10. Simple RNA-Seq pipeline
    How can I create a Nextflow pipeline from a series of Unix commands and input data?
    How do I log my pipeline's parameters?
    How can I manage my pipeline software requirements?
    How do I know when my pipeline has finished?
    How do I see how many resources my pipeline has used?
06:55  11. Modules
    How can I reuse a Nextflow process in different workflows?
    How do I use parameters in a module?
07:40  12. Sub-workflows
    How do I reuse a workflow as part of a larger workflow?
    How do I run only a part of a workflow?
08:00  13. Reporting
    How do I get information about my pipeline run?
    How can I see what commands I ran?
    How can I create a report from my run?
08:25  14. Workflow caching and checkpointing
    How can I restart a Nextflow workflow after an error?
    How can I add new data to a workflow?
    Where can I find intermediate data and results?
09:05  15. Deploying nf-core pipelines
    Where can I find existing bioinformatics pipelines?
    How do I run nf-core pipelines?
    How do I configure nf-core pipelines to use my data?
    How do I reference nf-core pipelines?
09:45  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.