DSL 2
Nextflow provides a syntax extension that allows the definition of module libraries and simplifies the writing of complex data analysis pipelines.
To enable this feature you need to define the following directive at the beginning of your workflow script:
nextflow.enable.dsl=2
Tip
As of version 22.03.0-edge, Nextflow defaults to DSL 2 if no version is specified explicitly.
You can restore the previous behavior by setting the following variable in your environment:
export NXF_DEFAULT_DSL=1
Note
As of version 22.03.0-edge, the DSL version (either 1 or 2) can also be specified in the Nextflow configuration file using the same notation shown above.
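For example, adding the following line to the nextflow.config file has the same effect as the script-level declaration shown above:
nextflow.enable.dsl = 2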
Function
Nextflow allows the definition of custom functions in the workflow script using the following syntax:
def <function name> ( arg1, arg2, ... ) {
<function body>
}
For example:
def foo() {
'Hello world'
}
def bar(alpha, omega) {
alpha + omega
}
The above snippet defines two simple functions that can be invoked in the workflow script: foo(), which returns the Hello world string, and bar(10,20), which returns the sum of its two parameters (30 in this case).
Note
Functions implicitly return the result of the last evaluated statement.
The return keyword can be used to explicitly exit from a function and return the specified value.
For example:
def fib( x ) {
if( x <= 1 )
return x
else
fib(x-1) + fib(x-2)
}
Process
Process definition
The new DSL separates the definition of a process from its invocation. The process definition follows the usual syntax as described in the process documentation. The only difference is that the from and into channel declarations have to be omitted.
Then a process can be invoked as a function in the workflow scope, passing the expected input channels as parameters as if it were a custom function. For example:
nextflow.enable.dsl=2
process foo {
output:
path 'foo.txt'
script:
"""
your_command > foo.txt
"""
}
process bar {
input:
path x
output:
path 'bar.txt'
script:
"""
another_command $x > bar.txt
"""
}
workflow {
data = channel.fromPath('/some/path/*.txt')
foo()
bar(data)
}
Warning
A process component can be invoked only once in the same workflow context.
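If the same process needs to be invoked more than once, it can be imported under different aliases, as described in the Module aliases section below. A minimal sketch, assuming foo is defined in a module script:
include { foo; foo as foo2 } from './some/module'
workflow {
foo()
foo2()
}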
Process composition
Processes having matching input-output declarations can be composed so that the output of the first process is passed as input to the next process. Taking into consideration the previous example, it’s possible to write the following:
workflow {
bar(foo())
}
Process output
A process output can also be accessed using the out attribute of the corresponding process object. For example:
workflow {
foo()
bar(foo.out)
bar.out.view()
}
When a process defines two or more output channels, each of them can be accessed using the array element operator, e.g. out[0], out[1], etc., or using named outputs (see below).
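For illustration, a minimal sketch of a process with two output channels accessed by index (the process name, commands and file names are hypothetical):
process qc {
output:
path 'report.txt'
path 'summary.csv'
script:
'''
your_command > report.txt
your_other_command > summary.csv
'''
}
workflow {
qc()
qc.out[0].view()
qc.out[1].view()
}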
Process named output
The emit option can be added to the process output definition to assign a name identifier. This name can be used to reference the channel within the caller scope. For example:
process foo {
output:
path '*.bam', emit: samples_bam
'''
your_command --here
'''
}
workflow {
foo()
foo.out.samples_bam.view()
}
Process named stdout
The emit option can also be used to name the stdout:
process sayHello {
input:
val cheers
output:
stdout emit: verbiage
script:
"""
echo -n $cheers
"""
}
workflow {
things = channel.of('Hello world!', 'Yo, dude!', 'Duck!')
sayHello(things)
sayHello.out.verbiage.view()
}
Workflow
Workflow definition
The workflow keyword allows the definition of sub-workflow components that enclose the invocation of one or more processes and operators:
workflow my_pipeline {
foo()
bar( foo.out.collect() )
}
For example, the above snippet defines a workflow component, named my_pipeline, that can be invoked from another workflow component definition like any other function or process, with my_pipeline().
Workflow parameters
A workflow component can access any variable and parameter defined in the outer scope:
params.data = '/some/data/file'
workflow my_pipeline {
if( params.data )
bar(params.data)
else
bar(foo())
}
Workflow input
A workflow component can declare one or more input channels using the take keyword. For example:
workflow my_pipeline {
take: data
main:
foo(data)
bar(foo.out)
}
Warning
When the take keyword is used, the beginning of the workflow body must be identified with the main keyword.
Then, the input can be specified as an argument in the workflow invocation statement:
workflow {
my_pipeline( channel.from('/some/data') )
}
Note
Workflow inputs are always channels by definition. If a basic data type is provided instead, such as a number, string, list, etc, it is implicitly converted to a value channel.
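As a minimal illustration, passing a plain value instead of a channel works the same way, since it is wrapped in a value channel automatically:
workflow {
my_pipeline( '/some/data' )
}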
Workflow output
A workflow component can declare one or more output channels using the emit keyword. For example:
workflow my_pipeline {
take: data
main:
foo(data)
bar(foo.out)
emit:
bar.out
}
Then, the result of the my_pipeline execution can be accessed using the out property, i.e. my_pipeline.out. When multiple output channels are declared, use the array bracket notation to access each output channel, as described for the Process output definition.
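For example, the first output channel of my_pipeline could be accessed by index as in this sketch:
workflow {
my_pipeline( channel.from('/some/data') )
my_pipeline.out[0].view()
}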
Workflow named output
If the output channel is assigned to an identifier in the emit declaration, that identifier can be used to reference the channel within the caller scope. For example:
workflow my_pipeline {
take: data
main:
foo(data)
bar(foo.out)
emit:
my_data = bar.out
}
Then, the result of the above snippet can be accessed using my_pipeline.out.my_data.
Workflow entrypoint
A workflow definition which does not declare any name (also known as implicit workflow) is the entry point of execution for the workflow application.
Note
Implicit workflow definition is ignored when a script is included as a module. This allows the writing of a workflow script that can be used either as a library module or as an application script.
Tip
A different workflow entrypoint can be specified using the -entry command line option.
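For example, assuming the script file is named main.nf, the my_pipeline workflow shown above could be launched as the entrypoint with:
nextflow run main.nf -entry my_pipeline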
Workflow composition
Workflows defined in your script or imported with Module inclusion can be invoked and composed as any other process in your application.
workflow flow1 {
take: data
main:
foo(data)
bar(foo.out)
emit:
bar.out
}
workflow flow2 {
take: data
main:
foo(data)
baz(foo.out)
emit:
baz.out
}
workflow {
data = channel.fromPath('/some/data/*.txt')
flow1(data)
flow2(flow1.out)
}
Note
Nested workflow execution determines an implicit scope. Therefore the same process can be invoked in two different workflow scopes, like for example foo in the above snippet, which is used in both flow1 and flow2. The workflow execution path, along with the process names, determines the fully qualified process name that is used to distinguish the two different process invocations, i.e. flow1:foo and flow2:foo in the above example.
Tip
The fully qualified process name can be used as a valid process selector in the nextflow.config file, and it has priority over the simple process name.
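For example, a minimal nextflow.config sketch using the fully qualified name as a selector (the cpus value is just illustrative):
process {
withName: 'flow1:foo' {
cpus = 2
}
}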
Modules
The new DSL allows the definition of module scripts that can be included and shared across workflow applications.
A module script (or simply, module) can contain the definition of functions, processes and workflows as described in the previous sections.
Note
Functions, processes and workflows are collectively referred to as components.
Module inclusion
A component defined in a module script can be imported into another Nextflow script using the include keyword.
For example:
include { foo } from './some/module'
workflow {
data = channel.fromPath('/some/data/*.txt')
foo(data)
}
The above snippet includes a process named foo, defined in the module script, in the main execution context. This way, foo can be invoked in the workflow scope.
Nextflow implicitly looks for the script file ./some/module.nf, resolving the path against the including script location.
Note
Relative paths must begin with the ./ prefix. Also, the include statement must be defined outside of the workflow definition.
Module directory
As of version 22.10.0, the module can be defined as a directory whose name matches the module name and contains a script named main.nf. For example:
some
└-module
    └-main.nf
When defined as a directory, the module needs to be included by specifying the module directory path:
include { foo } from './some/module'
Module directories allow the use of module-scoped binary scripts. See Module binaries for details.
Multiple inclusions
A Nextflow script allows the inclusion of an arbitrary number of modules and components. When multiple components need to be included from the same module script, the component names can be specified in the same inclusion using the curly brackets notation as shown below:
include { foo; bar } from './some/module'
workflow {
data = channel.fromPath('/some/data/*.txt')
foo(data)
bar(data)
}
Module aliases
When including a module component, it’s possible to specify an alias with the as keyword. This allows components with the same name to be included and invoked in your script using different names. For example:
include { foo } from './some/module'
include { foo as bar } from './other/module'
workflow {
foo(some_data)
bar(other_data)
}
The same is possible when including the same component multiple times from the same module script as shown below:
include { foo; foo as bar } from './some/module'
workflow {
foo(some_data)
bar(other_data)
}
Module parameters
A module script can define one or more parameters using the same syntax as a Nextflow workflow script:
params.foo = 'Hello'
params.bar = 'world!'
def sayHello() {
println "$params.foo $params.bar"
}
Then, parameters are inherited from the including context. For example:
params.foo = 'Hola'
params.bar = 'Mundo'
include {sayHello} from './some/module'
workflow {
sayHello()
}
The above snippet prints:
Hola Mundo
Note
The module inherits the parameters defined before the include statement, therefore any parameter set afterwards is ignored.
Tip
Define all pipeline parameters at the beginning of the script, before any include declaration.
The addParams option can be used to extend the module parameters without affecting the external scope. For example:
include {sayHello} from './some/module' addParams(foo: 'Ciao')
workflow {
sayHello()
}
The above snippet prints:
Ciao world!
Finally, the include option params allows the specification of one or more parameters without inheriting any value from the external environment.
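A sketch of how this might look, assuming the params option accepts the same map notation as addParams:
include {sayHello} from './some/module' params(foo: 'Hola', bar: 'Mundo')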
Module templates
The script of a process defined in a module can be externalised to a template file. With DSL2, the template file can be placed in the templates directory located in the same folder as the module script.
For example, let’s suppose we have a project L with a module script defining two processes (P1 and P2), both of which use templates. The template files can be made available in the local templates directory:
Project L
|─myModules.nf
└─templates
    |─P1-template.sh
    └─P2-template.sh
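For reference, a minimal sketch of how a process inside myModules.nf could reference its template file (the output file name is illustrative):
process P1 {
output:
path 'result.txt'
script:
template 'P1-template.sh'
}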
Then, we have a second project A with a workflow that includes P1 and P2:
Pipeline A
└-main.nf
Finally, we have a third project B with a workflow that also includes P1 and P2:
Pipeline B
└-main.nf
With the possibility to keep the template files inside project L, projects A and B can use the modules defined in L without any changes. A future project C would do the same, just cloning L (if not available on the system) and including its module script.
Besides promoting the sharing of modules across pipelines, there are several advantages to keeping the module templates under the script path:
module components are self-contained,
module components can be tested independently from the pipeline(s) importing them,
it is possible to create libraries of module components.
Ultimately, having multiple template locations allows a more structured organization within the same project. If a project has several module components, and all of them use templates, the project could group module scripts and their templates as needed. For example:
baseDir
|─main.nf
|─Phase0-Modules
|   |─mymodules1.nf
|   |─mymodules2.nf
|   └─templates
|       |─P1-template.sh
|       └─P2-template.sh
|─Phase1-Modules
|   |─mymodules3.nf
|   |─mymodules4.nf
|   └─templates
|       |─P3-template.sh
|       └─P4-template.sh
└─Phase2-Modules
    |─mymodules5.nf
    |─mymodules6.nf
    └─templates
        |─P5-template.sh
        |─P6-template.sh
        └─P7-template.sh
Module binaries
As of version 22.10.0, modules can define binary scripts that are locally scoped to the processes defined in the module script. To enable this feature, add the following setting to the pipeline configuration file:
nextflow.enable.moduleBinaries = true
The binary scripts must be placed in the module directory under <module-dir>/resources/usr/bin:
<module-dir>
|─main.nf
└─resources
    └─usr
        └─bin
            |─your-module-script1.sh
            └─another-module-script2.py
Those scripts will be accessible like any other command in the task execution environment, provided they have been granted Linux execute permissions.
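For example, a process in the module’s main.nf could invoke one of the bundled scripts directly, as in this sketch (assuming your-module-script1.sh writes its result to stdout):
process foo {
output:
stdout
script:
'''
your-module-script1.sh
'''
}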
Note
This feature requires the use of a local or shared file system as the pipeline work directory, or Wave containers when using cloud-based executors.
Channel forking
Using the new DSL, Nextflow channels are automatically forked when connecting two or more consumers.
For example:
channel
.from('Hello','Hola','Ciao')
.set{ cheers }
cheers
.map{ it.toUpperCase() }
.view()
cheers
.map{ it.reverse() }
.view()
The same is valid for the result (channel) of a process execution. Therefore a process output can be consumed by two or more processes without the need to fork it using the into operator, making the writing of workflow scripts more fluent and readable.
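For example, in the sketch below (bar and baz are hypothetical processes) the output of foo is consumed by both without any explicit forking:
workflow {
foo()
bar(foo.out)
baz(foo.out)
}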
Pipes
The pipe operator
Nextflow processes and operators can be composed using the | pipe operator. For example:
process foo {
input:
val data
output:
val result
exec:
result = "$data world"
}
workflow {
channel.from('Hello','Hola','Ciao') | foo | map { it.toUpperCase() } | view
}
The above snippet defines a process named foo and invokes it, piping in the channel created by the channel.from factory. The result is then piped to the map operator, which converts each string to uppercase, and finally to the view operator, which prints it.
The and operator
The & and operator allows feeding two or more processes with the content of the same channel(s). For example:
process foo {
input:
val data
output:
val result
exec:
result = "$data world"
}
process bar {
input:
val data
output:
val result
exec:
result = data.toUpperCase()
}
workflow {
channel.from('Hello') | map { it.reverse() } | (foo & bar) | mix | view
}
In the above snippet, the channel emitting the Hello string is piped to the map operator, which reverses the string value. Then, the result is passed to both the foo and bar processes, which are executed in parallel. Each process outputs a channel, and the two channels are merged into a single channel using the mix operator. Finally the result is printed using the view operator.
Tip
The break-line operator \ can be used to split long statements over multiple lines.
The above snippet can also be written as:
workflow {
channel.from('Hello') \
| map { it.reverse() } \
| (foo & bar) \
| mix \
| view
}
DSL2 migration notes
DSL2 final version is activated using the declaration nextflow.enable.dsl=2 in place of nextflow.preview.dsl=2.
Process inputs of type set have to be replaced with tuple.
Process outputs of type set have to be replaced with tuple.
The process output option mode flatten is no longer available. Replace it using the flatten operator on the corresponding output channel.
Anonymous and unwrapped includes are not supported anymore. Replace them with an explicit module inclusion. For example:
include './some/library'
include bar from './other/library'
workflow {
foo()
bar()
}
Should be replaced with:
include { foo } from './some/library'
include { bar } from './other/library'
workflow {
foo()
bar()
}
The use of unqualified value and file elements in input tuples is not allowed anymore. Replace them with a corresponding val or path qualifier:
process foo {
input:
tuple X, 'some-file.bam'
script:
'''
your_command --in $X some-file.bam
'''
}
Use:
process foo {
input:
tuple val(X), path('some-file.bam')
script:
'''
your_command --in $X some-file.bam
'''
}
The use of unqualified value and file elements in output tuples is not allowed anymore. Replace them with a corresponding val or path qualifier:
process foo {
output:
tuple X, 'some-file.bam'
script:
X = 'some value'
'''
your_command > some-file.bam
'''
}
Use:
process foo {
output:
tuple val(X), path('some-file.bam')
script:
X = 'some value'
'''
your_command > some-file.bam
'''
}
Operator bind has been deprecated by DSL2 syntax.
Operator << has been deprecated by DSL2 syntax.
Operator choice has been deprecated by DSL2 syntax. Use branch instead.
Operator close has been deprecated by DSL2 syntax.
Operator create has been deprecated by DSL2 syntax.
Operator countBy has been deprecated by DSL2 syntax.
Operator into has been deprecated by DSL2 syntax since it's no longer needed.
Operator fork has been renamed to multiMap.
Operator groupBy has been deprecated by DSL2 syntax. Replace it with groupTuple.
Operators print and println have been deprecated by DSL2 syntax. Use view instead.
Operator separate has been deprecated by DSL2 syntax.
Operator spread has been deprecated by DSL2 syntax. Replace it with combine.
Operator route has been deprecated by DSL2 syntax.