Workflows

In Nextflow, a workflow is a composition of processes and dataflow logic (i.e. channels and operators).

The workflow definition starts with the keyword workflow, followed by an optional name, and finally the workflow body delimited by curly braces. A basic workflow looks like the following example:

workflow {
    foo()
}

Where foo could be a function, a process, or another workflow.

Workflows are lazily executed, which means that Nextflow parses the entire workflow structure first, and then executes the entire workflow at once. The order in which a task is executed is determined only by its dependencies, so a task will be executed as soon as all of its required inputs are available.

The syntax of a workflow is defined as follows:

workflow [ name ] {

    take:
    < workflow inputs >

    main:
    < dataflow statements >

    emit:
    < workflow outputs >

}

Tip

The main: label can be omitted if there are no take: or emit: blocks.

Note

Workflows were introduced in DSL2. If you are still using DSL1, see the Migrating from DSL 1 page to learn how to migrate your Nextflow pipelines to DSL2.

Process invocation

A process can be invoked like a function in a workflow definition, passing the expected input channels like function arguments. For example:

process foo {
    output:
    path 'foo.txt'

    script:
    """
    your_command > foo.txt
    """
}

process bar {
    input:
    path x

    output:
    path 'bar.txt'

    script:
    """
    another_command $x > bar.txt
    """
}

workflow {
    data = channel.fromPath('/some/path/*.txt')
    foo()
    bar(data)
}

Warning

A process can be only be invoked once in a single workflow, however you can get around this restriction by using Module aliases.

Process composition

Processes with matching input/output declarations can be composed so that the output of the first process is passed as input to the second process. The previous example can be rewritten as follows:

workflow {
    bar(foo())
}

Process outputs

A process output can be accessed using the out attribute on the corresponding process object. For example:

workflow {
    foo()
    bar(foo.out)
    bar.out.view()
}

When a process defines multiple output channels, each output can be accessed using the array element operator (out[0], out[1], etc.) or using named outputs (see below).

The process output(s) can also be accessed like the return value of a function:

workflow {
    f_out = foo()
    (b1, b2) = bar(f_out)
    b1.view()
}

Process named outputs

The emit option can be added to the process output definition to assign a name identifier. This name can be used to reference the channel from the calling workflow. For example:

process foo {
    output:
    path '*.bam', emit: samples_bam

    '''
    your_command --here
    '''
}

workflow {
    foo()
    foo.out.samples_bam.view()
}

When referencing a named output directly from the process invocation, you can use a more concise syntax:

workflow {
    ch_samples = foo().samples_bam
}

Process named stdout

The emit option can also be used to name a stdout output:

process sayHello {
    input:
    val cheers

    output:
    stdout emit: verbiage

    script:
    """
    echo -n $cheers
    """
}

workflow {
    things = channel.of('Hello world!', 'Yo, dude!', 'Duck!')
    sayHello(things)
    sayHello.out.verbiage.view()
}

Note

Optional params for a process input/output are always prefixed with a comma, except for stdout. Because stdout does not have an associated name or value like other types, the first param should not be prefixed.

Subworkflows

A named workflow is a “subworkflow” that can be invoked from other workflows. For example:

workflow my_pipeline {
    foo()
    bar( foo.out.collect() )
}

workflow {
    my_pipeline()
}

The above snippet defines a workflow named my_pipeline, that can be invoked from another workflow as my_pipeline(), just like any other function or process.

Workflow parameters

A workflow component can access any variable or parameter defined in the global scope:

params.data = '/some/data/file'

workflow my_pipeline {
    if( params.data )
        bar(params.data)
    else
        bar(foo())
}

Workflow inputs

A workflow can declare one or more input channels using the take keyword. For example:

workflow my_pipeline {
    take: data

    main:
    foo(data)
    bar(foo.out)
}

Multiple inputs must be specified on separate lines:

workflow my_pipeline {
    take:
    data1
    data2

    main:
    foo(data1, data2)
    bar(foo.out)
}

Warning

When the take keyword is used, the beginning of the workflow body must be defined with the main keyword.

Inputs can be specified like arguments when invoking the workflow:

workflow {
    my_pipeline( channel.from('/some/data') )
}

Note

Workflow inputs are always channels by definition. If a basic data type, such as a number, string, list, etc, is provided, it is implicitly converted to a value channel.

Workflow outputs

A workflow can declare one or more output channels using the emit keyword. For example:

workflow my_pipeline {
    main:
    foo(data)
    bar(foo.out)

    emit:
    bar.out
}

When invoking the workflow, the output channel(s) can be accessed using the out property, i.e. my_pipeline.out. When multiple output channels are declared, use the array bracket notation or the assignment syntax to access each output channel as described for process outputs.

Workflow named outputs

If an output channel is assigned to an identifier in the emit block, the identifier can be used to reference the channel from the calling workflow. For example:

workflow my_pipeline {
    main:
    foo(data)
    bar(foo.out)

    emit:
    my_data = bar.out
}

The result of the above workflow can be accessed using my_pipeline.out.my_data.

Workflow entrypoint

A workflow with no name (also known as the implicit workflow) is the default entrypoint of the Nextflow pipeline. A different workflow entrypoint can be specified using the -entry command line option.

Note

Implicit workflow definitions are ignored when a script is included as a module. This way, a workflow script can be written in such a way that it can be used either as a library module or an application script.

Workflow composition

Named workflows can be invoked and composed just like any other process or function.

workflow flow1 {
    take: data
    main:
        foo(data)
        bar(foo.out)
    emit:
        bar.out
}

workflow flow2 {
    take: data
    main:
        foo(data)
        baz(foo.out)
    emit:
        baz.out
}

workflow {
    take: data
    main:
        flow1(data)
        flow2(flow1.out)
}

Note

Each workflow invocation has its own scope. As a result, the same process can be invoked in two different workflow scopes, like foo in the above snippet, which is used in both flow1 and flow2. The workflow execution path, along with the process names, determines the fully qualified process name that is used to distinguish the different process invocations, i.e. flow1:foo and flow2:foo in the above example.

Tip

The fully qualified process name can be used as a process selector in a Nextflow configuration file, and it takes priority over the simple process name.

Special operators

Pipe |

The | pipe operator can be used to compose Nextflow processes and operators. For example:

process foo {
    input:
    val data

    output:
    val result

    exec:
    result = "$data world"
}

workflow {
   channel.from('Hello','Hola','Ciao') | foo | map { it.toUpperCase() } | view
}

The above snippet defines a process named foo and invokes it with the data channel. The result is then piped to the map operator, which converts each string to uppercase, and finally to the view operator which prints it.

Tip

Statements can also be split across multiple lines for better readability:

workflow {
    channel.from('Hello','Hola','Ciao')
      | foo
      | map { it.toUpperCase() }
      | view
}

And &

The & and operator can be used to feed multiple processes with the same channel(s). For example:

process foo {
    input:
    val data

    output:
    val result

    exec:
    result = "$data world"
}

process bar {
    input:
    val data

    output:
    val result

    exec:
    result = data.toUpperCase()
}

workflow {
    channel.from('Hello')
      | map { it.reverse() }
      | (foo & bar)
      | mix
      | view
}

In the above snippet, the initial channel is piped to the map operator, which reverses the string value. Then, the result is passed to the processes foo and bar, which are executed in parallel. Each process outputs a channel, and the two channels are combined using the mix operator. Finally, the result is printed using the view operator.