Terminology


Over the years, we have come up with some terminology specific to EasyBuild to refer to particular components, which we use alongside established terminology relevant to the context of building and installing software.

It is important to be familiar with these terms, so we'll briefly cover them one by one.


Framework

The EasyBuild framework consists of a set of Python modules organised in packages (easybuild.framework, easybuild.toolchains, easybuild.tools, etc.) that collectively form the core of EasyBuild, and is developed in the easybuild-framework repository on GitHub.

It implements the common functionality that you need when building software from source, providing functions for unpacking source files, applying patch files, collecting the output produced by shell commands that are being run and checking their exit code, generating environment module files, etc.
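To illustrate one of these building blocks, here is a minimal sketch of running a shell command, collecting its output, and checking its exit code. It uses only the Python standard library; the helper name run_and_capture is hypothetical and is not part of the EasyBuild framework API.

```python
import subprocess


def run_and_capture(cmd):
    """Run a shell command; return its combined output and its exit code.

    Hypothetical helper for illustration only, not an EasyBuild function.
    """
    proc = subprocess.run(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured output
        text=True,
    )
    return proc.stdout, proc.returncode


output, exit_code = run_and_capture("echo hello")
if exit_code != 0:
    raise RuntimeError("command failed (exit code %d): %s" % (exit_code, output))
```

Centralising this logic in one place means every installation procedure gets consistent output collection and error handling.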

The EasyBuild framework does not implement any specific installation procedure; it only provides the functionality needed to facilitate one.


Easyblocks

An easyblock is a Python module that implements a specific software installation procedure, and can be viewed as a plugin to the EasyBuild framework. Easyblocks can be either generic or software-specific.

A generic easyblock implements an installation procedure that can be used for multiple different software packages. Commonly used examples include the ConfigureMake easyblock which implements the ubiquitous configure-make-make install procedure, and the PythonPackage easyblock that can be used to install a Python package.

A software-specific easyblock implements an installation procedure that is specific to a particular software package. Infamous examples include the easyblocks we have for GCC, OpenFOAM, TensorFlow, WRF, ...

The installation procedure performed by an easyblock can be controlled by defining easyconfig parameters in an easyconfig file.
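The idea of a generic procedure that is steered by parameters can be sketched with a toy class. This is not the actual ConfigureMake easyblock: the class name ToyConfigureMake and its methods are made up for illustration, and it only assembles the commands rather than running them.

```python
class ToyConfigureMake:
    """Toy stand-in for a generic configure / make / make install procedure.

    Illustration only: real easyblocks build on the EasyBuild framework.
    """

    def __init__(self, prefix, configopts="", buildopts="", installopts=""):
        self.prefix = prefix
        self.configopts = configopts
        self.buildopts = buildopts
        self.installopts = installopts

    @staticmethod
    def _join(*parts):
        # Skip empty options so no trailing spaces end up in the command.
        return " ".join(part for part in parts if part)

    def commands(self):
        """Return the shell commands for the three steps, in order."""
        return [
            self._join("./configure", "--prefix=" + self.prefix, self.configopts),
            self._join("make", self.buildopts),
            self._join("make install", self.installopts),
        ]


steps = ToyConfigureMake(
    "/tmp/example-1.0", configopts="--enable-shared", buildopts="-j 4"
).commands()
```

The same class serves many software packages; only the parameter values differ per installation.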

A collection of (generic and software-specific) easyblocks is developed by the EasyBuild community in the easybuild-easyblocks repository on GitHub.


Easyconfig parameters

An easyconfig parameter specifies a particular aspect of a software installation that should be performed by EasyBuild.

Some easyconfig parameters are mandatory. The following parameters must be defined in every easyconfig file:

  • name and version, which specify the name and version of the software to install;
  • homepage and description, which provide key metadata for the software;
  • toolchain, which specifies the compiler toolchain to use to install the software (see the Toolchains section);

Other easyconfig parameters are optional: they can be used to provide required information, or to control specific aspects of the installation procedure performed by the easyblock.

Some commonly used optional easyconfig parameters include:

  • easyblock, which specifies which (generic) easyblock should be used for the installation;
  • sources and source_urls, which specify the list of source files and where to download them;
  • dependencies and builddependencies, which specify the list of (build) dependencies;
  • configopts, buildopts, and installopts, which specify options for the configuration/build/install commands, respectively;

If no value is specified for an optional easyconfig parameter, the corresponding default value will be used.

There are two groups of easyconfig parameters. General easyconfig parameters can be defined for any software package, and (usually) control a specific aspect of the installation. Custom easyconfig parameters are only supported by certain easyblocks, and only make sense for particular (types of) software.
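The split between mandatory and optional parameters can be sketched as follows. The list of mandatory parameters is taken from the text above; the default values shown are illustrative assumptions, and the helper names are hypothetical rather than part of EasyBuild.

```python
# Mandatory parameters, as listed above.
MANDATORY_PARAMS = ("name", "version", "homepage", "description", "toolchain")

# Illustrative defaults only (assumption); EasyBuild defines the real ones.
ILLUSTRATIVE_DEFAULTS = {
    "configopts": "",
    "buildopts": "",
    "installopts": "",
    "dependencies": [],
    "builddependencies": [],
}


def missing_mandatory(params):
    """Return the mandatory parameters that are not defined in params."""
    return [key for key in MANDATORY_PARAMS if key not in params]


def with_defaults(params):
    """Fill in a default for every optional parameter that was not specified."""
    merged = dict(ILLUSTRATIVE_DEFAULTS)
    merged.update(params)
    return merged


incomplete = {"name": "example", "version": "1.0", "configopts": "--enable-shared"}
missing = missing_mandatory(incomplete)
resolved = with_defaults(incomplete)
```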


Easyconfig files

Easyconfig files (or easyconfigs for short) are simple text files written in Python syntax that specify what EasyBuild should install. Each easyconfig file defines the set of easyconfig parameters that collectively form a complete specification for a particular software installation.
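As a rough sketch, an easyconfig file for a made-up software package could look like the example below. The software name, URLs, and version numbers are placeholders chosen for illustration; only the parameter names come from the lists above.

```python
# Sketch of an easyconfig for a hypothetical package named "example".
easyblock = 'ConfigureMake'

name = 'example'
version = '1.0'

homepage = 'https://example.org'
description = "Made-up software package used to illustrate easyconfig syntax."

# Toolchain name and version are illustrative.
toolchain = {'name': 'GCC', 'version': '10.3.0'}

source_urls = ['https://example.org/downloads']
sources = ['example-1.0.tar.gz']

# Only needed while building; version is illustrative.
builddependencies = [('CMake', '3.20.1')]

configopts = '--enable-shared'
```

Because the file is plain Python syntax, each parameter is simply a variable assignment.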

The filename of an easyconfig file usually ends with the .eb extension. In some contexts the filename is expected to be determined by the value of a handful of key easyconfig parameters: name, version, toolchain and versionsuffix. The general format for the filename of an easyconfig file is: <name>-<version>-<toolchain><versionsuffix>.eb, where <toolchain> is the toolchain name and version joined by a dash (this part is omitted when the system toolchain is used), and <versionsuffix> can be empty (a non-empty version suffix typically starts with a dash).

The filename of easyconfig files is particularly relevant when EasyBuild is searching for easyconfig files to resolve dependencies, since it does this purely based on filenames: interpreting the contents of every (potential) easyconfig file it encounters would be too expensive.
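A minimal sketch of how such a filename can be derived from the key parameters is shown below. It assumes that the toolchain part is written as the toolchain name and version joined by dashes, that a non-empty versionsuffix carries its own leading dash, and that the system toolchain is identified by the name 'system'. The function name is hypothetical.

```python
def easyconfig_filename(name, version, toolchain, versionsuffix=""):
    """Build an easyconfig filename from the key parameters (illustrative)."""
    parts = [name, version]
    # Assumption: the system toolchain is named 'system' and is left out.
    if toolchain["name"] != "system":
        parts.extend([toolchain["name"], toolchain["version"]])
    # Assumption: a non-empty versionsuffix already starts with a dash.
    return "-".join(parts) + versionsuffix + ".eb"


with_toolchain = easyconfig_filename(
    "Python", "3.9.5", {"name": "GCCcore", "version": "10.3.0"}, "-bare"
)
with_system = easyconfig_filename(
    "example", "1.0", {"name": "system", "version": "system"}
)
```

Here with_toolchain is "Python-3.9.5-GCCcore-10.3.0-bare.eb" and with_system is "example-1.0.eb".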

In the easybuild-easyconfigs repository on GitHub, the EasyBuild community maintains a large (and growing) collection of easyconfig files, for a wide range of (scientific) software.


Easystack files

Easystack files are a new concept in EasyBuild, providing a way to define a software stack that should be installed by EasyBuild.

They are written in YAML syntax, and include a list of software specifications which correspond to a list of easyconfig files, with support for providing specific EasyBuild configuration options for particular software packages, and including or excluding specific software packages based on labels.

The support for using easystack files is currently marked as experimental, which means it is subject to change in future EasyBuild releases, and may be prone to errors.


Extensions

Extensions is the collective term we use for additional software packages that can be installed on top of another software package. Common examples are Python packages, R libraries, and Perl modules.

As you can tell, the common terminology here is a bit messy, so we came up with a unifying term.

Extensions can be installed in different ways:

  • stand-alone, as a separate installation on top of one or more other installations;
  • as a part of a bundle of extensions that collectively form a separate installation;
  • or as an actual extension to a specific installation to yield a "batteries included" type of installation (for example by adding a bunch of Python packages from PyPI into a Python installation);
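For the last option, the extensions are listed in the easyconfig file of the software they extend, using the exts_list easyconfig parameter. The fragment below is a sketch only: the extension names and versions are placeholders, not real packages.

```python
# Fragment of an easyconfig: extensions to install on top of the
# main software. Names and versions are made up for illustration.
exts_list = [
    ('examplepkg', '1.0'),
    ('anotherpkg', '2.3.1'),
]

# Each entry gives at least the name and version of one extension.
extension_names = [entry[0] for entry in exts_list]
```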

Dependencies

A dependency is a common term in the context of software. It refers to a software package that is either strictly required by other software, or that can be leveraged to enhance other software (for example to support specific features).

There are three main types of dependencies for computer software:

  • a build dependency is only required when building/installing a software package; once the software package is installed, it is no longer needed to use that software (examples: CMake, pkg-config);
  • a run-time dependency (often referred to simply as dependency) is a software package that is required to use (or run) another software package (example: Python);
  • a link-time dependency is somewhere in between a build and a run-time dependency: it is only needed when linking a software package; it can become either a build or a run-time dependency, depending on exactly how the software is installed (example: OpenBLAS);

The distinction between link-time and run-time dependencies is mostly irrelevant for this tutorial, but we will distinguish build-only dependencies from the rest.
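In an easyconfig file this distinction shows up as two separate parameters, dependencies and builddependencies. The sketch below uses illustrative version numbers, and the helper needed_for is hypothetical; it only shows which packages have to be available at each stage.

```python
# Run-time (and link-time) dependencies; versions are illustrative.
dependencies = [("Python", "3.9.5")]

# Build-only dependencies: not needed once the software is installed.
builddependencies = [("CMake", "3.20.1"), ("pkg-config", "0.29.2")]


def needed_for(stage):
    """Return the packages that must be available for the given stage."""
    if stage == "build":
        # Building requires both kinds of dependencies.
        return builddependencies + dependencies
    if stage == "run":
        return list(dependencies)
    raise ValueError("unknown stage: %s" % stage)


at_build_time = needed_for("build")
at_run_time = needed_for("run")
```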


Toolchains

A compiler toolchain (or just toolchain for short) is a set of compilers, which are used to build software from source, together with a set of additional libraries that provide further core functionality.

We refer to the different parts of a toolchain as toolchain components.

The compiler component typically consists of C, C++, and Fortran compilers in the context of HPC, but additional compilers (for example, a CUDA compiler for GPGPU software) can also be included.

Additional toolchain components are usually special-purpose libraries:

  • an MPI library to support distributed computations (for example, Open MPI);
  • libraries providing efficient linear algebra routines (BLAS, LAPACK);
  • a library for computing Fast Fourier Transforms (for example, FFTW);

A toolchain that includes all of these libraries is referred to as a full toolchain, while a subtoolchain is a toolchain that is missing one or more of these libraries. A compiler-only toolchain only consists of compilers (no additional libraries).
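These definitions can be expressed as a small classification function. The role labels ("compiler", "mpi", "linear_algebra", "fft") and the function name are assumptions made for this sketch. Strictly speaking, a compiler-only toolchain is also a subtoolchain, but it is reported separately here.

```python
# Library roles that a full toolchain provides (labels are assumptions).
LIBRARY_ROLES = {"mpi", "linear_algebra", "fft"}


def classify_toolchain(component_roles):
    """Classify a toolchain based on the roles its components fulfil."""
    roles = set(component_roles)
    if "compiler" not in roles:
        raise ValueError("a toolchain needs at least a compiler component")
    libraries = roles & LIBRARY_ROLES
    if libraries == LIBRARY_ROLES:
        return "full toolchain"
    if not libraries:
        return "compiler-only toolchain"
    return "subtoolchain"


full = classify_toolchain(["compiler", "mpi", "linear_algebra", "fft"])
partial = classify_toolchain(["compiler", "mpi"])
compilers_only = classify_toolchain(["compiler"])
```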

System toolchain

The system toolchain is a special case which corresponds to using the compilers and libraries provided by the operating system, rather than using toolchain components that were installed using EasyBuild.

It is used sparingly, mostly to install software where no actual compilation is done or to build a set of toolchain compilers and their dependencies, since the versions of the system compilers and libraries are beyond the control of EasyBuild, which could affect the reproducibility of the installation.

Common toolchains

The foss and intel toolchains are also known as the common toolchains, because they are widely adopted by the EasyBuild community.

The foss toolchain consists entirely of open source components (hence the name: "FOSS" stands for Free & Open Source Software): GCC, Open MPI, OpenBLAS, ScaLAPACK and FFTW.

The intel toolchain consists of the Intel C, C++ and Fortran compilers (on top of a GCC version controlled through EasyBuild) alongside the Intel MPI and Intel MKL libraries.

Roughly every 6 months, a new version of these common toolchains is agreed upon in the EasyBuild community, after extensive testing.

More information on these toolchains is available in the EasyBuild documentation.


Modules

Module is a massively overloaded term in (scientific) software and IT in general (kernel modules, Python modules, and so on). In the context of EasyBuild, the term 'module' usually refers to an environment module (file).

Environment modules are a well-established concept on HPC systems: they provide a way to specify changes that should be made to one or more environment variables in a shell-agnostic way. A module file is usually written in either Tcl or Lua syntax, and specifies which environment variables should be updated, and how (append, prepend, (re)define, undefine, etc.) upon loading the environment module. Unloading the environment module will restore the shell environment to its previous state.
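The effect of loading and unloading a module can be sketched in Python using a plain dictionary as the environment. This is a conceptual model only: real modules tools process Tcl or Lua module files, and the function name, action labels, paths, and variable names below are made up.

```python
import os


def load_module(env, changes):
    """Return a copy of env with the module's changes applied."""
    new_env = dict(env)
    for action, name, value in changes:
        current = new_env.get(name)
        if action == "prepend":
            new_env[name] = value if not current else value + os.pathsep + current
        elif action == "append":
            new_env[name] = value if not current else current + os.pathsep + value
        elif action == "set":
            new_env[name] = value
        elif action == "unset":
            new_env.pop(name, None)
        else:
            raise ValueError("unknown action: %s" % action)
    return new_env


before = {"PATH": "/usr/bin", "OLD_VAR": "1"}
changes = [
    ("prepend", "PATH", "/apps/example/1.0/bin"),
    ("set", "EXAMPLE_HOME", "/apps/example/1.0"),
    ("unset", "OLD_VAR", None),
]

after = load_module(before, changes)  # environment with the module loaded
restored = dict(before)               # unloading returns to the previous state
```

The sketch models unloading by keeping the original environment around; an actual modules tool instead reverts each change described in the module file.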

Environment module files are processed via a modules tool, of which there are several conceptually similar yet slightly different implementations. The Tcl-based Environment Modules implementation, and Lmod, a more recent Lua-based implementation (which also supports module files written in Tcl syntax), are the most commonly used ones.

Environment module files are automatically generated for each software installation by EasyBuild, and loading a module results in changes being made to the environment of the current shell session such that the corresponding software installation can be used.


Bringing it all together

The EasyBuild framework leverages easyblocks to automatically build and install (scientific) software, potentially including additional extensions, using a particular compiler toolchain, as specified in easyconfig files which each define a set of easyconfig parameters.

EasyBuild ensures that the specified (build) dependencies are in place, and automatically generates a set of (environment) modules that facilitate access to the installed software.

An easystack file can be used to specify a collection of software to install with EasyBuild.


Last update: April 21, 2022