EasyBuild Terminology
Over the years, we have come up with some terminology specific to EasyBuild to refer to particular components, which we use alongside established terminology relevant to the context of building and installing software on HPC systems.
It is important to be familiar with these terms, so we will briefly cover them one by one.
Framework
The EasyBuild framework consists of a set of Python modules organised in packages (easybuild.framework, easybuild.toolchains, easybuild.tools, etc.) that collectively form the core of EasyBuild. It is developed in the easybuild-framework repository on GitHub.
It implements the common functionality that you need when building software from source, and provides functions for unpacking source files, applying patch files, collecting the output produced by shell commands that are being run and checking their exit code, generating environment module files, etc.
The EasyBuild framework does not implement any specific installation procedure; it only provides the functionality needed to facilitate this.
Easyblocks
An easyblock is a Python module that implements a specific software installation procedure, and can be viewed as a plugin to the EasyBuild framework.
Easyblocks can be either generic or software-specific.
A generic easyblock implements an installation procedure that can be used for multiple different software packages. Commonly used examples include the ConfigureMake easyblock, which implements the ubiquitous configure/make/make install procedure, and the PythonPackage easyblock, which can be used to install a Python package.
A software-specific easyblock implements an installation procedure that is specific to a particular software package.
Infamous examples include the easyblocks we have for GCC, OpenFOAM, TensorFlow, WRF, ...
The installation procedure performed by an easyblock can be controlled by defining easyconfig parameters in an easyconfig file.
A collection of (generic and software-specific) easyblocks is developed by the EasyBuild community in the easybuild-easyblocks repository on GitHub.
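To make the plugin idea more concrete, here is a minimal sketch of how a generic easyblock and a software-specific one relate to each other. The classes ToyConfigureMake and ToyCustomBuild, the install prefix, and the Makefile.template command are all hypothetical: this is a toy model, not the EasyBuild API, and it only prints the commands rather than running them.

```python
class ToyConfigureMake:
    """Toy stand-in for a generic easyblock (NOT the real EasyBuild API)."""

    def __init__(self, params, installdir):
        self.params = params          # easyconfig parameters as a plain dict
        self.installdir = installdir  # hypothetical installation prefix

    def _cmd(self, base, opts_key):
        # Append the options from the matching easyconfig parameter, if any.
        opts = self.params.get(opts_key, "")
        return f"{base} {opts}".strip()

    def configure_step(self):
        return self._cmd(f"./configure --prefix={self.installdir}", "configopts")

    def build_step(self):
        return self._cmd("make", "buildopts")

    def install_step(self):
        return self._cmd("make install", "installopts")

    def steps(self):
        # The installation procedure is the ordered list of commands.
        return [self.configure_step(), self.build_step(), self.install_step()]


class ToyCustomBuild(ToyConfigureMake):
    """Toy software-specific variant: only the configure step differs."""

    def configure_step(self):
        # Hypothetical package that ships a Makefile template instead of
        # a configure script.
        return "cp Makefile.template Makefile"


params = {"configopts": "--enable-shared", "buildopts": "-j 4"}
generic_cmds = ToyConfigureMake(params, "/opt/example").steps()
custom_cmds = ToyCustomBuild(params, "/opt/example").steps()

for cmd in generic_cmds:
    print(cmd)
```

The software-specific class reuses everything except the step it overrides, which is roughly why generic easyblocks cover so many packages while a few packages need their own.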
Easyconfig parameters
An easyconfig parameter specifies a particular aspect of a software installation that should be performed by EasyBuild.
Some easyconfig parameters are mandatory. The following parameters must be defined in every easyconfig file:
- name and version, which specify the name and version of the software to install;
- homepage and description, which provide key metadata for the software;
- toolchain, which specifies the compiler toolchain to use to install the software (see the Toolchains section).
Other easyconfig parameters are optional: they can be used to provide required information, or to control specific aspects of the installation procedure performed by the easyblock.
Some commonly used optional easyconfig parameters include:
- easyblock, which specifies the (generic) easyblock that EasyBuild should use for the installation;
- sources and source_urls, which specify the list of source files and where to download them;
- dependencies and builddependencies, which specify the list of (build) dependencies;
- configopts, buildopts, and installopts, which specify options for the configuration/build/install commands, respectively.
If no value is specified for an optional easyconfig parameter, the corresponding default value will be used.
There are two groups of easyconfig parameters. General easyconfig parameters can be defined for any software package, and (usually) control a specific aspect of the installation. Custom easyconfig parameters are only supported by certain easyblocks, and only make sense for particular (types of) software.
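As an illustration, the sketch below defines the mandatory parameters and a few optional ones as plain Python assignments, then checks that none of the mandatory ones is missing. The software name, URLs, versions and option values are made up, and the check itself is only a toy: it is not how EasyBuild parses or validates easyconfig parameters.

```python
# Parameter definitions in Python syntax; every value here is hypothetical.
EASYCONFIG_TEXT = """
easyblock = 'ConfigureMake'

name = 'example'
version = '1.0.0'

homepage = 'https://example.org'
description = 'Hypothetical software used to illustrate easyconfig parameters.'

toolchain = {'name': 'foss', 'version': '2023a'}

source_urls = ['https://example.org/downloads']
sources = ['example-1.0.0.tar.gz']

configopts = '--enable-shared'
"""

# Parameters that must be defined, as listed in the text above.
MANDATORY = ["name", "version", "homepage", "description", "toolchain"]

# Evaluate the assignments and collect them in a dict (toy parsing only).
params = {}
exec(EASYCONFIG_TEXT, {}, params)

missing = [key for key in MANDATORY if key not in params]
print("defined parameters:", sorted(params))
print("missing mandatory parameters:", missing)
```

Optional parameters that are left out, such as buildopts here, would simply fall back to their default values.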
Easyconfig files
Easyconfig files (or easyconfigs for short) are simple text files written in Python syntax that specify what EasyBuild should install. Each easyconfig file defines the set of easyconfig parameters that collectively form a complete specification for a particular software installation.
The filename of an easyconfig file usually ends with the .eb extension.
In some contexts the filename is expected to correspond with the value of a handful of key easyconfig parameters: name, version, toolchain, and versionsuffix. The general format for the filename of an easyconfig file is: <name>-<version><toolchain><versionsuffix>.eb, where the toolchain part is omitted when the system toolchain is used, and the <versionsuffix> can be empty.
The filename of easyconfig files is particularly relevant when EasyBuild is searching for easyconfig files to resolve dependencies, since it does this purely based on filenames: interpreting the contents of every (potential) easyconfig file it encounters would be too time-consuming.
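The filename format described above can be sketched as a small helper function. The function name easyconfig_filename is hypothetical, and we assume here that the toolchain part is rendered as -<toolchain name>-<toolchain version>; the software names and versions are made up.

```python
def easyconfig_filename(name, version, toolchain, versionsuffix=""):
    """Compose a filename following <name>-<version><toolchain><versionsuffix>.eb."""
    if toolchain["name"] == "system":
        # The toolchain part is omitted for the system toolchain.
        toolchain_part = ""
    else:
        # Assumption: the toolchain part is '-<name>-<version>'.
        toolchain_part = f"-{toolchain['name']}-{toolchain['version']}"
    return f"{name}-{version}{toolchain_part}{versionsuffix}.eb"


foss = {"name": "foss", "version": "2023a"}
system = {"name": "system", "version": "system"}

print(easyconfig_filename("example", "1.0.0", foss))
print(easyconfig_filename("example", "1.0.0", system))
print(easyconfig_filename("example", "1.0.0", foss, versionsuffix="-debug"))
```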
In the easybuild-easyconfigs repository on GitHub, the EasyBuild community maintains a large (and growing) collection of easyconfig files, for a wide range of (scientific) software.
Easystack files
Easystack files are a relatively new concept in EasyBuild, providing a way to define a software stack that should be installed by EasyBuild.
They are written in YAML syntax, and include a list of software specifications which correspond to a list of easyconfig files, with support for providing specific EasyBuild configuration options for particular software packages, and including or excluding specific software packages based on labels.
The support for using easystack files is currently marked as experimental, which means that the implementation is considered to be incomplete, is subject to change in future EasyBuild releases, and may be prone to errors.
Extensions
Extensions is the collective term used in EasyBuild for additional software packages that can be installed on top of another software package. Common examples are Perl modules, Python packages, and R libraries.
As you can tell, the common terminology here is a bit messy, so we came up with extensions as a unifying term.
Extensions can be installed in different ways:
- stand-alone, as a separate installation on top of one or more other installations;
- as a part of a bundle of extensions that collectively form a separate installation;
- or as an actual extension to a specific installation to yield a "batteries included" type of installation (for example, installing Python bindings along with a C++ library).
Dependencies
A dependency is a common term in the context of software. It refers to a software package that is either strictly required by other software, or that can be leveraged to enhance other software (for example to support specific features).
There are three main types of software dependencies:
- a build dependency is only required when building/installing a software package; once the software package is installed, it is no longer needed to use that software (examples: CMake, pkg-config);
- a run-time dependency (often referred to simply as dependency) is a software package that is required to use (or run) another software package (example: Python);
- a link-time dependency is somewhere in between a build and run-time dependency: it is only needed when linking a software package; it can become either a build or run-time dependency, depending on exactly how the software is installed (example: OpenBLAS).
The distinction between link-time and run-time dependencies is mostly irrelevant for this tutorial, but we will distinguish between build and run-time dependencies.
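The difference between build and run-time dependencies can be sketched with the two corresponding easyconfig parameters. The (name, version) pairs below are illustrative, the two helper functions are hypothetical, and we assume that run-time dependencies are also made available while building.

```python
# Dependency lists as (name, version) pairs; the versions are made up.
builddependencies = [("CMake", "3.26.3"), ("pkg-config", "0.29.2")]
dependencies = [("Python", "3.11.3")]


def needed_at_build_time(builddeps, deps):
    # Assumption: run-time dependencies are also present during the build.
    return builddeps + deps


def needed_at_run_time(builddeps, deps):
    # Build dependencies are no longer needed once the software is installed.
    return list(deps)


build_env = needed_at_build_time(builddependencies, dependencies)
run_env = needed_at_run_time(builddependencies, dependencies)

print("during the build:", [name for name, _ in build_env])
print("when using the software:", [name for name, _ in run_env])
```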
Toolchains
A compiler toolchain (or just toolchain for short) is a set of compilers, which are used to build software from source, together with a set of additional libraries that provide further core functionality.
We refer to the different parts of a toolchain as toolchain components.
The compiler component of a toolchain typically consists of C, C++, and Fortran compilers in the context of HPC, but additional compilers (for example, a CUDA compiler for GPGPU software) can also be included.
Additional toolchain components are usually special-purpose libraries:
- an MPI library to support distributed computations (for example, Open MPI);
- libraries providing efficient linear algebra routines (BLAS, LAPACK);
- a library for computing Fast Fourier Transforms (for example, FFTW).
A toolchain that includes all of these libraries is referred to as a full toolchain, while a subtoolchain is a toolchain that is missing one or more of these libraries. A compiler-only toolchain only consists of compilers (no additional libraries).
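The three kinds of toolchain can be told apart by looking at which components are present, as in the sketch below. The role labels ("compiler", "mpi", "linalg", "fft") and the function classify_toolchain are hypothetical names chosen for this illustration; they are not EasyBuild identifiers.

```python
# Hypothetical role labels for the special-purpose libraries listed above.
LIBRARY_ROLES = {"mpi", "linalg", "fft"}


def classify_toolchain(components):
    """Classify a toolchain given a mapping of role -> component name."""
    roles = set(components)
    if "compiler" not in roles:
        raise ValueError("a toolchain needs at least a compiler component")
    libraries = roles & LIBRARY_ROLES
    if libraries == LIBRARY_ROLES:
        return "full toolchain"
    if not libraries:
        # Strictly speaking also a subtoolchain; we report the narrower term.
        return "compiler-only toolchain"
    return "subtoolchain"


full = {"compiler": "GCC", "mpi": "Open MPI", "linalg": "OpenBLAS", "fft": "FFTW"}
partial = {"compiler": "GCC", "mpi": "Open MPI"}
compilers_only = {"compiler": "GCC"}

for toolchain in (full, partial, compilers_only):
    print(sorted(toolchain.values()), "->", classify_toolchain(toolchain))
```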
System toolchain
The system toolchain is a special case which corresponds to using the compilers and libraries provided by the operating system, rather than using toolchain components that were installed using EasyBuild.
It is used sparingly, mostly to install software where no actual compilation is done, or to build a set of toolchain compilers and their dependencies. The reason is that the versions of the system compilers and libraries are beyond the control of EasyBuild, which could affect the reproducibility of the installation.
Common toolchains
The foss and intel toolchains are also known as the common toolchains, because they are widely adopted by the EasyBuild community.
The foss toolchain consists entirely of open source components (hence the name: "FOSS" stands for Free & Open Source Software): GCC, Open MPI, FlexiBLAS with OpenBLAS as default backend, ScaLAPACK and FFTW.
The intel toolchain consists of the Intel C, C++ and Fortran compilers (on top of a GCC version controlled through EasyBuild) alongside the Intel MPI and Intel MKL libraries.
Roughly every 6 months a new version of these common toolchains is agreed upon in the EasyBuild community, after extensive testing.
More information on these toolchains is available in the EasyBuild documentation.
Modules
Module is a massively overloaded term in (scientific) software and IT in general (kernel modules, Python modules, and so on). In the context of EasyBuild, the term 'module' usually refers to an environment module (file).
Environment modules are a well-established concept on HPC systems: they provide a way to specify changes that should be made to one or more environment variables in a shell-agnostic way. Environment module files are written in either Tcl or Lua syntax, and specify which environment variables should be updated, and how (append, prepend, set, unset, redefine, etc.) upon loading the environment module. Unloading an environment module will restore the shell environment to its previous state, by reverting the changes that were made to the environment when that environment module was loaded.
Environment module files are processed via a modules tool, of which there are several conceptually similar yet slightly different implementations. The most commonly used ones are the Tcl-based Environment Modules implementation, and Lmod, a more recent (and more popular) Lua-based implementation (which also supports module files written in Tcl syntax).
Environment module files are automatically generated for each software installation by EasyBuild, and loading a module results in changes being made to the environment of the current shell session such that the corresponding software installation can be used.
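The effect of loading and unloading a module can be sketched with a plain dictionary standing in for the shell environment. The functions load_module and unload_module, the EXAMPLE_HOME variable and the /opt/example path are all hypothetical. As a simplification, this toy restores saved values on unload, which is not how a real modules tool is implemented.

```python
def load_module(env, changes):
    """Apply (action, variable, value) changes; return what is needed to undo them."""
    # Remember the original value of every variable we are about to touch.
    saved = {var: env.get(var) for _, var, _ in changes}
    for action, var, value in changes:
        if action == "prepend":
            old = env.get(var)
            env[var] = value if not old else f"{value}:{old}"
        elif action == "set":
            env[var] = value
        else:
            raise ValueError(f"unsupported action: {action}")
    return saved


def unload_module(env, saved):
    """Revert the changes made by load_module."""
    for var, old in saved.items():
        if old is None:
            env.pop(var, None)  # the variable did not exist before loading
        else:
            env[var] = old


env = {"PATH": "/usr/bin"}
changes = [
    ("prepend", "PATH", "/opt/example/1.0.0/bin"),
    ("set", "EXAMPLE_HOME", "/opt/example/1.0.0"),
]

saved = load_module(env, changes)
loaded_env = dict(env)   # snapshot of the environment while the module is loaded
unload_module(env, saved)

print("loaded:  ", loaded_env)
print("unloaded:", env)
```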
Bringing it all together
The EasyBuild framework leverages easyblocks to automatically build and install (scientific) software, potentially including additional extensions, using a particular compiler toolchain, as specified in easyconfig files which each define a set of easyconfig parameters.
EasyBuild ensures that the specified (build) dependencies are in place, and automatically generates a set of (environment) modules that facilitate access to the installed software.
An easystack file can be used to specify a collection of software to install with EasyBuild.