Modern C++ Programming Techniques
      for Scientific Computing
            Innovations from C++11 to C++20


                        Dr. Ole Klein
 Interdisciplinary Center for Scientific Computing (IWR)
                   Heidelberg University
           email: ole.klein@iwr.uni-heidelberg.de

                Winter Semester 2020/21
License and Copyright
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International
License (CC BY-SA).

Unless otherwise noted, these lecture notes are:

Text, code snippets and illustrations © 2021 Ole Klein

Several code snippets from the Dune project and others are used under U.S. fair use rules
and/or European limitations and exceptions to copyright, relying on the fact that these lecture
notes are a document for nonprofit educational purposes. These snippets, clearly attributed in
each case, are expressly excluded from the CC license specified above. Technically, it is your
responsibility to ensure that you are allowed to copy these sections under your jurisdiction.

If you would like to reference these lecture notes in your own scientific works, please cite as
“Klein, O., 2021: Modern C++ Programming Techniques for Scientific Computing, Lecture
Notes, Heidelberg University” including the link under which this document can be found
online:
                 https://conan.iwr.uni-heidelberg.de/people/oklein/


Contents


1. Introduction
   1.1. Preface
   1.2. Acknowledgments
   1.3. Why C++?

2. Fundamental Concepts of C++
   2.1. Variables and Data Types
   2.2. Pointers and References
   2.3. Control Structures
   2.4. Functions
   2.5. Templates
   2.6. Classes
   2.7. Inheritance
   2.8. Code Structure

3. The Standard Library
   3.1. Input / Output
   3.2. Container Classes
   3.3. Iterators
   3.4. Algorithms
   3.5. Companion Classes
   3.6. Exceptions

4. Advanced Topics
   4.1. Template Specialization
   4.2. Templates and Inheritance
   4.3. RAII: Resource Acquisition is Initialization
   4.4. Template Metaprogramming
   4.5. Dynamic Polymorphism
   4.6. Static Polymorphism
   4.7. SFINAE: Substitution Failure is not an Error

5. C++11 Features
   5.1. Automatic Type Deduction
   5.2. Compile-Time Evaluation
   5.3. Move Semantics
   5.4. Smart Pointers
   5.5. Lambda Expressions (Closures)
   5.6. Variadic Templates
   5.7. Concurrency with Threads
   5.8. Assorted New Features and Libraries

6. C++14 Features
   6.1. Improvements to Constant Expressions
   6.2. Generic Lambda Expressions
   6.3. Variable Templates
   6.4. Return Type Deduction
   6.5. Other Improvements and Additions

7. C++17 Features
   7.1. Guaranteed Copy Elision
   7.2. Compile-Time Branches
   7.3. Structured Bindings
   7.4. Utility Classes
   7.5. Fold Expressions
   7.6. Improvements for Templates
   7.7. Mathematical Special Functions
   7.8. Filesystems Library

8. C++20 Features
   8.1. Modules
   8.2. Concepts
   8.3. Ranges
   8.4. Concurrency with Coroutines
   8.5. Mathematical Constants
   8.6. Text Formatting

A. Exercises
   A.1. Fundamentals and Standard Library
   A.2. Advanced Topics
   A.3. Modern C++ Features




1. Introduction

1.1. Preface

The following pages present the content of a lecture called “Object-Oriented Programming
for Scientific Computing” that I have given regularly at Heidelberg University for over half a
decade. While earlier iterations of this lecture were indeed geared towards object-oriented
programming, in the last few years the focus has more and more shifted towards a comprehen-
sive introduction to scientific programming techniques in C++, which may be of interest to a
larger audience. I have therefore decided to make these teaching materials available online in
the form of an electronic document, hoping that they will prove useful.


In the last few years, the C++ language has undergone several shifts in how the language is
typically written: while C++98/03 code can quite often look strikingly like C code, with C-style
arrays, enums, and explicit allocations and deallocations, code based on the upcoming C++20
standard can be quite similar to some other popular modern languages like, say, Python. Most
importantly, the old way of writing C++ remains perfectly valid, meaning that C++ is a rather
flexible language that can be adapted to a variety of coding styles and layout preferences.


The strong focus of C++ on runtime performance, portability and flexibility makes it quite
suitable for scientific computing and high-performance computing. Unfortunately, the language
has a reputation for excessive verbosity and complexity, mainly because template-based code
is not only quite complex and longwinded itself, but additionally tends to produce compilation
error messages which are exceedingly long and difficult to understand. Modern C++ tech-
niques, however, can often alleviate this problem, and lead to much clearer and more concise
code. It therefore certainly makes sense to discuss these new additions in detail. Nevertheless,
we first discuss older techniques, mainly for two reasons: first, to provide adequate context for
the more modern constructs, and second, because these superseded approaches remain ubiquitous
in legacy code bases, and knowing about them is therefore important for anyone working
on such projects.


The following sections are based on my LaTeX beamer lecture slides, and are set using the
beamerarticle companion document class. This has the huge advantage that the content
will, hopefully, stay in sync with updates to the lecture slides, but has the drawback that the
layout may sometimes be less than optimal. I have strived for exactness in all the presented
topics, but have sometimes chosen to slightly simplify matters to avoid overly verbose and
pedantic formulations. Corrections, additions, and suggestions are more than welcome —
please do not hesitate to let me know about potential ways to improve these materials.


  — Ole Klein, Heidelberg, Winter Semester 2020/21





1.2. Acknowledgments

This document doesn’t exist in a vacuum, instead it references and makes use of several online
resources, among them:

    • The C++ Super-FAQ at isocpp.org/faq

    • The C++ Reference at en.cppreference.com

    • Herb Sutter’s websites: gotw.ca / herbsutter.com

    • Matt Godbolt’s Compiler Explorer: godbolt.org

    • Jean Guegant’s blog at jguegant.github.io/blogs/tech/

    • Anders Schau Knatten’s C++ Quiz at cppquiz.org

I’d like to use this opportunity to thank the authors and contributors of the websites referenced
within this document for their efforts in creating and maintaining these online resources.


Several real-world code examples from the Dune project, dune-project.org, and PDELab sub-
project, dune-project.org/modules/dune-pdelab, are discussed within this document. Dune,
the Distributed and Unified Numerics Environment, is a set of open-source C++ libraries for
Scientific Computing, licensed under the GPL. I’d like to acknowledge the efforts of the con-
tributors of the Dune project, and thank them for the opportunity to use their code base
to demonstrate real-world applications of some of the concepts that are discussed in these
notes.


1.3. Why C++?

A (non-exhaustive) list of programming languages for scientific computing:

Fortran (1957) old (think punchcards) but still very relevant today (e.g., numerical linear
     algebra, legacy physics/astronomy codes)

C (1972) widespread language for high-performance low-level programming (operating sys-
     tems, compilers, . . . )

C++ (1985) started as “C with classes”, newer iterations favor a style that is closer to, e.g.,
   Python

Python (1990) popular language with large ecosystem of numerical software libraries

Julia (2012) relatively new language specifically designed for high-performance scientific com-
      puting





There are also several domain specific languages (DSL) that are strong in their respective
domains, e.g.:

MATLAB (1984), R (1993)

Each language has its advantages and disadvantages, so which should be used for a course like
this one?



Ideally, such a programming language for scientific computing . . .

   • is general-purpose, so no DSL

   • produces highly efficient and portable programs

   • provides a large ecosystem of numerical / scientific libraries

   • has proven its suitability over the years

   • has a large following, which provides support and makes it unlikely that the language
     will vanish

   • can serve as a starting point to learn other, related languages

Fortran has a small community nowadays, is used for very specific applications, and its writing
style has little overlap with the other aforementioned languages.



C is basically a subset of C++, apart from some technicalities.



Python would be a really good choice, but it is easier to move from C++ to Python than
vice versa, and Python tends to hide some aspects of scientific programming that should be
taught.



Julia is a relatively new language that is not yet in wide-spread use.




. . . which leaves us with C++ as a solid choice.





Versions of C++

The C++ language has evolved significantly over the last few years:


    • “Classic” C++ code is officially known as C++98/03.
    • Current versions of C++, i.e., C++11 with its relatively minor updates C++14 and
      C++17, have quite a different feel.
    • In general, modern C++ constructs should be preferred wherever possible.
    • But: these are often not covered in introductory courses.


We therefore start the lecture with a quick review of C++98/03, which is then used as starting
point to discuss modern C++, how it evolved, and how it can be used to produce more readable
and more maintainable code.


Evolution of C++
Classic C Style

// C-style fixed-length array
int fix[10] = {0,1,2,3,4,5,6,7,8,9};

// "fix" doesn't know its own size
for (int i = 0; i < 10; i++)
  if (fix[i] % 2 == 0)
    std::cout << fix[i] << " ";
std::cout << std::endl;

// C-style "variable-length" array
int* var = new int[n];
for (int i = 0; i < n; i++)
  var[i] = i;

// "var" isn't a real variable-length array:
// adding elems requires copying (or tricks)

// "var" doesn't know its own size
for (int i = 0; i < n; i++)
  if (var[i] % 2 == 0)
    std::cout << var[i] << " ";
std::cout << std::endl;

// oops, forgot to delete array: memory leak!



    • C-style arrays are just references to contiguous blocks of memory (basically pointers to
      first entry)
    • They don’t follow value semantics: copies refer to same memory blocks
    • Their length is not stored and has to be specified explicitly, inviting subtle errors





   • Runtime fixed-length arrays aren’t true variable-length arrays
   • May lead to nasty memory leaks if they aren’t explicitly deallocated


Evolution of C++
C++98/03

// C++ variable-length array
// from header <vector>
std::vector<int> var(n);
for (int i = 0; i < n; i++)
  var[i] = i;

// std::vector is a real variable-length array
var.push_back(n+1);

// no need to remember size of "var"
for (int i = 0; i < var.size(); i++)
  if (var[i] % 2 == 0)
    std::cout << var[i] << " ";
std::cout << std::endl;

// very general (also works for maps, sets,
// lists, ...), but reeeally ugly
for (std::vector<int>::const_iterator it
    = var.begin(); it != var.end(); ++it)
  if (*it % 2 == 0)
    std::cout << *it << " ";
std::cout << std::endl;



   • C++ introduced std::vector , a true variable-length array, i.e., elements can be added
     and removed
   • Vectors have value semantics: copies are deep copies
   • A vector always knows its current size, no need to keep track
   • Same performance as C-style arrays (drop-in replacement)
   • Can be used in generic code via iterators (but leads to very verbose code)


Evolution of C++
C++11 / C++20

// C++ variable-length array
std::vector<int> var(n);
// C++11: fill using algo from <numeric> header
std::iota(var.begin(),var.end(),0);

// C++11: range-based for loop
// hides ugly iterators
for (const auto& e : var)
  if (e % 2 == 0)
    std::cout << e << " ";






std::cout << std::endl;

// C++11: lambda expression (ad-hoc function)
auto even = [](int i){return i % 2 == 0;};

// C++20: filters and transforms
for (const auto& e : var
    | std::views::filter(even))
    std::cout << e << " ";
std::cout << std::endl;




    • C++11 introduced range-based for loops, making iterator-based code much more read-
      able

    • C++20 will introduce filters and transforms that can operate on such loops, here in the
      example based on a C++11 lambda expression (ad-hoc function definition)



Evolution of C++

C++11

// C-style fixed-length array
int fix[10] = {0,1,2,3,4,5,6,7,8,9};

// C++11: range-based for works with
// C-style arrays, but only for those
// with compile-time fixed length!
for (const auto& e : fix)
  if (e % 2 == 0)
    std::cout << e << " ";
std::cout << std::endl;

// C++11: modern array type from header <array>
std::array<int,10> fix2 = {0,1,2,3,4,5,6,7,8,9};

// no need to remember size of "fix2"
for (int i = 0; i < fix2.size(); i++)
  if (fix2[i] % 2 == 0)
    std::cout << fix2[i] << " ";
std::cout << std::endl;




    • C++11 range-based for loops can be used with legacy arrays, since the compiler knows
      their size implicitly

    • However, this doesn’t work when the length is runtime-dependent!

    • C++11 also introduced std::array as a drop-in replacement for fixed-length C-style
      arrays, with known size and value semantics like std::vector





Versions of C++

Which version of C++ should I learn / use?


C++98/03 remains relevant due to (a) vast collections of legacy code bases and (b) program-
   mers who still use old constructs and are set in their ways.
C++11 is the current baseline and should be supported on virtually all platforms and compute
   clusters by now.
C++14 is a minor update of C++11 and is also a safe choice. Most software projects should
   accept C++14 code by now.
C++17 is a second minor update of C++11. This is perfectly fine for your own code, but
   keep in mind that large projects may be restricted to older standards to support some
   architectures.
C++20 is the upcoming new version and will bring major changes. This will become relevant
   in the near future.






2. Fundamental Concepts of C++

The modern components of C++ are often built upon older constructs of the language, may
serve as superior replacements for some of them, or both.


These fundamental concepts are:

     • variables and types

     • pointers and references

     • control structures

     • functions and templates

     • classes and inheritance

     • namespaces and structure


They are taught in practically any introductory course that is based on C++. We will quickly
review them to make sure that everyone has the prerequisites for the following lectures.


2.1. Variables and Data Types

Variables, Temporaries, Literals


C++, like any other programming language, concerns itself with the computation and manip-
ulation of data.


This data represents many different things, from simple numbers and strings, to images and
multimedia files, to abstract numerical simulations and their solutions.


Put simply, C++ knows three different categories of data:

Variables are names for locations where data is stored, e.g., int i = 5; referring to an
     integer value in memory.

Temporaries represent values that aren’t necessarily stored in memory, e.g., intermediate val-
    ues in compound expressions and function return values.

Literals are values that are explicitly mentioned in the source code, e.g., the number 5 above,
      or the string "foo" .
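
A minimal sketch distinguishing the three categories (the names are arbitrary):

int i = 5;          // "i" is a variable, the token 5 is a literal
double d = 2.0 * i; // the result of 2.0 * i is a temporary
                    // before it is stored in the variable "d"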





Data Types

C++ is a strongly-typed language, which means that each such representation of data, i.e.,
variable, temporary or literal, must have an associated data type.

This data type specifies how the underlying binary sequence encodes the data (semantics), and
more importantly ensures that it isn’t possible to accidentally misinterpret data or use it in
the wrong context (type safety).

C++ has a number of built-in data types and allows the introduction of user-defined types
based on certain rules. Each type is associated with a range of valid values of that type.


Fundamental Types

C++ provides a set of fundamental, or built-in, types, mostly inherited from C.


void :
A type that has no valid values. Represents “nothing” (e.g., as return type), sometimes
“anything” (when using pointers).

nullptr_t :

A type with one value, nullptr , indicating an invalid pointer. Introduced to make pointer
handling safer.

bool :
Two values, true and false , for standard Boolean algebra (truth values).

char et al.:
ASCII characters, also similar types for unicode support ( wchar_t , char16_t , char32_t ).


int et al.:
Integer numbers, with different ranges ( short , int , long , long long ) and signedness
( signed and unsigned ). signed is default and may be omitted. Also a standard type for
container sizes ( std::size_t ), one for pointer differences ( std::ptrdiff_t ), and a very
long list of fixed width types, like std::int8_t , etc.


float et al.:





Floating point numbers, with single precision ( float : 32 bit), double precision ( double : 64
bit), or extended precision ( long double : usually 80 bit). Nowadays, double is used as
default floating point type if there is not a good reason to use something else.


Each of the integer and floating point types gives certain guarantees about the representable
range. These and other properties can be queried using std::numeric_limits .
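
A brief example of querying such guarantees (the facilities live in the header <limits>):

// from header <limits>
std::cout << std::numeric_limits<int>::max()        << std::endl; // largest int
std::cout << std::numeric_limits<double>::epsilon() << std::endl; // machine epsilon
std::cout << std::numeric_limits<double>::digits10  << std::endl; // decimal digits of precision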


Introducing New Types

These built-in types can be combined with four different mechanisms to produce new type
definitions. We already saw the first one, C-style arrays, the others are:


enum :

A user-defined set of constants:

enum Color {red, blue, green};


These are actually integers behind the scenes, and may accidentally be used as such: prefer
→ scoped enums.


struct :

The Cartesian product of some types, i.e., the set of all possible tuples of values:

struct PairOfInts {int a; int b;};



union :

The union set of some types, i.e., a type that can hold values from all specified types:

union IntOrChar {int c; char d;};


Unions don’t store the type they currently contain, which is dangerous: consider → variants
instead.


The resulting data types may then be used in further type definitions, e.g., structs as mem-
bers of other structs, or data types that are augmented with additional components using
→ inheritance.





Examples from DUNE

Whenever possible, we will have a quick look at real-world code from the DUNE project [1],
preferably from the PDELab subproject [2].

// classic enum defining boundary types for a domain
enum Type { Dirichlet=1, Neumann=-1, Outflow=-2, None=-3 };

// modern C++11 enum for DG finite element basis definition
enum class QkDGBasisPolynomial
    {lagrange, legendre, lobatto, l2orthonormal};

// In C++, structs are actually just classes with
// public attributes, we therefore refer to class
// examples in that regard.

// DUNE currently contains no unions (which is a good thing!)




2.2. Pointers and References

Pointers

Each type T , whether built-in or user-defined, has an associated type T* (pointer to T )
defined, with the meaning “address of a value of type T in memory”.


int i = 5;
int* p = &i;

int* p2 = new int;
*p2 = 4;
delete p2;


      • An ampersand & in front of a variable produces its address
      • An asterisk * in front of a pointer dereferences the pointer, providing access to the
        variable itself
      • The keyword new can be used to acquire a slot in memory that is unnamed, i.e., not
        associated with a variable
[1] https://www.dune-project.org
[2] https://www.dune-project.org/modules/dune-pdelab






It is imperative to release such memory with delete after it is no longer needed. Doing so
too early leads to nasty bugs, forgetting to do so causes memory leaks. A possible solution is
the use of → smart pointers.



References


In contrast to C, C++ introduces a second indirection that serves a similar purpose: references
 T& . While pointers simply refer to some memory, and may be modified to point to other
locations, references are always an alias for an existing entity.


int a = 5;
int& b = a; // b is an alias for a
b = 4;      // this changes a as well




Note that the symbol & is used in two different contexts, first to take addresses of variables
and second to specify reference types. This is a bit unfortunate, but can no longer be changed
(both constructs have seen widespread use over the last few decades).



Rvalue References


The third kind of indirection in C++ is the rvalue reference T&& , introduced in C++11. Ordi-
nary references T& are now also known as lvalue references to distinguish the two concepts.


The (oversimplified) definitions of lvalue and rvalue are:

lvalue could be on the lefthand side of an assignment, is an actual entity, occupies memory
      and has an address

rvalue could be on the righthand side of an assignment, temporary and ephemeral, e.g.,
      intermediate values or literals


Such rvalue references are mostly restricted to two use cases: as forwarding references in
→ range-based for loops, and as a vital component of the implementation of → move seman-
tics.
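
A minimal sketch of the binding rules: lvalue references bind to named entities, rvalue references
bind to temporaries.

int a = 5;
int&  lr = a;      // ok: "a" is an lvalue
int&& rr = a + 1;  // ok: "a + 1" is a temporary (an rvalue)
// int&& e1 = a;   // error: rvalue reference can't bind to an lvalue
// int&  e2 = 7;   // error: non-const lvalue reference can't bind to a literal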





Const-Correctness

The type of variables, pointers and references may be marked as const , i.e., something that
can’t be modified. This guards against accidental modification (bugs), and makes programmers’
intent clearer. Therefore, const should be used wherever possible.

int i = 5;            //   may change later on
const int j = 4;      //   guaranteed to stay 4
const int& k = i;     //   k can't be modified
i += 2;               //   ... but this still changes k indirectly!

const int*    p1 = &i;          //   pointer to const int
int const*    p2 = &i;          //   same thing
int* const    p3 = &i;          //   constant pointer to modifiable int
int const*    const p4 = &i;    //   const pointer to const int



Read right-to-left: const modifies what’s left of it (think int const i ), but the leftmost
const may be put on the lefthand side for readability ( const int i ).


2.3. Control Structures

Selection (Conditionals)

Branches are a fundamental form of program flow control. An if statement consists of a
condition, some code that is executed if that condition is fulfilled, and optionally other code
that is executed if it isn’t fulfilled:

if (i % 2 == 0)
  std::cout << "i is even!" << std::endl;
else
  std::cout << "i is odd!" << std::endl;



There is also a switch statement, but if is used significantly more often.

switch(color)    // the Color enum we introduced earlier
{
  case red:      std::cout << "red"   << std::endl; break;
  case blue:     std::cout << "blue" << std::endl; break;
  case green:    std::cout << "green" << std::endl;
}






Repetition (Loops)


C++ provides two different kinds of loops, for and while loops. The former is executed a
fixed number of times [3], while the latter is repeated until some condition is met.

for (int i = 0; i < 10; i++)
  std::cout << i << std::endl;

int j = 9;
while (j > 1)
{
  std::cout << j << std::endl;
  j /= 2; // halve, rounded down
}



If j were one, the loop would be skipped completely. There is also a variant do{...} while(...)
that runs at least once.
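
A small example of this variant: the body is executed once even though the condition is false
from the start.

int k = 1;
do
{
  std::cout << k << std::endl; // prints 1 exactly once
  k /= 2;
} while (k > 1);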


Jump Statements


Sometimes code becomes more readable if certain parts of a loop can be skipped. The
 continue statement can be used to skip the rest of the current loop iteration, while break
exits the loop prematurely:

for (int i = 0; i < 100; i++)
{
  if (i % 2 == 0)
    continue;
  if (i > 10)
    break;
  // prints 1, 3, 5, 7, 9
  std::cout << i << std::endl;
}




These two statements jump to the end of the current iteration or of the loop, respectively, i.e., they are
constrained. C++ also has an unconstrained goto jump statement, but its use is strongly dis-
couraged. There is one accepted use case: exiting several nested loops simultaneously ( break
would leave the innermost loop only).
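
A short sketch of that one accepted use case (the label name done is arbitrary):

for (int i = 0; i < 10; i++)
  for (int j = 0; j < 10; j++)
  {
    if (i * j > 20)
      goto done; // leaves both loops at once; break would only exit the inner one
    std::cout << i * j << " ";
  }
done:
std::cout << std::endl;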
[3] in practice — on a technical level, for and while are perfectly equivalent






Diagram: Loops and Branches

The two fundamental control structures are the conditional if (A) B; [else C;] and the loop
for (A;B;C) D;.

[Flowchart omitted. For the conditional, condition A is checked and B is executed if it is
fulfilled, otherwise C is executed if an else branch is present. For the loop, A is executed
once, then the body D and the update C are executed repeatedly as long as condition B is
fulfilled; continue jumps directly to C, while break leaves the loop.]
The code while (A) B; is equivalent to for (;A;) B; .


The code do A while (B); is equivalent to A; while (B) A; .


2.4. Functions

Subprograms (Functions)

A function is a subprogram that may be reused in different parts of the program, which reduces
verbosity and helps in structuring the code. Functions have a name, zero or more arguments
with fixed type, and a return type:

// expects one double argument, returns double
double square(double x)
{
  return x * x;
}

// expects an int and a double as args, returns nothing
void printSquares(int a, double b)
{





      std::cout << square(a) << " and " << square(b) << std::endl;
}




The special type void indicates a function that doesn’t return anything. Such functions
typically have side effects (I/O, or modifications to their arguments).



Call-by-Reference


C++ is one of the languages that always create copies of arguments when a function is called
(call-by-value). This means that local changes to these variables don’t modify the originals. In
some other languages, the local names refer to the actual memory locations that were passed
to the function (call-by-reference).


To emulate this behavior in C++, one passes a pointer or reference instead [4]:

// modifies its argument
void square(int* i)
{
  *i = *i * *i;
}

// prefer references: code is more readable
void square(int& i)
{
  i = i * i;
}




Call-by-reference is also often used when a function should return more than one value: one
emulates this by modifying one or more reference arguments. C++17 and later standards
provide → guaranteed copy elision and → structured bindings as a better alternative.

For large entities (e.g., vectors, matrices) it often makes sense to pass them by reference even if
they should not be modified, since this avoids costly copy operations (both in terms of runtime
and memory use):

[4] There's still a copy (of the pointer/reference), but it refers to the original location.






// directly access original vector,
// but guarantee that it isn't modified
int sum(const std::vector<int>& vec)
{
  int out = 0;
  for (int i = 0; i < vec.size(); i++)
    out += vec[i];

      return out;
}




Default Arguments


C++ supports default arguments for functions. Arguments with defaults may then be omitted
when calling the function [5]. This simplifies the function interface when these arguments have
certain values most of the time:

void print(const std::vector<int>& vec, std::string sep = ", ",
           std::string pre = "(", std::string post = ")")
{
  std::cout << pre;
  // i + 1 avoids unsigned wrap-around when vec is empty
  for (std::size_t i = 0; i + 1 < vec.size(); i++)
    std::cout << vec[i] << sep;
  if (!vec.empty())
    std::cout << vec.back();
  std::cout << post;
}



A call print(v) will then use a default layout, but other variants can be used if desired.


Function Overloading


C++ offers function overloading, i.e., using the same name for several different functions, as
long as each function call is uniquely determined by the arguments (including handling of
default arguments).

[5] That's why they always come last: to keep the call unambiguous.






int maximum(int a, int b)
{
  if (a > b)
    return a;
  return b;
}

// a default argument for c would have to be
// smaller than any possible integer
int maximum(int a, int b, int c)
{
  return maximum(maximum(a,b),c);
}




Operator Overloading


C++ provides a large assortment of operators, i.e., tokens that are placed inline to specify
some operation, like assignment ( a = b ), arithmetics ( a + b ), or comparison ( a < b ).



Most of these operators can also be defined for custom data types like PairOfInts . The
definition works like that of an ordinary function overload:

PairOfInts operator+(const PairOfInts& p1, const PairOfInts& p2)
{
  return PairOfInts{p1.a + p2.a, p1.b + p2.b};
}



Given this definition, the following two expressions produce the same result:

// function call syntax
PairOfInts p3 = operator+(p1,p2);
// operator syntax
PairOfInts p4 = p1 + p2;






Function Pointers

In addition to the usual pointers, C++ also knows pointers to functions, e.g., int (*f)(int) ,
a pointer called f for functions expecting and returning int . To simplify notation, the
asterisk * and ampersand & may be omitted when referring to function pointers and addresses
of functions.

Using function pointers, functions may be used as arguments of other functions:

int square(int i) {return i * i;}

int applyTwice(int f(int), int i) // optional "*" omitted
{
  return f(f(i)); // no "*" needed in call syntax
}

// computes pow(7,4), optional "&" omitted when taking address
std::cout << applyTwice(square,7) << std::endl;




Example from DUNE

I have found only one instance of function pointers in DUNE (but it’s hard to search for. . . ).

This code creates custom MPI parallel operation handles for given data types and binary
functions (specified as Type and BinaryFunction template parameters). The address of
 operation is used to pass the function the handle should represent.

A C-style cast is used to remove argument data types.
static MPI_Op get ()
{
  if (!op)
  {
    op = std::shared_ptr<MPI_Op>(new MPI_Op);
    MPI_Op_create((void (*)(void*, void*, int*,
        MPI_Datatype*))&operation,true,op.get());
  }
  return *op;
}

static void operation (Type *in, Type *inout,
    int *len, MPI_Datatype*)
{
  BinaryFunction func;

  for (int i=0; i< *len; ++i, ++in, ++inout) {






         Type temp;
         temp = func(*in, *inout);
         *inout = temp;
     }
}




The Main Function


The execution of a C++ program starts in the first line of a special function called main ,
and ends when its last line is reached. Every C++ program has to define exactly one such
function.


The signature of the main function has to be one of

         • int main() (this means command line arguments are ignored)

         • int main(int argc, char** argv)

         • int main(int argc, char* argv[])

The second and third variant have the same meaning: argc is the number of arguments, and
argv an array of C-style strings.


The return value of main is an error code — not returning anything is equivalent to writing
return 0; (implying success).
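
A minimal example of the second variant, simply echoing its command line arguments:

int main(int argc, char** argv)
{
  // argv[0] is the program name, argv[1] ... argv[argc-1] are the actual arguments
  for (int i = 0; i < argc; i++)
    std::cout << argv[i] << std::endl;
  // no return statement: equivalent to "return 0;"
}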


2.5. Templates

Often, one has to define the same functionality for several different data types. This can become
tedious, both during initial implementation and when fixing bugs.


C++ provides a language feature for this, where all the different versions are auto-generated
from a special construct, called a function template:

template<typename T>
  T square(T x)
  {
    return x * x;
  }





A function square<T> is then available for any type T that has a multiplication operator * :

int i    = square<int>(5);       // int version
float f  = square<float>(27.f);  // float version
double d = square<double>(3.14); // double version




Function definitions aren’t the only use case for templates, one can also automate the generation
of data types. These are known as class templates, since structs are a special case of classes in
C++.

template<typename T>
  struct Pair {T a; T b;};

Pair<int> ip;   // a pair of ints
Pair<float> fp; // a pair of floats

// Pair<int> is a data type, and can be used as such
Pair<Pair<int>> ipp; // pair of pair of ints




Function templates and class templates were the only types of templates until C++14, when
→ variable templates were introduced.


Non-Type Template Parameters

Typically, one uses template parameters to introduce abstract types T , but there are also
non-type template parameters. These can be used as compile-time constants:

// simple fixed-length vector
template<typename T, int n>
  struct Vector
  {
    enum {dim = n}; // knows its size
    T vals[n];
  };



An advanced use case for such non-type template parameters is → template meta programming,
where function values and number sequences can be precomputed at compile time.





Template Template Parameters

In addition to types and values, one may also use templates themselves as arguments for other
templates, so-called template template parameters. These can, e.g., be used to allow choosing
between different implementations:

// accepts any template that expects a type and an int
template<template<typename,int> class V>
  double norm(const V<double,3>& vec);



One such level of templates-in-templates can be a very powerful tool, but you shouldn’t overdo
it:

template<template<typename> class U, typename T>
  struct Wrap {U<T> u;};

// just don't...
template<template<template<typename> class,typename> class W>
  struct Meta {W<Pair,double> w;};




Default Template Parameters

All three types of template parameters, i.e., types, non-type template parameters, and template
template parameters, can have defaults, just like function arguments:
// use our vector struct by default
template<template<typename,int> class V = Vector>
  double norm(const V<double,3>& vec);

// generalization of our Pair struct
template<typename U, typename V = U>
  struct Pair {U a; V b;};

// provides special case for square matrices
template<int n, int m = n>
  struct Matrix
  {
    // "misuse" enum for defs at compile time
    enum {dim1 = n};
    enum {dim2 = m};

       // ...
     };



Parameters with defaults may be omitted, i.e., one can write Pair<int> (just as before!), or
Matrix<5> for square matrices.





Renaming Types

C and C++ provide the keyword typedef , which may be used to introduce new names for
types. It is actually a slight misnomer, since there is no real new definition [6]: they are akin to
C++ references, but on the conceptual level of data types.

typedef unsigned long long int ull;
ull a = 12345678901234567890u; // huge unsigned int


While such new names can be introduced for any type, it is especially helpful for the types
from template instantiations:

typedef Pair<Vector<double,3>,Vector<double,2>> VecPair3D2D;


In C++11 and above, consider using → type aliases instead, as they are more general and
more readable.


Example from DUNE

A template with template template parameter and default parameters introducing
      • an inner template MatrixHelper

      • two typedefs: size_type and type

      • an → alias template named Pattern


template<template<typename> class Container = Simple::default_vector,
         typename IndexType = std::size_t>
  struct SparseMatrixBackend
  {
    typedef IndexType size_type;

       //! The type of the pattern object passed to the GridOperator for pattern construction.
       template<typename Matrix, typename GFSV, typename GFSU>
       using Pattern = Simple::SparseMatrixPattern;

       template<typename VV, typename VU, typename E>
       struct MatrixHelper
       {
         typedef Simple::SparseMatrixContainer<typename VV::GridFunctionSpace,
                 typename VU::GridFunctionSpace,Container, E, size_type> type;
       };
     };




[6] Note that typedef struct{int a; int b;} PairOfInts; is a valid definition, albeit a rather convoluted one.






Function Template Parameter Deduction

The specification of template parameters is often redundant when using function templates,
because type template parameters are readily deduced from the function arguments. If this is
the case, they can be omitted, and the call looks like a normal (non-template) function call:


template<typename T>
  T square(T x) {return x * x;}

int i    = square(5);   // short for square<int>
double d = square(27.); // short for square<double>


Note that sometimes this isn’t possible:

// can't deduce return type from call
template<typename T>
  T generate();




2.6. Classes

Classes / Methods

The original name of C++ was “C with classes”, so classes, objects and object-oriented pro-
gramming are clearly an important part of C++.

While a classic C struct is simply an aggregation of data, C++ structs and classes typically
contain methods, functions that are closely linked to the data and part of the type definition:


struct Vector2D
{
  std::array<double,2> comp;

     // const: method may be called for const vectors
     double norm() const
     {
       return std::sqrt(comp[0]*comp[0] + comp[1]*comp[1]);
     }
};

Vector2D v{{3,4}};
std::cout << v.norm() << std::endl; // prints 5






Access Specifiers


C++ provides three access specifiers:

private : accessible by the object itself and other objects of the same class

protected : like private , but additionally accessible in → derived classes.

public : always accessible


Sometimes it is helpful to exempt certain classes or functions from these restrictions using a
 friend declaration. This should be used sparingly, since it breaks encapsulation and exposes
implementation details.

struct BankAccount
{
  // full access -- maybe not the best of ideas?
  friend class Customer;

     private:
       double balance; // hide this!
};




Encapsulation


The only difference between C++ structs and classes is default visibility: in a struct, everything
is public by default (but may be declared private ), and vice versa for classes.


The hiding of implementation details using private is known as encapsulation and is generally
a good idea, mainly for the following reasons:

     • ensures consistent state: in a plain struct, data is either const or open to arbitrary,
       potentially nonsensical, changes

     • facilitates modularity: private components are unknown outside of the class itself, and
       may therefore be exchanged and modified at will

     • improves communication of intent: anything marked public is part of the intended
       interface
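
A small sketch of these points, reusing the BankAccount idea from above: the balance can only
be changed through methods that keep the object in a consistent state.

class BankAccount
{
  double balance = 0; // implementation detail, hidden and exchangeable

  public:

  // intended interface: no arbitrary, nonsensical changes possible
  void deposit(double amount)
  {
    if (amount > 0)
      balance += amount;
  }

  double currentBalance() const
  {
    return balance;
  }
};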





Constructors

Data that was declared private is inaccessible outside of the class, so we need special public
methods to initialize such data. These methods are called constructors. In C++, they have
the same name as the class they belong to and are the only functions without return type.

class Vector2D
{
  std::array<double,2> comp;

     public:

     Vector2D() : comp{0,0} {};

     Vector2D(double a, double b)
       : comp{a,b}
     {};

  // ...
};




This (incomplete) class provides two constructors, one for arbitrary points, and one for the
origin.


Why not use default arguments instead? Because then one could omit b while supplying a
(design decision).

There are three types of constructors with special names:

      • The default constructor T() without arguments: called, e.g., when mass producing
        vector entries during initialization

      • The copy constructor T(const T&) : has the task of creating a copy of the object spec-
        ified in the function call

      • The move constructor T(T&&) : like the copy constructor, but may cannibalize the orig-
        inal object (which should be left in some valid default state)
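
For the Vector2D class above, declarations of these three special constructors could look as
follows (a sketch; here the compiler-generated versions would already do the right thing):

Vector2D();                      // default constructor, e.g., the origin
Vector2D(const Vector2D& other); // copy constructor
Vector2D(Vector2D&& other);      // move constructor, may take over other's resources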


Converting Constructors

Any constructor that is not marked as explicit is a so-called converting constructor, and is
called for implicit type conversions [7]. Assume we have defined

Matrix(double); // diagonal matrix, or constant matrix?
Matrix operator*(double, const Matrix&);        // scaling
Matrix operator*(const Matrix&, const Matrix&); // matrix multiplication
[7] type promotions, like from int to double when needed






but not operator*(const Matrix&, double) . Then a call a * 2. , with a a Matrix , will
call Matrix(2.) followed by matrix multiplication. This may lead to unexpected results.

If such conversions aren’t intended, the constructor has to be marked:

explicit Matrix(double); // no implicit type conversions




Conversion Operators

Closely linked to constructors are conversion operators. While converting constructors are
used to convert to the class type, these operators are used to convert from the class type to
some other specified type, mainly when
   • the target type is a fundamental type
   • the target type is some class provided by an external library
In both cases it is impossible to simply provide the right constructor for the target type.

Here, conversion operators can be employed instead:

struct Vector
{
  operator Matrix(); // implicit promotion to column matrix
  explicit operator double(); // only explicit conversion
  ...
};




Delegating Constructors

Function overloading can be used to forward function calls to other versions of the same
function, e.g., to swap function arguments, or as in the maximum function we introduced, or
as a form of mutual recursion.

Similarly, one may define delegating constructors which call other constructors:

// constructor for square matrices uses general constructor
Matrix(int n)
   : Matrix(n,n) // no further entries allowed
{}





Rules:
      • The delegating constructor may not initialize anything itself (only call this second con-
        structor)
      • The calls cannot be recursive (at some point, initialization has to take place)
      • The function body is executed after the other constructor has finished (thereby allowing
        local modifications)


Destructors

Destructors are the counterpart to constructors: they clean up data when an object goes out of
scope and its lifetime ends. Most of the time explicitly declaring a destructor isn’t necessary,
but it is vitally important if, e.g., memory was allocated by the object.

The name of the destructor is the class name with a tilde ~ as prefix.

The correct use of destructors leads directly to the technique of → Resource Acquisition is
Initialization (RAII).
class IntStorage
{
  int n;
  int* data;

     public:

     IntStorage(int n_)
       : n(n_), data(new int[n])
     {};

     // copies are neither forbidden
     // nor handled correctly
     // --> segmentation fault waiting to happen

  ~IntStorage()
  {
    delete[] data; // array form matches the allocation with new[]
  }
};




Default Methods

For any type T , the compiler automatically generates several methods for us if applicable:
      • default constructor T()

      • default destructor ~T()

      • copy constructor T(const T&)





      • copy assignment T& operator=(const T&)

      • move constructor T(T&&)

      • move assignment T& operator=(T&&)


In each case, the method is not created automatically if that is impossible, e.g., if the class is
storing some reference, or if there are user-defined versions.

In the case of the default constructor T() , the presence of any user-defined constructors
prevents its creation.


The move constructor and move assignment operator aren’t created if any of the other men-
tioned methods except the default constructor has been user-defined.


The assignment = default as in

T() = default;


can be used to state explicitly that relying on the compiler-generated version of a method is
intended and not an oversight, or to force the generation of the default constructor in the
presence of other constructors.
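
A short sketch of forcing the default constructor back into existence (the Point struct here is
just an example):

struct Point
{
  double x, y;

  Point(double x_, double y_) : x(x_), y(y_) {}

  // the user-defined constructor above would normally suppress this one
  Point() = default; // x and y remain uninitialized, as for a plain struct
};

Point a;         // ok: default constructor explicitly requested
Point b{1., 2.}; // ok: user-defined constructor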



Rules of Zero and Five


The rule of zero states that it is often a good idea to implement at most some custom con-
structors and none of the aforementioned methods. This is concise and perfectly appropriate
when the class is just a collection of data with some methods.


However, sometimes it is necessary to provide replacements for these methods, e.g., because
the default copy methods perform flat copies of pointer structures.


The rule of five [8] states that if a user-defined copy constructor, copy assignment operator,
move constructor, move assignment operator, or destructor is present, then it is very likely
that all five should be explicitly defined: if one of these methods has to be specialized, then
the underlying reason is typically relevant for all of them.
 8
     formerly known as “rule of three”, before move semantics were introduced






Deleted Methods

Returning to the IntStorage object, we have two different options:

      • provide user-defined copy/move constructors and assignment operators to enable deep
        copy semantics

      • simply prohibit the creation of copies of such objects


Let’s go with the second option for the sake of argument. Before C++11, one would have
implemented the methods to prevent their automatic generation, and then declared them
 private . Any attempt to copy such an object would then trigger a compilation error, but
the construct is a rather poor choice in terms of communicating intent.


Since C++11, one can declare methods as deleted, preventing their automatic creation:

T(const T&) = delete;
T& operator=(const T&) = delete;




Mutable Members

In C++, declaring an object const doesn’t mean that its representation (byte sequence) has
to be immutable, just that any such changes are invisible from the outside. The keyword
 mutable marks members as modifiable in const methods:

class Matrix
{
    mutable bool detValid = false;
    mutable double det;

     public:

      double determinant() const
      {
        if (!detValid)
        {
          det = calcDet(); // expensive (private) helper function
          detValid = true;
        }
        return det;
      }
};




Note that any method that modifies the matrix has to set detValid = false !





Static Members

The keyword static indicates that something belongs to the abstract definition, not the
concrete instance:
   • In functions definitions, static variables belong to the function itself, not the current
     function call (and therefore persist across function calls)
   • In class definitions, static variables and methods belong to the class itself, not the
     created objects (i.e., they are shared between all objects of this class)

In functions, static variables can serve as function “memory”, e.g., in function generators.
In classes, static members can be used to, e.g., count the number of instantiated objects,
manage a common pool of resources (memory, threads, . . . ), or provide small private helper
functions.
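
A minimal sketch of both uses; the names nextID and Widget are only for illustration:

#include <iostream>

int nextID()
{
  static int counter = 0; // persists across calls ("function memory")
  return counter++;
}

class Widget
{
  static int count; // shared by all objects of the class

  public:
    Widget()  { ++count; }
    ~Widget() { --count; }

    static int instances() { return count; }
};

int Widget::count = 0; // definition of the static data member

int main()
{
  std::cout << nextID() << " " << nextID() << std::endl; // prints 0 1
  Widget a, b;
  std::cout << Widget::instances() << std::endl;         // prints 2
}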


The Singleton Pattern

A design pattern using static is the singleton pattern, which creates a class that provides
exactly one instance of itself and prevents the creation of further such objects.

This pattern tends to be overused. It should be restricted to situations where
   • the notion of two such objects doesn’t make sense under any circumstance
   • the single instance needs to be accessible from everywhere in the program

Standard applications concern the centralized management of system resources:
   • a centralized logging facility, printing queues, or network stacks
   • thread and memory pools, or the graphical user interface (GUI)
Realization of the singleton pattern in C++:
class Singleton
{
  public:
    static Singleton& getInstance()
    {
      // kept alive across function calls
      static Singleton instance;
      return instance;
    }

    // prevent creation of copies
    Singleton(const Singleton&) = delete;
    void operator=(const Singleton&) = delete;

  private:
    // only callable within getInstance()






      Singleton(){...}; // hide constructor
};

// in user code:
Singleton& singleton = Singleton::getInstance();




Example from DUNE

An application of the singleton pattern in DUNE:
class MPIHelper
{
  public:
  ...

      DUNE_EXPORT static MPIHelper&
          instance(int& argc, char**& argv)
      {
        // create singleton instance
        static MPIHelper singleton (argc, argv);
        return singleton;
      }

     ...
     private:
     ...

      MPIHelper(int& argc, char**& argv)
      : initializedHere_(false)
      {...}
      MPIHelper(const MPIHelper&);
      MPIHelper& operator=(const MPIHelper);
};



      • Conceptually, there can be only one instance of the Message Passing Interface (MPI).
      • Its initialization and finalization functions must be called exactly once, the singleton
        pattern guarantees this.
      • Copy constructor and assignment operator have been hidden instead of deleted (pre-
        C++11 style).


2.7. Inheritance

Quite often, there is a natural relationship between several classes based on their purpose:
      • Several matrix classes (dense vs. sparse, small vs. millions of entries, local vs. distributed
        in parallel, hierarchically blocked or not,. . . )
      • Different algorithms for matrix inversion / decomposition





     • A range of solvers for nonlinear problems, spatial discretizations, time stepping schemes,. . .

This relationship can be expressed using inheritance:
     • Extend and augment existing classes
     • Collect and maintain common code in base classes
     • Express and enforce interfaces through → abstract base classes (ABCs)
Inheritance establishes a relationship between classes, one derived class and an arbitrary num-
ber of base classes (typically just one).

struct A1 {int a;};
struct A2 {int a;};

struct B : A1, A2 // B inherits from A1 and A2
{
  int b;

     B(int c1, int c2, int c3) : A1{c2}, A2{c3}, b(c1) {}
};

The class B is an extension of both A1 and A2 , and contains three ints (two a s, and one
b ), since it inherits all data members and methods from A1 and A2 .

Inside of B the two different a s have to be accessed via A1::a and A2::a , because the
simple name a would be ambiguous.


Class Hierarchies

Derived classes may themselves have derived classes, leading to a hierarchy of data types:

struct A {int i;};
struct B : A {int j;}; // B IS-A A
struct C : B {int k;}; // C IS-A B (and therefore an A)

Again, i can be accessed in C , possibly under the name B::A::i if i alone would be
ambiguous.

Conceptually, this hierarchy always forms a tree:

struct D : B, C {int l;}; // contains two independent copies of A and B each

 B::j and C::B::j are two independent variables! We will see more about this when dis-
cussing the → Deadly Diamond of Death and → virtual inheritance.





Access Specifiers in Inheritance



Access specifiers can also be used when specifying inheritance relationships, as in

class A : public B {...};


If this access specifier is omitted, it defaults to public for structs and private for classes9 .



Access rights combinations for inherited methods and data members:

 base member is . . .      inherited publicly     inherited protectedly     inherited privately
     public                public                 protected                 private
     protected             protected              protected                 private
     private               —                      —                         —



Public Inheritance



Public inheritance models the subtype relationship from entity-relationship models: a derived
class object IS-A base class object, in the sense that it fulfills the same interface.



Possible examples of this are:

      • A Circle IS-A Shape

      • A DiagonalMatrix IS-A Matrix

      • A NewtonSolver IS-A NonlinearSolver



An object of the derived class is expected to be a perfect replacement for objects of the base
class. This puts additional responsibilities on the person implementing the derived class, e.g.,
code operating on pointers and references of the base class should continue to work for those
pointing to objects of the derived class.
 9
     This is the reason why “ : public ” is typically used instead.






The Liskov Substitution Principle

In principle, the subtype (derived class) should exhibit the same behavior as the supertype
(base class). However, this is hard to verify in general. The Liskov Substitution Principle
(LSP) defines some constraints that are meant to aid in this task.


Methods that share a name with one of the methods of the base class (and thereby override
them) should not lead to surprising behavior:
Contravariance of arguments The method may also accept supertypes of the original argu-
     ments10 .
Covariance of return types The method may return a subtype of the original return type11 .
Exception safety The method may only throw the original → exceptions or subtypes thereof.
Additionally, the following things should hold:
Preconditions The subtype may not strengthen preconditions (put additional constraints on
     the environment where its methods are called).
Postconditions The subtype may not weaken postconditions (leave conditions unfulfilled that
     are always fulfilled by the supertype).
Invariants The subtype must honor invariants of the supertype (things that are generally true
      for its objects)
History constraint The subtype may not allow state changes that are impossible from the
     viewpoint of the supertype.


In short, the base class part of the derived class should perform according to expectations.
 private members becoming inaccessible in the derived class helps in this regard.


Is a Square a Rectangle?

According to the LSP, it depends on the concrete implementation whether a Square is indeed
a Rectangle:
      • If squares and rectangles are immutable (unchangeable after their creation), or only
        provide methods for scaling, rotation, and translation, then a Square can be a
        Rectangle.
      • If changing the length of a Square would also change its width, then assumptions about
        the supertype Rectangle would be violated.
      • Keeping the width constant instead means it fails at being a Square.
10
     not available in C++
11
     in C++ only in the context of → dynamic polymorphism






     • Therefore, Squares cannot be Rectangles if the length and width of rectangles can be
       controlled independently.



                              Again: Is a DiagonalMatrix a Matrix?


Protected/Private Inheritance

Private inheritance implements one class in terms of another:
     • the members of the base class become private members of the derived class
     • this is invisible from outside of the class
     • the derived class may access the public interface of the base class
Mostly equivalent to just storing the base class as a private member instead, except for the
so-called empty base class optimization.


Protected inheritance works the same, but the inheritance is visible to children of the derived
class as well. It is seldom used and has few applications.


Examples from DUNE

DUNE currently contains
     • of course a large number of instances of public inheritance,
     • exactly one instance of protected inheritance,
     • and a small handful of cases with private inheritance.
class MacroGrid
  : protected DuneGridFormatParser
{...}

template <int block_size, class Allocator=std::allocator<bool> >
class BitSetVector : private std::vector<bool, Allocator>
{...}

template<int k>
struct numeric_limits<Dune::bigunsignedint<k> >
  : private Dune::Impl::numeric_limits_helper
      <Dune::bigunsignedint<k> > // for access to internal state of bigunsignedint
{...}


     • MacroGrid and descendants may access internal DuneGridFormatParser
     • BitSetVector is implemented using std::vector , but one may not be used as a
       replacement for the other





2.8. Code Structure

We have now reviewed the fundamental building blocks of C++, and finish this section with
a short look at code structure. C++ supports a range of tools that can be used to structure
code bases and make large collections of code more maintainable:
   • header files and the accompanying header guards to structure the concrete file layout and
     maintain code dependencies
   • source files to provide implementations separate from declarations, thereby guaranteeing
     stable interfaces while allowing modification of implementation details
   • namespaces to structure code on the conceptual level and prevent name clashes between
     different libraries


C++20 will introduce → modules as known from other programming languages, which will
significantly improve importing other C++ libraries and exporting one’s own constructs as a
library.


Header Files

By convention, C++ code is separated into two different types of files:
   • header files (.hh), containing declarations and interfaces
   • source files (.cc), containing definitions and implementations
Both types of files are needed to build a given code project, but only the header files are
necessary when writing code that should link against some external library.


Templates typically go against this convention, with their complete definition put into header
files: the definition is needed during template instantiation, and without it dependent code
could only use a given, fixed set of predefined and preinstantiated variants.


Header Guards

The #include preprocessor statement simply includes raw text from header files, recursively
if necessary:
   • Typically, header files are included several times within a program (e.g., <iostream> ,
     <vector> , etc.).
   • This would lead to redefinitions, and therefore compilation errors.
   • Even without such errors, reparsing of these files would lead to long parse times, especially
     when considering header files including other header files.





Therefore, use header guards:

#ifndef HEADER_HH // only read file if not yet defined...
#define HEADER_HH // ...and define to prevent second read

... // actual header content (only parsed once)

#endif // HEADER_HH // reminder what is closed here




Namespaces

The one definition rule (ODR) of C++ demands that names are unambiguous:

     • local definitions take precedence over those from enclosing scopes

     • providing two differing definitions for the same name with the same visibility is forbidden


This leads to problems when

     • a certain name should be reused in different parts of a very large program

     • by coincidence, two (or more) external libraries define the same name


Solution: encapsulate libraries, sublibraries, and independent project parts using names-
paces.

A simple namespace example:

namespace Outer
{
  namespace Inner
  {
    struct Data{};
  }
}

In this example, the struct Data is known as

     • Data , Inner::Data , or Outer::Inner::Data in the innermost scope

     • Inner::Data or Outer::Inner::Data in the outer scope

     • only Outer::Inner::Data in the rest of the program





Example from DUNE

In DUNE, a namespace called (fittingly) Dune encapsulates the whole project. This namespace
is used for the core modules.

Downstream modules like PDELab typically introduce subnamespaces, e.g., Dune::PDELab ,
for their own classes and functions. This way, these modules may use names that would
otherwise collide with each other or parts of the core modules.
#ifndef DUNE_PDELAB_NEWTON_NEWTON_HH
#define DUNE_PDELAB_NEWTON_NEWTON_HH
...
namespace Dune
{
  namespace PDELab
  {
    // Status information of Newton's method
    template<class RFType>
    struct NewtonResult : LinearSolverResult<RFType>
    {
      ...
    }
    ...
  } // end namespace PDELab
} // end namespace Dune
#endif // DUNE_PDELAB_NEWTON_NEWTON_HH






3. The Standard Library

The Standard Library is a set of classes and functions that is part of the C++ language
standard. It provides most of the common “tools of the trade”: data structures and associated
algorithms, I/O and file access, exception handling, etc. The components are easily recognized
because they are in the std namespace.


Large parts of the Standard Library were already available in C++98/03 and are based on the
Standard Template Library (STL), which is the library where common container classes like
std::vector originate from. For this reason, it is still sometimes called “the STL”.


Other parts were introduced in C++11 and later standards, often originating in the Boost
C++ Libraries, a well-known set of open-source libraries that provide advanced utility classes
and templates.
We are going to discuss the following older parts of the Standard Library in detail:
     • input and output streams
     • containers (a.k.a. data structures) and iterators
     • algorithms and functors
     • companion classes (pair and tuple)
     • exceptions


C++11 additions like → smart pointers, → random number generation, → threads, and → reg-
ular expressions will be discussed at a later point, and the same holds for the → filesystems
library from C++17.


Note that one usually has to include a certain header for a library feature to be available, e.g.,
#include <iostream> for I/O, or #include <vector> for std::vector .


3.1. Input / Output

C++ provides stream-based I/O: each stream is an abstraction for some source and/or desti-
nation of data, and each such stream is used in the same way, whether it represents standard
C I/O, the content of a file, or simply the content of some string.


The relevant types are:
[i,o,io]stream generic input, output, or bidirectional I/O stream

[i,o,io]fstream specialization for reading / writing files





[i,o,io]sstream specialization for strings as data sources/sinks
There are four predefined global variables for standard C I/O:
cin standard C input stream
cout standard C output stream
cerr standard C error stream, unbuffered
clog standard C error stream, buffered

For larger programs it is good practice not to write to these streams directly, since extraneous
output can make it difficult to use the software in new contexts and other projects. Instead, one
writes into intermediate stream objects, which may then redirect the data at their discretion.
The standard way of using streams are the well-known << and >> operators, maybe with
some I/O modifiers like floating-point formatting. Additionally, there are also special types
of → iterators available that make a more or less seamless interaction with container classes
possible.
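
A small sketch of these points, writing into an intermediate string stream and redirecting its
content afterwards; the file name is just an example:

#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>

int main()
{
  std::ostringstream buffer;                 // intermediate stream object
  buffer << std::setprecision(3) << 3.14159; // I/O modifier for formatting
  buffer << " approximates pi\n";

  std::ofstream file("output.txt");          // file as data sink
  file << buffer.str();                      // redirect to a file...
  std::cout << buffer.str();                 // ...and/or to standard output
}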

If it should be possible to read and/or write an object from/to a stream (serialization12 ), one
has to specialize the stream operators.
      • Use free functions, because the stream is the left-hand side operand
      • friend declaration, because we need access to the object internals
struct Streamable
{
  // ...

  // return streams to facilitate operator chaining
  friend std::istream& operator>>(std::istream& is, Streamable& s);
  friend std::ostream& operator<<(std::ostream& os, const Streamable& s);
};




Example from DUNE

//! Print a std::array
template<typename Stream, typename T,
    std::size_t N>
inline Stream& operator<<(Stream& stream,
    const std::array<T,N>& a)
{
  stream<<"[";
12
     https://isocpp.org/wiki/faq/serialization






     if(N>0)
     {
       for(std::size_t i=0; i<N-1; ++i)
         stream<<a[i]<<",";
       stream<<a[N-1];
     }
     stream<<"]";
     return stream;
}


This code defines an output format for std::array as long as the stored type T itself can
be printed.

The stream is returned to the caller, which enables operator chaining as in

std::cout <<     a1
          <<     " "
          <<     a2
          <<     std::endl;



3.2. Container Classes

Sequential Containers

C++ provides three different types of containers (data structures): sequences, container adap-
tors, and associative containers. They are the C++ versions of common and well-known data
structures (array, linked list, tree, hash map, etc.).

The elements of container objects all have the same type, specified as a template parameter.
This restriction can be somewhat lifted through → dynamic polymorphism. Since C++17 one
may also use a → variant, or the → std::any class as element type.

The simplest type of container is a sequence, where the elements are associated with an integer
index (either explicitly or implicitly).
C++ provides the following sequences:
array array with fixed size and random access
vector array with variable size and random access
deque double-ended queue with random access
list doubly linked list





forward list singly linked list

std::vector<int> a(3,5); // array [5,5,5]
std::vector<int> b{3,5}; // array [3,5]
std::list<double> c;     // empty doubly-linked list




Container Adaptors

In C++, a container adaptor implements some data structure in terms of another. Three such
adaptors are provided:
stack LIFO (last in, first out) structure
queue FIFO (first in, first out) structure
priority queue a priority queue (surprise!)

std::stack<double> a;               // stack based on deque (the default)
std::stack<int,std::vector<int>> b; // stack based on vector
std::queue<int,std::list<int>> c;   // queue based on list



Associative Containers

In contrast to sequential containers, which are indexed by integers, associative containers may
have an arbitrary index type. This index is associated with a value, and the container can be
searched for this key, producing the value if the key is found.

C++ knows four sorted associative containers:
set key is unique and identical to its associated value
multiset like set, but key may appear multiple times
map key is unique and maps to a value of some mapped type (elements are key-value pairs)
multimap like map, but key may appear multiple times

The keys of these sorted containers need to be equipped with a strict weak ordering relation.
The containers are typically implemented using red-black trees.
Additionally, C++ also provides unsorted associative containers. While the sorted ones typi-
cally use trees internally, the unsorted ones are usually based on hash tables.

Each sorted container has an unsorted counterpart:





unordered set, unordered multiset, unordered map, and unordered multimap.

These unsorted containers are more universally applicable, since they don’t need the underlying
ordering relation. However, the elements appear in random order (based on the hash function
in practice).

Unsorted containers tend to be faster than sorted containers but may be worse if there are many
collisions in the hash function. For certain applications the guaranteed worst-case performance
of sorted containers may be an important feature.
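
A short usage sketch for a sorted and an unsorted map; the contents are arbitrary:

#include <iostream>
#include <map>
#include <string>
#include <unordered_map>

int main()
{
  std::map<std::string,int> sorted{{"b",2},{"a",1}};
  sorted["c"] = 3;                        // insert via operator[]

  auto it = sorted.find("b");             // O(log N) lookup
  if (it != sorted.end())
    std::cout << it->first << " -> " << it->second << std::endl;

  // same contents, but hash-based and in arbitrary order
  std::unordered_map<std::string,int> hashed(sorted.begin(), sorted.end());
  std::cout << hashed.count("c") << std::endl; // amortized O(1) lookup
}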


Complexity Guarantees

An important part of the container library are the accompanying complexity guarantees. Any
implementation of C++ has to provide methods with certain upper bounds on performance.
For the most part, these follow naturally from the default underlying data structures.
      • Accessing an element of a vector is in O(1), and adding an element at the end is
        amortized13 O(1), worst-case O(N ), where N is the number of elements.
      • Accessing an element of a list is in O(N ), and adding or deleting elements is O(1)
        anywhere in the list.
      • Operations on sorted associative containers are typically in O(log N ).
      • For unsorted associative containers this is replaced with amortized O(1), worst-case
        O(N ).
A table with all the container methods is available online14 , and complexity guarantees are
linked from there.


Sequence Test: Vector vs. List

Task by J. Bentley and B. Stroustrup (Element Shuffle):

      • For a fixed number N , generate N random integers and insert them into a sorted se-
        quence.
        Example:
           – 5
           – 1, 5
           – 1, 4, 5
           – 1, 2, 4, 5
13
     averaged over many method calls
14
     https://en.cppreference.com/w/cpp/container






           • Remove elements at random while keeping the sequence sorted.
                     Example:
                          – 1, 2, 4, 5
                          – 1, 4, 5
                          – 1, 4
                          – 4
           • For which N should a list be used, and in which cases a vector ?




[Figure: log-log plot of required time [s] versus number of elements (10^3 to 10^6) for list
and vector , with power-law fits 5.3e-11 * x^2.49 and 4.5e-10 * x^2.05]

Potentially surprising results:
           • Despite random insertion / deletion, vector is faster by an order of magnitude
           • Linear search for both containers, despite bisection being available for vector (!)
           • Search completely dominates move required by vector
           • Non-optimized list performs one allocation / deallocation per element (!)





Use vector as default — and if not, back up assumptions with measurements


3.3. Iterators

The main way to interact with containers is via iterators, which are generalizations of the
pointer concept, i.e., objects that can be dereferenced ( * ) and advanced ( ++ ). For each
container type T , there is:
     • an associated type T::iterator , for read / write access to the container
     • an associated type T::const_iterator , providing read-only access

     • a method begin() , which returns an iterator pointing to the first element

     • a method end() , returning an iterator one position past (!) the last element

     • equivalent methods cbegin() and cend() for read-only access


The element order is defined by the index for sequences, the keys for sorted containers, and the
hash function for unsorted containers. There are also additional types / methods that reverse
this order ( rbegin() , rend() , etc.).
Using iterators, one can write functions that work for different containers:

template<typename T>
  void print(const T& cont)
  {
    for (typename T::const_iterator it
        = cont.begin(); it != cont.end(); ++it)
      std::cout << *it << " ";
    std::cout << std::endl;
  }


     • The keyword typename is used inside templates to specify that a dependent name (iden-
       tifier that depends on at least one template parameter) refers to a type, since the compiler
       could mistake it for a variable name, etc.
     • The prefix increment operator is usually more efficient than the postfix one, since there
       is no need for temporary iterator copies.

The properties of iterators depend on the underlying container:
array, vector, deque Bidirectional ( ++ / -- ), random access (i.e., instant jumps of arbitrary
      stride possible)
list Bidirectional, no random access (must follow pointer chain)
forward list Forward direction only, neither backward direction nor random access





sorted assoc. cont. See list
unsorted assoc. cont. See forward_list 15


The iterators provide their properties in the form of iterator tags (public members), which may
be used to, e.g., write more efficient → template specializations of algorithms for iterators that
provide random access.
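
A sketch of this idea using tag dispatch (overloads selected by the iterator category) rather
than explicit specialization; std::advance already provides this functionality:

#include <iterator>

template<typename It>
void myAdvance(It& it, int n, std::random_access_iterator_tag)
{
  it += n; // constant-time jump
}

template<typename It>
void myAdvance(It& it, int n, std::input_iterator_tag)
{
  for (int i = 0; i < n; ++i)
    ++it;  // advance one step at a time
}

template<typename It>
void myAdvance(It& it, int n)
{
  // the iterator tag selects the best-fitting overload
  myAdvance(it, n,
      typename std::iterator_traits<It>::iterator_category{});
}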


3.4. Algorithms

The Standard Library provides a large number of algorithms16 that are tailored to these dif-
ferent iterator categories, and automatically make full use of the capabilities of the container
they are operating on.


Example algorithms:
for each apply some function to each element (lifting)
count if count elements with certain properties
find if find first element with such property
copy if insert applicable elements into other container
shuffle randomly re-order container contents
sort sort container according to some criterion


Try to use predefined algorithms instead of writing your own function templates
Many of these algorithms expect some criterion, transform, or operation, which has to be
supplied as a functor (function object): an object that has an operator() with the required
number of arguments.


The Standard Library provides some default functors, e.g., std::less :

// possible implementation of std::less
template<typename T>
  struct less
  {
    // first set of () is operator name
    bool operator()(const T& lhs, const T& rhs) const
    {
      return lhs < rhs;
    }
  };

15
     mainly because their order is arbitrary anyways
16
     full list: https://en.cppreference.com/w/cpp/algorithm






User-implemented functors can be used to customize the provided algorithms, or one can use
→ lambda expressions or function pointers instead.
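
A short sketch combining a predefined algorithm, a Standard Library functor, and a lambda
expression as criterion:

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

int main()
{
  std::vector<int> v{3, 1, 4, 1, 5, 9, 2, 6};

  // sort in descending order using the predefined functor std::greater
  std::sort(v.begin(), v.end(), std::greater<int>());

  // count elements with a certain property, criterion supplied as lambda
  auto nEven = std::count_if(v.begin(), v.end(),
      [](int x) { return x % 2 == 0; });

  std::cout << nEven << std::endl; // prints 3
}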


3.5. Companion Classes

The Standard Library also provides a number of class templates that are not containers, but
serve similar purposes as containers and are often used in conjunction with them.

std::pair<T,U> :

The official version of the Pair struct we defined ourselves. Contains a T first and a
U second .

std::tuple<T...> :

Generalization of std::pair to general tuples using → variadic templates. Member access
through the std::get function, which expects either the type of the component as template
parameter, or its index if the former is ambiguous.
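
A small usage sketch with arbitrary contents:

#include <iostream>
#include <string>
#include <tuple>
#include <utility>

int main()
{
  std::pair<std::string,int> p{"answer", 42};
  std::cout << p.first << " = " << p.second << std::endl;

  std::tuple<int,double,std::string> t{1, 2.5, "three"};
  std::cout << std::get<0>(t) << " "      // access by index
            << std::get<double>(t) << " " // access by (unique) type
            << std::get<2>(t) << std::endl;
}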

C++17 adds → optional, → variant, and → any to the list.


3.6. Exceptions

C++ knows several different mechanisms for error handling:

assert :
Runtime check of some condition that should always be fulfilled (sanity check). Aborts the
program if condition evaluates to false .

static_assert :
Compile-time check with similar purpose, see → template meta programming. Produces com-
pilation error if condition is not met.

Exceptions:
Error handling mechanism for situations that should not be the norm, but may sporadically
occur during normal operation: memory exhausted, file not found, matrix is singular, solver
failed to converge. . .

An exception is an arbitrary object that can be interpreted as an alternative return value,
delivered using a mechanism that differs from the standard return . The Standard Li-
brary provides some predefined exceptions for convenience, like std::domain_error and
std::range_error , and new exceptions may be defined by inheriting from std::exception .





double nthroot(double x, int n)
{
  if (n % 2 == 0 && x < 0)
    // throw statement: execution of function is stopped here...
    throw std::domain_error("even powers require non-negative argument");
  ...
}

try // try block: register for treatment of potential exceptions
{
  double y = nthroot(-5.,2);
}
catch(const std::domain_error& e) // catch block: ...program control jumps here
{
  // try to do something better than just printing a message in practice
  std::cout << "nthroot failed, message: " << e.what() << std::endl;
}
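
As mentioned above, new exception types may be defined by inheriting from std::exception ;
a minimal sketch (the class name is only an example):

#include <exception>
#include <string>

class SolverError : public std::exception
{
  std::string message;

  public:
    explicit SolverError(const std::string& msg) : message(msg) {}

    // what() is the standard way to query the error message
    const char* what() const noexcept override
    {
      return message.c_str();
    }
};

// usage somewhere in a solver:
// if (!converged)
//   throw SolverError("solver failed to converge");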




Points to consider:
   • Exceptions are for error conditions that can’t be handled locally
   • A return always returns to the immediate caller, but a throw unwinds the call stack
     until a matching catch block is found
   • If none is found at all, the program is aborted (should be avoided if possible)
   • All function calls between the throw statement and the catch block are stopped
     prematurely

This means that local resources have to be handled in those intermediate functions (allocated
memory, open file handles, ongoing communication) during stack unwinding. An elegant mech-
anism to do this automatically is → RAII.






4. Advanced Topics

After having reviewed the basic building blocks of C++, i.e., the fundamental concepts and at
least parts of the Standard Library, we discuss more advanced topics:

      • template specializations

      • interactions between templates and inheritance

      • Resource Acquisition is Initialization (RAII)

      • template metaprogramming

      • dynamic polymorphism (virtual functions)

      • static polymorphism (CRTP)

      • Substitution Failure is not an Error (SFINAE)



4.1. Template Specialization

The main idea of templates is the reduction of code duplication through generalization, but
sometimes there are special cases that should / have to be treated differently. This can be
done by providing an explicit template specialization:

// make sure that int pointers are safe
template<> // empty set of remaining parameters
  struct Pair<int*> {int* a = nullptr; int* b = nullptr;};



For class templates17 , C++ additionally allows partial template specialization, where the tem-
plate parameters are constrained but not fully specified:

// make sure pointers are safe
// note local change in meaning for U and V!
template<typename U, typename V>
  struct Pair<U*,V*> {U* a = nullptr; V* b = nullptr;};



Which version is chosen for a given instantiation, e.g., Pair<int*,int*> ?

17
     and variable templates since C++14






// 1) base template: general version
template<typename U, typename V = U>
  struct Pair;

// 2) partial specialization: U = V
template<typename U>
  struct Pair<U,U>;
// or shorter: Pair<U>

// 3) partial specialization: pointers
template<typename U, typename V>
  struct Pair<U*,V*>;

// 4) full specialization: int pointers
template<>
  struct Pair<int*,int*>;
// again, alternatively Pair<int*>



C++ always chooses the most specialized version:
   • Pair<int*,int*> and Pair<int*> are (4), the latter via default argument in (1)

   • Pair<int,int> and Pair<int> are both (2)

   • Pair<int*,double*> is (3)

But Pair<double*,double*> and Pair<double*> are ambiguous, both (2) and (3) would
fit!

Avoid overlapping specializations — they cause compiler errors when triggered
Things are slightly more complicated for function templates. Assume for a moment that we
have a function template for matrix-vector products:

template<typename Mat, typename Vec>
  Vec multiply(const Mat& mat, const Vec& vec);



If we had a class called SparseMatrix for sparse matrices, i.e., matrices where almost all
entries are zero, this generic function would likely be very inefficient for such a matrix. It
makes sense to provide a partial template specialization:

template<typename Vec>
  Vec multiply<SparseMatrix,Vec>
      (const SparseMatrix& mat, const Vec& vec);



Unfortunately, this isn’t possible in C++.
A full specialization would be allowed, and we could even omit the parameters:





template<>
  VectorClass multiply<SparseMatrix,VectorClass>
      (const SparseMatrix& mat, const VectorClass& vec);

// short version
template<>
  VectorClass multiply
  (const SparseMatrix& mat, const VectorClass& vec);



Alas, we would have to specialize for any of possibly many vector classes we would like to use
together with SparseMatrix .

But why is that the case? Why can’t we simply provide a partial specialization?


Dimov/Abrahams Example

Consider the following two code snippets18 :

// (1) first base template
template<typename T> void f(T);

// (2) second base template (overloads)
template<typename T> void f(T*);

// (3) full specialization of (2)
template<> void f(int*);



// (1') first base template
template<typename T> void f(T);

// (3') full specialization of (1')!
template<> void f(int*);

// (2') second base template (overloads)
template<typename T> void f(T*);




(2) and (2’) could be both a simple overload for, or a specialization of, (1) resp. (1’). In C++,
the former is the case.

Now, consider the call f(p) for an int* p . This calls (3) as expected, but (2’) for the second
snippet19 ! Why?
18
     see http://www.gotw.ca/publications/mill17.htm
19
     interactive version: link to godbolt.org






Because in C++, overloads are resolved using base templates and normal functions, and then
specializations are taken into account!
As we have seen, even full function template specializations can lead to counterintuitive results,
which may explain why partial ones are currently not part of the language.

This is a common issue with C++, where a newer feature (here: templates) has to take older
ones (here: overloading) into account. The growth of C++ can be likened to onion layers, or
strata of subsequent civilizations, and newer additions have to interact with well-established
features.

This is also the reason why objects have an implicit self-reference pointer called this , and
not a reference with that name: references were introduced after classes.

A particularly involved example we will consider throughout the lecture is compile-time tem-
plate selection, which started with → SFINAE, is supported via → enable_if in C++11,
→ enable_if_t in C++14, and → if constexpr in C++17, and will be a standard appli-
cation of → concepts in C++20.


Template Specialization (cont.)

Conclusions drawn from the aforementioned observations:


Class template specializations
These are perfectly fine, whether explicit (full) or partial specializations.



Function template specializations
Partial specializations are forbidden, use a helper class with partial specialization or incorporate
the function as a method instead. Explicit specializations may interact in counterintuitive
ways and are completely unnecessary: all types are fully specified, so simply provide a normal
function overload instead.


“Use specializations for class templates, and overloads for function templates”
What about our motivating example, the sparse matrices?

We have the following options:
   • Use overloading for one of the arguments, and templates for the other:






       template<typename Vec>
         Vec multiply(const SparseMatrix& mat, const Vec& vec);
       // + versions for each alternate matrix class


     • Variant: make the multiply function a method for each of the matrix classes, maybe
       with some default that can be inherited from some base class.

     • Hand computation over to some helper class template, which may freely use partial spe-
       cialization:

       template<typename Mat, typename Vec>
         Vec multiply(const Mat& mat, const Vec& vec)
         {
           return multHelper<Mat,Vec>::multiply(mat,vec);
         }



4.2. Templates and Inheritance

Classes and Templates: Interactions

The combination of object-oriented programming (inheritance) and generic programming (tem-
plates) leads to complications during name lookup that have to be studied in detail. C++ treats
dependent and independent base classes in different ways.


Independent base classes are those base classes that are completely determined without consid-
ering template parameters, and independent names are unqualified names that don’t depend
on a template parameter.


     • Independent base classes behave essentially like base classes in normal (non-template)
       classes.

     • If a name appears in a class but no namespace precedes it (an unqualified name), then
       the compiler will look in the following order for a definition:

         1. Definitions in the class

         2. Definitions in independent base classes

         3. Template arguments

     • This order of name lookup can lead to surprising results.






#include <iostream>

struct Base {typedef int T;};

template<typename T>
  struct Derived : Base
  {
    T val;
  };

int main()
{
  Derived<double> d;
  d.val = 2.5;
  std::cout << d.val << std::endl;
}


   • This program prints 2 , not 2.5 as it may seem: link to godbolt.org
   • T is not defined in the class, but in the independent base class
   • Therefore, the template argument is ignored, and T is int ! Main issue: this can’t be
     seen when looking at Derived !
   • Use different naming schemes for types and type placeholders (template parameters)

Dependent base classes are those that are not independent, i.e., they require the specification
of at least one template argument to be fully defined.


   • The C++ standard dictates that independent names appearing in a template are resolved
     at their first occurrence.
   • The strange behavior from the last example relied on the fact that independent base
     classes have higher priority during name lookup than template parameters.
   • However, for names defined in a dependent base class, the resolution would depend on
     one or more template parameters, unknown at that point.
   • This would have consequences when an independent name would have its definition in a
     dependent base class: it would have to be looked up before that is actually possible.

template<typename T> struct Base {int n;};

template<typename T>
  struct Derived : public Base<T>
  {
    void f()
    {
      // (1) Would lead to type resolution






           //     and binding to int.
           n = 0;
       }
     };

template<>
  struct Base<bool>
  {
    // (2) Template specialization wants to
    //     define variable differently.
    enum {n = 42};
  };

void g(Derived<bool>& d)
{
  d.f(); // (3) Conflict
}




     1. In the definition of class Derived<T> , the first access to n in f() would lead to binding
         n to int (because of the definition in the base class template).

     2. Subsequently, however, the type of n would be modified into something unchangeable
        for the type bool as template parameter.

     3. In the instantiation (3) a conflict would then occur.

In order to prevent this situation, C++ defines that independent names won’t be searched
in dependent base classes. Base class attribute and method names must therefore be made
dependent, so that they will only be resolved during instantiation (delayed type resolution).



Instead of simply writing n = 0; , use:

      • this->n = 0; (implicitly dependent through this pointer)

      • Base<T>::n = 0; (explicitly mentions template parameter)

      • or import n into the current namespace:

           template<typename T>
             struct Derived : Base<T>
             {
               // name is now dependent for whole class
               using Base<T>::n;
               ...
             };






4.3. RAII: Resource Acquisition is Initialization

Multiple Resource Allocation

Often, especially in constructors, resources must be allocated several times in succession (open-
ing files, allocating memory, entering a lock in multithreading):

void useResources()
{
  // acquire resource r1
  ...
  // acquire resource r2
  ...
  // acquire resource rn
  ...

    // use r1...rn here

    // release in reverse
    // release resource rn
    ...
    // release resource r1
    ...
}



     • If acquiring rk fails, r1 , . . . , rk−1 have to be released before cancellation is possible, oth-
       erwise a resource leak is created.
     • What should be done if allocating the resource throws an exception that is caught outside?
       What happens to r1 , . . . , rk−1 ?

Common variant of this problem:

class X
{
  public:
    X();
    ~X();

     private:
       A* pointerA;
       B* pointerB;
       C* pointerC;
};


X::X()
{
  pointerA = new A;
  pointerB = new B;





     pointerC = new C;
}

//    How can we guarantee that
//    pointerA is freed if
//    allocating pointerB or
//    pointerC fails?




RAII


In C++, the correct solution for this problem is called “Resource Acquisition is Initialization”
(RAII), which:

     • is based on the properties of constructors and destructors and their interaction with
       exception handling.

     • is actually a misnomer: “Destruction is Resource Release” (DIRR) would be more ap-
       propriate, but the acronym RAII is now too well-known to change it.


RAII is a specific way of thinking about resources that originated in C++ and provides an
elegant alternative to strategies used in Java or Python, etc. (and has a really unfortunate
name for something so central. . . ).


RAII: Rules for Constructors and Destructors


Central rules that enable RAII:

     1. An object is only fully constructed when its constructor is finished.

     2. A compliant constructor tries to leave the system in a state with as few changes as
        possible if it can’t be completed successfully.

     3. If an object consists of sub-objects, then it is constructed as far as its parts are con-
        structed.

     4. If a scope (block, function. . . ) is left, then the destructors of all successfully constructed
        objects are called.

     5. An exception causes the program flow to exit all blocks between the throw and the
        corresponding catch .





The interplay of these rules, especially the last two, automatically frees resources before leaks
can happen, even when unexpected errors occur.
template<typename T>
  class Ptr
  {
    public:
      Ptr()
      {
        pointerT = new T;
      }

      ~Ptr()
      {
        delete pointerT;
      }

      T* operator->()
      {
        return pointerT;
      }

    private:
      T* pointerT;
  };




class X
{
  // no constructor and destructor
  // needed, the default variant
  // is sufficient
  private:
    Ptr<A> pointerA;
    Ptr<B> pointerB;
    Ptr<C> pointerC;
};

int main()
{
  try
  {
    X x;
  }
  catch (const std::bad_alloc&)
  {
    ...
  }
}




(This is actually a simple mock-up of → smart pointers)

Basic principle:

   • The constructor X() calls the constructors of pointer{A,B,C} .





      • When an exception is thrown by the constructor of pointerC , then the destructors of
        pointerA and pointerB are called and the code in the catch block will be executed.

      • This can be implemented in a similar fashion for the allocation of other resources (e.g.
        open files).


Main idea of RAII:

      • Tie resources (e.g., on the heap) to handles (on the stack)20 , and let the scoping rules
        handle safe acquisition and release

      • Repeat this recursively for resources of resources

      • Let the special rules for exceptions and destructors handle partially-constructed objects


4.4. Template Metaprogramming

Template metaprogramming refers to the use of templates to perform computations at compile-
time. This comes in basically two flavors:

      • Compute with numbers as usual, but during the compilation process

      • “Compute” with types, i.e., automatically map some types to other types

The former precomputes results to speed up the execution of the finished program, while the
latter is something that is impossible to achieve during runtime.


Template metaprogramming can’t make use of loops and is therefore inherently recursive when
performing nontrivial computations, and may become arbitrarily complex (it is Turing com-
plete!). We will only look at some small introductory examples.


→ SFINAE can be seen as a special case of template metaprogramming. → Constant expres-
sions can often serve as a modern replacement for template metaprogramming, especially since
loops in constexpr functions have been added in C++14.


Compile-Time Computations


An example of compile time single recursion to compute the factorial:

20
     heap: anonymous memory, freely allocatable; stack: automatic memory for local variables






template<int N>
  struct Factorial
  {
    enum {value = Factorial<N-1>::value * N};
  };

// base case to break infinite recursion
template<>
  struct Factorial<0>
  {
    enum {value = 1};
  };


In modern C++, this can be simplified significantly using → variable templates, because one
doesn’t need enums or static attributes for the values anymore.
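
A possible sketch using a variable template (C++14); the name factorial is chosen freely:

// recursive variable template
template<int N>
constexpr int factorial = factorial<N-1> * N;

// base case as full specialization
template<>
constexpr int factorial<0> = 1;

static_assert(factorial<5> == 120, "evaluated at compile time");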


Recursive Type Construction

Automated type generation using template metaprogramming:

template<typename T, int N>
  struct Array : public Array<T, N-1>
  {
    template<int M>
      T& entry()
      {
        return Array<T, M+1>::val;
      }

    // hide implementation detail
    protected:
      T val;
  };

// empty base case
template<typename T>
  struct Array<T,0> {};

// use like this:
Array<double,3> a;
a.entry<0>() = 0;
a.entry<1>() = 1;
a.entry<2>() = 2;



   • Recursive definition: array of N elements is array of N − 1 elements plus additional value

   • Helper method for element access

   • Nice feature: going beyond array bounds triggers compilation error





      • Storing different types as in std::tuple would require → variadic templates and be
        significantly less straightforward to implement

      • Also see dune-typetree, a library for compile-time construction and traversal of tree
        structures, link: gitlab.dune-project.org/staging/dune-typetree


4.5. Dynamic Polymorphism

Inheritance is based on new code utilizing old code: we augment an existing class with new
data/methods, and these can make use of the interface and/or implementation of the base
class.


The main idea behind dynamic polymorphism (subtyping) is trying to make the inverse work:
have old code utilize new code, i.e., we want to inject new behavior into classes and functions
without modifications to existing code.


Here, the concept polymorphism (Greek: “many forms”) refers to several functions sharing a
common interface, with the concrete variant that is chosen depending on the provided argu-
ments. The desired injection of new behavior is achieved by making this selection independent
of the function call site.


Types of Polymorphism

There are different types of polymorphism:

      • Static polymorphism, with the function being chosen at compile-time:

            – Ad-hoc polymorphism, in the form of function and operator overloading

            – Parametric polymorphism, in the form of templates and specializations

            – “True” → static polymorphism, trying to emulate dynamic polymorphism using
              template metaprogramming

         Also known as early binding, i.e., during program creation.

      • Dynamic polymorphism, with the function being chosen at runtime (also known as late
        binding).


In C++, static polymorphism is multiple dispatch (the combination of types determines the
chosen variant), while dynamic polymorphism is always single dispatch21 (only depends on the
object itself, not the method arguments).
21
     Dynamic multiple dispatch exists, e.g., in the Julia language.






Slicing

Using the copy constructor or assignment operator of a base class on some object results in
slicing: anything belonging to the derived class is cut away, and only the base class part is
used in the assignment.

Something similar happens when a base class pointer or base class reference referring to an
object of the derived class is created: only the base class methods and attributes are accessible
through such pointers and references. If the derived class redefines certain methods, then the
base class version is used anyways, i.e., the pointer/reference type dictates behavior, not the
type of the object itself.
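
A small sketch of slicing in action; the class names are only for illustration:

#include <iostream>

struct Base
{
  int a = 1;
  void who() const { std::cout << "Base" << std::endl; }
};

struct Derived : Base
{
  int b = 2;
  void who() const { std::cout << "Derived" << std::endl; } // redefinition, not virtual
};

int main()
{
  Derived d;

  Base copy = d; // slicing: only the Base part of d is copied
  copy.who();    // prints "Base"

  Base& ref = d; // reference to the Base part of d, no copy
  ref.who();     // also prints "Base": the reference type dictates behavior
}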


Polymorphic Types

Polymorphic types are classes that have at least one method defined as virtual . For such
methods, the type of the object itself determines which version is actually called, not the type
of references or pointers:

struct Base
{
   virtual void foo() {...};
};

struct Derived : public Base
{
  void foo() override {...};
};

Derived d;
Base& b = d;
b.foo(); // calls foo() of Derived




   • In C++, methods have to be explicitly declared as virtual for this to work. In some
     other languages this behavior is actually the default, e.g., in Java.
   • There is no reason to redeclare the same method as virtual in derived classes, this
     happens automatically. But one may use override to state that this method should
     override a base class method, and will get a compilation error if this doesn’t actually
     happen.
   • Without the override qualifier the compiler would silently introduce an overload if the
     signature doesn’t match, and consequently the base class method might be called when
     that isn’t expected to happen.





Implementation Detail: vtables

Using polymorphic types, i.e., virtual functions, incurs a certain runtime cost: the concrete
version to use for a certain function call has to be decided at runtime, because it depends on
the actual type of the object pointed / referred to (that’s the whole point).


A standard way of implementing virtual functions is via vtables (dispatch tables):

     • Each class with at least one virtual method stores hidden static tables of virtual functions
       (“vtables”), one for each base class with virtual methods, and potentially one for the class
       itself.

     • These tables contain function pointers to the right method versions.

     • Each object of such a class contains hidden pointers to the relevant vtables.

     • These are inherited from the base class and therefore remain after slicing, etc., but are
       redefined to point to the local version of the tables.

struct Base1
{
  // creates entry in vtable
  virtual void foo1();
};

struct Base2
{
  // creates entry in vtable
  virtual void foo2();
};

struct Derived : Base1, Base2
{
  // changes entry in vtable
  void foo2() override;
  // creates entry in vtable
  virtual void bar();
};

Derived d;
Base2& b2 = d;
// follow two pointers for call here
// (p. to table and p. to function)
b2.foo2();



     • The class Derived contains three implicitly defined static tables: one for itself and one
       for each of the two base classes.

     • The table from Base1 is copied, but that of Base2 is changed locally, with its entry
       pointing to Derived::foo2 instead.

     • The call b2.foo2() accesses the vtable through the hidden pointer, and then uses the
       function pointer to Derived::foo2 it finds there.





   • Cost for lookup: follow two pointers (relevant when the method is very simple and called
     very often)


Virtual Destructors

Polymorphic types can be stored in containers and similar classes via pointers to base classes,
and retain their specialized behavior. This makes it possible to use containers for heterogeneous
collections of objects, as long as they all have a common base class.


However, the container would trigger the destructor of the base class when it goes out of scope,
not the destructors of the derived classes. For this reason it is common practice to declare a
public virtual destructor when at least one other virtual method is present, to ensure that the
destructors of the derived classes are called.


Note that this suppresses the automatic generation of copy/move constructors and operators,
but normally directly copying polymorphic types isn’t a good idea anyways.
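
As a minimal sketch (the class names Shape and Circle are made up for illustration), a virtual
destructor ensures that deleting a derived object through a base class handle runs the derived
destructor as well:

#include <memory>
#include <vector>

struct Shape
{
  virtual void draw() const {}
  virtual ~Shape() {}  // virtual destructor: derived destructors are called
};

struct Circle : public Shape
{
  void draw() const override {}
  ~Circle() override {}  // runs before ~Shape() when deleted via a Shape handle
};

int main()
{
  std::vector<std::unique_ptr<Shape>> shapes;
  shapes.emplace_back(new Circle());
  // when the vector goes out of scope, ~Circle() and then ~Shape() run,
  // because the destructor is virtual
}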


Copying Polymorphic Types

If a polymorphic object is copied while being accessed through a base class pointer or reference,
the copy constructor of the base class is used. This means that unintended slicing occurs: only
the base class part is copied, and virtual method calls revert to the versions of the base class.


The desired behavior would usually be a full copy of the object, i.e., based on its true type
and consistent with dynamic polymorphism. This would require something like a “virtual
constructor” that constructs the correct type. But constructors can’t be virtual, because they
are not tied to objects — they are part of the class itself, like static methods.


The standard solution to this problem is:
   • explicitly forbid copying (and moving) polymorphic objects
   • provide a special clone method, that serves the purpose of such virtual constructors, but
     operates on the level of pointers
class Base
{
  // forbid copying/moving here, e.g., delete or
  // protect the copy/move constructors and operators

  public:
    virtual Base* clone() const
    {
      return new Base(*this);
    }




                                                                                              67
4. Advanced Topics


     // virtual destructor
     virtual ~Base() {}
};

class Derived : public Base
{
  // as above

  public:
    Derived* clone() const override
    {
      return new Derived(*this);
    }
};



     • Calling the clone method on a Base pointer will create a copy of the correct type and
       return a pointer to it.

     • Using covariant return types (see LSP) we may return a pointer to the actual type.

     • This pattern doesn’t follow RAII at all. This can be changed using → smart pointers,
       but then a pointer to the base class has to be used throughout, since smart pointers of
       covariant types are not themselves covariant.
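
A possible RAII-friendly variant of this pattern, sketched here under the assumption that one
gives up covariance and returns std::unique_ptr<Base> throughout, might look like this:

#include <memory>

class Base
{
  public:
    // return type is a base class pointer throughout, because
    // std::unique_ptr<Derived> is not covariant with std::unique_ptr<Base>
    virtual std::unique_ptr<Base> clone() const
    {
      return std::unique_ptr<Base>(new Base(*this));
    }

    virtual ~Base() {}
};

class Derived : public Base
{
  public:
    std::unique_ptr<Base> clone() const override
    {
      return std::unique_ptr<Base>(new Derived(*this));
    }
};

// usage: auto copy = ptr->clone();  // copy has the dynamic type of *ptr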


Abstract Base Classes


Abstract base classes (ABCs) are base classes that have at least one method declared as pure
virtual, i.e., declared as a virtual function, but without actually providing a default implemen-
tation:

struct ABC
{
   virtual void foo() = 0; // pure virtual: no definition provided
   virtual ~ABC() {} // virtual destructor
};


Abstract base classes are used to define interfaces, because of the following two properties:

     • It is impossible to instantiate objects of such a class, because at least one method is
       missing a definition.

     • Every derived class has to provide such a definition to become instantiable.


Example from DUNE




68
                                                                   4.5. Dynamic Polymorphism


template<class X, class Y>
class Preconditioner {
  public:
    //! \brief The domain type of the preconditioner.
    typedef X domain_type;
    //! \brief The range type of the preconditioner.
    typedef Y range_type;
    //! \brief The field type of the preconditioner.
    typedef typename X::field_type field_type;

     //! \brief Prepare the preconditioner. (...)
     virtual void pre (X& x, Y& b) = 0;

     //! \brief Apply one step of the preconditioner (...)
     virtual void apply (X& v, const Y& d) = 0;

     //! \brief Clean up.
     virtual void post (X& x) = 0;

     ...

     //! every abstract base class has a virtual destructor
     virtual ~Preconditioner () {}
};




Abstract base class for preconditioners:

     • defines some types

     • declares some methods, but doesn’t provide implementations

     • includes virtual destructor


Multiple Inheritance

Multiple interfaces can be combined by inheriting from several ABCs:

// VectorInterface: interface for vector types
// FunctionInterface: interface for functors
class Polynomial : public VectorInterface, public FunctionInterface
{
  // define any methods required by the ABCs
};


VectorInterface might define (but not implement) all the usual methods for vector arithmetic,
while FunctionInterface would require an appropriate operator() .


In the above code example, FunctionInterface is not a template, and therefore would
describe the usual functions of a single real variable, but it wouldn’t be difficult to provide a



                                                                                              69
4. Advanced Topics


class template as ABC instead. This would also cover more general functions (and technically
define a parameterized family of ABCs).

Multiple inheritance is simple for ABCs, because they typically don’t contain data. Therefore,
the interface conditions are simply restated, i.e., this form of multiple inheritance is perfectly
fine.


Multiple inheritance of base classes containing data, however, may lead to duplication of data
members. To avoid this, virtual inheritance can be used: the derived class contains one copy of
the base class per non-virtual derivation, and a single one for all virtual derivations combined.


The resulting diamond pattern (a derived class inheriting from the same base class along two
different paths), sometimes called the Deadly Diamond of Death, typically leads to code that
is hard to maintain and may contain subtle bugs:

     • Forgetting one of the virtual specifiers silently creates a second copy of the base class
       data.

     • Accessing the data in this second unmaintained version by accident will make the state
       of the derived object inconsistent.
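
A minimal sketch of such a diamond with virtual inheritance (class names made up for
illustration): both intermediate classes derive virtually from the common base, so the combined
class contains only a single copy of the shared data:

struct Data              { int value = 0; };  // common data
struct PartA : virtual Data {};               // virtual derivation
struct PartB : virtual Data {};               // virtual derivation
struct Combined : PartA, PartB {};            // one shared Data subobject

int main()
{
  Combined c;
  c.value = 42;  // unambiguous: only one copy of Data exists
  // without the virtual keyword, Combined would contain two Data
  // subobjects, and c.value would be ambiguous
}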


Example from DUNE


In practice, the diamond pattern is discouraged because of the resulting high maintenance cost.
However, earlier versions of PDELab contained a Newton method based on this pattern that
may serve as demonstration.


A Newton method consists of:

     • a basic algorithm

     • steps that must be performed at the start of each Newton iteration (e.g. reassembly of
       the Jacobian matrix)

     • a test whether the process has converged

     • optionally a line search to enlarge the region of convergence

Each of these intermediate steps is outsourced to a separate class, so you can replace all the
components independently.


The common data and virtual methods are placed in a base class.




70
                                                                       4.6. Static Polymorphism


class NewtonBase // stores data to operate on, iteration count, etc.
{...};

// perform linear solve, compute step direction
class NewtonSolver : public virtual NewtonBase
{...};

// check termination criterion
class NewtonTerminate : public virtual NewtonBase
{...};

// perform line search strategy
class NewtonLineSearch : public virtual NewtonBase
{...};

// local linearization (jacobian), thresholds, etc.
class NewtonPrepareStep : public virtual NewtonBase
{...};

// combine above classes into one complete class
class Newton : public NewtonSolver, public NewtonTerminate,
               public NewtonLineSearch, public NewtonPrepareStep
{...};



The actual implementation combined this with templatization on all levels.


4.6. Static Polymorphism

Just as dynamic polymorphism refers to the ability of code to adapt to its context at runtime,
with dynamic dispatch on the type of objects, static polymorphism refers to polymorphism at
compile-time with similar goals.


We have already discussed two versions of such static polymorphism:
   • function and operator overloading
   • templates and their specializations
Older code may then be adapted to new types by adhering to the relevant interfaces. But there
is also a specific pattern for static polymorphism that mimics virtual function calls but resolves
them at compile-time: base class templates using the curiously recurring template pattern.
In the Curiously Recurring Template Pattern (CRTP), some class is used as a template pa-
rameter of its own base class. This is actually valid C++, because the full definition of the
derived class isn’t required at that point, only during instantiation.

template<typename T>
  class Base
  {
    // access to members of T through template parameter
    ...



                                                                                                71
4. Advanced Topics


     };

class Derived : public Base<Derived>
{
   ...
};


This pattern is also sometimes called Upside-Down Inheritance, because class hierarchies can be
extended through different base classes using specialization.

// base class: provide interface, call impl.
template<typename T>
  struct Base
  {
    static void static_interface()
      {T::static_implementation();}

       void interface()
         {static_cast<T*>(this)->implementation();}
     };

// derived class: provide implementation
struct Derived : public Base<Derived>
{
  static void static_implementation();
  void implementation();
};

// call this with Derived object as argument
template<typename T>
  void foo(Base<T>& base)
  {
    Base<T>::static_interface();
    base.interface();
  }



      • static_cast converts the pointer type at compile-time; this is safe here as long as the
        Base object is actually part of a T object, which the CRTP construction guarantees

      • Base class provides interface definition like in ABCs

      • Avoids cost of virtual functions

      • Not a full replacement for dynamic polymorphism, e.g., no common base class as needed
        for STL containers


Example from DUNE

Mixin to define finite element Jacobian in terms of residual evaluations, with Imp being both
template parameter and derived class:




72
                                               4.7. SFINAE: Substitution Failure is not an Error


template<typename Imp>
class NumericalJacobianVolume
{
public:
  ...

   //! compute local jacobian of the volume term
   template<typename EG, typename LFSU, typename X, typename LFSV, typename Jacobian>
   void jacobian_volume (const EG& eg, const LFSU& lfsu, const X& x,
                         const LFSV& lfsv, Jacobian& mat) const
   {
     ...
     asImp().alpha_volume(eg,lfsu,u,lfsv,downview);
     ...
   }

private:
  const double epsilon; // problem: this depends on data type R!
  Imp& asImp () { return static_cast<Imp &> (*this); }
  const Imp& asImp () const { return static_cast<const Imp &>(*this); }
};




4.7. SFINAE: Substitution Failure is not an Error

SFINAE

We have already discussed how the compiler chooses between several available template spe-
cializations: it picks the “most specialized” version for which the template parameters lead to
successful instantiation.


During this selection process, other (more specialized) versions may have to be tried out, but
ultimately rejected when substituting the parameters with the given types fails. Therefore:


“Substitution Failure is not an Error” (SFINAE)


Failing to instantiate one of the specializations doesn’t terminate the compilation process; that
happens only when the pool of possible choices has been exhausted and either no viable
specialization was found, or several were found that are equally suitable.
SFINAE, the programming technique with the same name, provides a mechanism to select
between different template specializations at will, and achieves this by triggering substitution
failures on purpose.


The main tool for this is a small template metaprogram named enable_if , which provides a
type definition or doesn’t, depending on some external condition:




                                                                                              73
4. Advanced Topics


// possible implementation of enable_if
// ``false case'' --> no dependent type defined
template<bool B, class T = void>
  struct enable_if {};

// ``true case'' --> dependent type is defined
template<class T>
  struct enable_if<true, T> {typedef T type;};




Picking Numerical Solvers


Assume we have a set of numerical problems, say, ProblemA , ProblemB , and maybe others,
and also a set of solvers for such problems, Solver1 , Solver2 , and potentially some others.
We decide to manage the different combinations using a common interface:

template<typename Problem,
         typename Solver = typename Problem::DefaultSolver>
  void solve(...)
  {
    // instantiate Problem and configure it
    // instantiate appropriate Solver
    // apply Solver to Problem
    // postprocessing, etc.
  }


This works fine as long as every problem class defines an appropriate solver default.

There are two points to consider:

     • The default solver is only a suggestion and another one may be chosen by the user.
       This includes solvers that compile fine but are maybe not numerically stable for the given
       problem!

     • Maybe we want to eliminate the Solver template parameter altogether, instead auto-
       matically choosing a suitable solver from our collection for any problem that is passed to
       the solve function.

For this to work, we have to let our different solve variants know about certain “properties”
of the problem classes, and maybe also of the solver classes. These can then be used to mask
those combinations we don’t want to be used via SFINAE.



74
                                                     4.7. SFINAE: Substitution Failure is not an Error


Solving Linear Systems


To discuss a smaller application in more detail and show how SFINAE is actually used, let us
assume we have several matrix classes available, and some of them provide a method named
 multiplyWithInverse that solves
                                           Ax = b

in a very specialized and efficient way22 , and others don’t. We want to make use of this
functionality when it is available, of course, i.e., we have to check whether the method exists.


We need the following ingredients:

      • A traits class template that checks for the existence of the above method

      • The aforementioned enable_if to potentially define a type, depending on the “return
        value” of the traits class

      • Different solve functions, picking the right one with SFINAE

Traits class for existence of the method, adapted from Jean Guegant’s example23 :

template <class T1, class T2>
  struct hasMultiplyWithInverse
  {
    // Truth values at compile time
    typedef struct{char a; char b;} yes; // size 2
    typedef struct{char a;}         no; // size 1

       // Helper template declaration
       template <typename U, U u> struct isMethod;

       // Picked if helper template declares valid type, i.e. if signature matches:
       // - const C1 as implicit first argument to method [(C1::*) ... const]
       // - non-const reference to C2 as second argument, and void as return type
       template <typename C1, typename C2>
         static yes test(isMethod<void (C1::*)(C2&) const, &C1::multiplyWithInverse>*) {}

       // Catch-all default (templatized C-style variadic function)
       template <typename,typename> static no test(...) {}

       // Export truth value (as enum, alternatively as static const bool)
       // Trick: sizeof works without actually calling the function
       enum {value = (sizeof(test<T1,T2>(0)) == sizeof(yes))};
     };




Two versions of the solve function, one of them being selected by SFINAE:

22
     e.g., some precomputed matrix decomposition, a specialized spectral method, etc.
23
     https://jguegant.github.io/blogs/tech/sfinae-introduction.html




                                                                                                   75
4. Advanced Topics


template<typename M, typename V>
  typename std::enable_if<hasMultiplyWithInverse<M,V>::value>::type
  solve(const M& m, V& v)
  {
    // implement specialized version here, can use multiplyWithInverse
  }

template<typename M, typename V>
  typename std::enable_if<!hasMultiplyWithInverse<M,V>::value>::type
  solve(const M& m, V& v)
  {
    // implement general version here, has to avoid multiplyWithInverse
  }


Standard placements of enable_if :
     • Additional template parameter, hidden by assigning default value
     • Additional function argument, hidden by assigning default value
     • Return type of function (chosen above)
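
For comparison, a sketch of the first placement listed above, reusing the hasMultiplyWithInverse
traits class from before and hiding enable_if in a defaulted template parameter:

// placement 1: enable_if as additional, defaulted template parameter
template<typename M, typename V,
         typename = typename std::enable_if<hasMultiplyWithInverse<M,V>::value>::type>
  void solve(const M& m, V& v)
  {
    // specialized version, may use multiplyWithInverse
  }

Note that default template arguments are not part of a function template’s signature, so the
general fallback cannot simply repeat this pattern with the negated condition in the same
defaulted position; this is one reason the return type placement was chosen above.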


SFINAE (cont.)

Central steps to make this form of SFINAE work:
     • Use template specialization to determine at compile time whether some type has a certain
       method or not
     • Use this inside enable_if to trigger substitution failures on purpose
     • Guide compiler to select correct specialization (i.e., remove ambiguity)

The SFINAE code presented above is valid C++98/03. C++11 and later standards introduced
several features that make SFINAE significantly simpler, or help avoid it altogether:
     • → Constant expressions, e.g., to avoid the sizeof trick/hack
     • Predefined → type traits, to simplify writing conditionals
     • Mapping between values and their types (→ decltype/declval)
A simplified version based on C++11, extracting type information using decltype:

template<typename T1, typename T2>
  struct hasMultiplyWithInverse
  {
    template<typename T, typename U = void> // (1)
      struct Helper
      {enum {value = false};};

       template<typename T>                                 // (2)



76
                                               4.7. SFINAE: Substitution Failure is not an Error


        struct Helper<T,decltype(&T::multiplyWithInverse)>
        {enum {value = true};};

     // matches (2) if function exists, else matches (1)
     enum {value = Helper<T1,void (T1::*)(T2&) const>::value};
   };

By the way, T1::* is a so-called pointer to member, which are, e.g., the types of pointer stored
in the vtables we discussed.


Example from DUNE

The product of two functions on a numerical grid can be defined if
   • both have the same dimensionality (i.e., scalar product of vector fields)
   • or one of them is a scalar function (i.e., simple scaling)
Base template, handling the first case (also an example of CRTP):

//! Product of two GridFunctions
template<typename GF1, typename GF2, class = void>
class ProductGridFunctionAdapter :
  public GridFunctionBase<
    GridFunctionTraits<
      typename GF1::Traits::GridViewType,
      typename GF1::Traits::RangeFieldType, 1,
      FieldVector<typename GF1::Traits::RangeFieldType, 1> >,
    ProductGridFunctionAdapter<GF1,GF2> >
{
  static_assert(unsigned(GF1::Traits::dimRange) ==
                unsigned(GF2::Traits::dimRange),
                "ProductGridFunctionAdapter: Operands must have "
                "matching range dimensions, or one operand must be "
                "scalar-valued.");
  ...
};




Specializations for the two different semi-scalar cases, using SFINAE:

//! Product of two GridFunctions
template<typename GF1, typename GF2>
class ProductGridFunctionAdapter<GF1, GF2,
  typename std::enable_if<
    GF1::Traits::dimRange == 1 && GF2::Traits::dimRange != 1
    >::type> :
  public GridFunctionBase<typename GF2::Traits, ProductGridFunctionAdapter<GF1,GF2> >
{
  ...
};

//! Product of two GridFunctions




                                                                                             77
4. Advanced Topics


template<typename GF1, typename GF2>
class ProductGridFunctionAdapter<GF1, GF2,
  typename std::enable_if<
    GF1::Traits::dimRange != 1 && GF2::Traits::dimRange == 1
    >::type> :
  public ProductGridFunctionAdapter<GF2, GF1>
{
public:
  ProductGridFunctionAdapter(GF1& gf1, GF2& gf2)
    : ProductGridFunctionAdapter<GF2, GF1>(gf2, gf1)
  { }
};




78
5. C++11 Features

C++11 was a major update that fundamentally changed the “look and feel” of C++ programs.
Several minor improvements of C++11 have been directly incorporated into the previous sec-
tions, e.g., rvalue references, defaulted/deleted methods, delegating constructors, etc. We will
now discuss further topics in greater detail:

   • automatic type deduction
   • compile-time evaluation
   • move semantics
   • smart pointers
   • lambda expressions (closures)
   • variadic templates
   • threads and concurrency
   • assorted new features and libraries


5.1. Automatic Type Deduction

Quite often, the type of a variable can be inferred from the surrounding code, e.g., from:
   • the type of literals
   • the return type of functions
C++11 introduces the keyword auto , which instructs the compiler to automatically deduce
the type from context:

auto   i   =   27;      //   int literal -> i is int
auto   j   =   5u;      //   unsigned literal -> j is unsigned
auto   x   =   5.4;     //   x is double
auto   y   =   2.9f;    //   y is float
auto   z   =   foo();   //   type of z is return type of foo()


Qualifiers may be added if needed, e.g., const auto& to refer to constant references instead.
Benefits of automatic type deduction:
   • More concise code, similar in appearance to code from dynamically typed languages
   • No need for complicated lines to extract type information
   • No need for tricks when the required types vary between template instantiations



                                                                                                79
5. C++11 Features


// very verbose piece of code
// only works for types T with these exact dependent types
typename T::grid_type::element_type::dof_type dof = ...
// ==> auto is more concise and more general



Drawbacks of automatic type deduction:

     • May have to look up function signatures, because resulting type may depend on some
       far away function definition (may make code harder to read)

     • Doesn’t work if another type is desired (type promotions and automatic conversions)


Trailing Return Type


C++11 also introduces trailing return types, using the auto keyword:

// deferred definition of return type
auto multiply(const Matrix& m, const Vector& v) -> Vector
{ ... }


This is particularly useful when the return type should be automatically deduced from the
arguments:

// return type is whatever sum of types X and Y produces
auto foo(X x, Y y) -> decltype(x + y)
{ ... }


Here decltype produces the type that auto would deduce for its argument.


Trailing return types become mostly irrelevant in C++14 and above, because these standards
can deduce the return type from return statements.


Example from DUNE


Trailing return types remain useful for SFINAE: the existence or nonexistence of a certain
operation involving the function arguments is tested in the return type. This is a direct
precursor of concepts as they are introduced by C++20.




80
                                                                      5.2. Compile-Time Evaluation


  template<class A, class B>
  auto dot(const A & a, const B & b)
      -> typename std::enable_if<!IsVector<A>::value
          && !std::is_same<typename FieldTraits<A>::field_type,
          typename FieldTraits<A>::real_type> ::value, decltype(conj(a)*b)>::type
  {return conj(a)*b;}

  // same for a*b instead of conj(a)*b

template<typename A, typename B>
  auto dot(const A & a, const B & b)
      -> typename std::enable_if<IsVector<A>::value, decltype(a.dot(b))>::type
  {return a.dot(b);}

  template<class A, class B>
  auto dotT(const A & a, const B & b) -> decltype(a*b)
  {return a*b;}




5.2. Compile-Time Evaluation

We have already discussed template metaprogramming as a way to perform computations at
compile-time. C++11 introduced constant expressions using the keyword constexpr , making
such computations significantly simpler.

In C++11 these computations are rather restricted, e.g., constexpr functions must

   • not call non- constexpr functions

   • consist of a single return statement

Later standards lift some of the restrictions, e.g., the latter one above.

// template metaprogramming factorial
template<int N>
  struct Factorial
  {
    enum {value = Factorial<N-1>::value * N};
  };

// base case to break infinite recursion
template<>
  struct Factorial<0>
  {
    enum {value = 1};
  };

  // equivalent constexpr function
  constexpr int factorial(int n)
  {
    return (n > 0) ? (factorial(n-1) * n) : 1;
  }




                                                                                               81
5. C++11 Features


5.3. Move Semantics

Before C++11, C++ knew essentially two types of semantics for entities:
value semantics: assignment creates a new, independent entity (behavior for normal variables)
reference semantics: assignment creates an alias (behavior for references, and pointers through
      indirection)
Programmers can choose between these two types of semantics for their own classes: the
behavior is defined by the copy constructor and assignment operator producing
      • a deep copy (value semantics)
      • or a shallow copy (reference semantics)


C++11 introduced a third type of semantics, move semantics. Here, assignment creates neither
a copy nor an alias, but moves the resources from one object to the other.
Move semantics are based on rvalue references: while a normal, classical reference T& is an
alias for a separate, persistent entity, an rvalue reference T&& refers to, e.g., temporary values
that don’t have to be kept around for a later point in time. More precise, but maybe also more
confusing, categorizations (lvalue, prvalue, xvalue, glvalue, and rvalue) are available online24 .


The contents of such rvalue references may therefore be used to initialize other variables by
simply reusing the memory, eliminating the need for explicit copies and associated memory
allocations.


The function std::move() can be used to create rvalue references to variables, thereby ex-
plicitly marking them as movable. After such an explicit move, the original variable is typically
in a valid but unspecified state.
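
A small sketch of what this looks like in practice, here moving a std::vector instead of
copying it:

#include <utility>
#include <vector>

int main()
{
  std::vector<double> a(1000000, 1.0);

  std::vector<double> b = a;            // copy: allocates and copies all elements
  std::vector<double> c = std::move(a); // move: only the internal pointers are transferred

  // a is now in a valid but unspecified state and should only be
  // reassigned or destroyed
}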


Move Constructors and Assignment

User-defined data types need move constructors and assignment operators to make use of this
efficient form of transfer. They are auto-generated as long as all attributes are movable and
the user provided neither copy nor move constructors / assignment operators.

struct Foo
{
  int i;

      Foo(Foo&& other)
      : i(std::move(other.i))
24
     See https://en.cppreference.com/w/cpp/language/value_category




82
                                                                          5.4. Smart Pointers


   {}

   Foo& operator=(Foo&& other)
   {
     i = std::move(other.i);
     return *this;
   }
};

   • Move constructors / assignment operators are quite similar to copy constructors / as-
     signment operators.
   • The members of rvalue references aren’t rvalues themselves, therefore they have to be
     moved explicitly.
   • The methods should leave the donor in an arbitrary valid state (e.g., assign nullptr to
     any pointer members).


Forwarding References

There are two locations where the two ampersands && don’t actually denote an rvalue refer-
ence, but rather a forwarding reference, or universal reference:
   • the construct auto&& in → range-based for loops
   • the argument T&& in a function template parameterized by T
In these special situations, what looks like an rvalue reference is actually something that can
bind to both lvalues and rvalues. These special references serve the following purposes:
   • Accept both lvalue references to container contents and rvalue references to values that
     are generated on-the-fly in loop iterations
   • Perfect forwarding: forward both lvalue and rvalue arguments to other functions without
     having to implement a long list of template specializations (which would otherwise be
     needed to cover all combinations)
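
A minimal sketch of perfect forwarding (the functions consume and wrapper are made up for
illustration):

#include <utility>
#include <string>
#include <iostream>

void consume(const std::string&) { std::cout << "lvalue overload" << std::endl; }
void consume(std::string&&)      { std::cout << "rvalue overload" << std::endl; }

template<typename T>
  void wrapper(T&& arg)  // forwarding reference: binds to lvalues and rvalues
  {
    // std::forward restores the original value category:
    // lvalues stay lvalues, rvalues stay rvalues
    consume(std::forward<T>(arg));
  }

int main()
{
  std::string s = "data";
  wrapper(s);             // prints "lvalue overload"
  wrapper(std::move(s));  // prints "rvalue overload"
}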


5.4. Smart Pointers

We have seen that manual memory management can lead to bugs that are quite severe, and
that in C++ the standard solution to this problem is RAII. C++11 introduced smart pointers,
replacements for raw pointers that offer automatic memory management based on RAII.

This is an application of the single responsibility principle: introducing wrapper classes means
that the user-defined class itself doesn’t have to concern itself with memory management.

There are three different types of smart pointer:



                                                                                            83
5. C++11 Features


     • unique_ptr (exclusive ownership)

     • shared_ptr (shared ownership, reference counting)

     • weak_ptr (non-owning handle)



Unique Pointers



The class unique_ptr (for unique pointer) provides memory management with exclusive own-
ership: it manages an internal raw pointer and prevents any copies of the stored address, which
means that the allocated memory will be freed exactly once.

// create a unique pointer
std::unique_ptr<int> p(new int);

// * and -> work as expected
*p = 8;

// unique pointers can't be copied, only moved
auto p2 = std::move(p);
// p is now empty (i.e., invalid)




Unique pointers have the following functionality:

     • Conversion to bool to check for validity (replaces nullptr comparison)


     • A method get() for access to the raw pointer (e.g., for passing the stored object to
       legacy code)

     • Methods for releasing the stored raw pointer, or replacing it with another one



There is also a special unique_ptr version for arrays that doesn’t have dereference operators
and instead provides an operator[] that forwards to the array. This can be used to manage
the lifetime of C-style arrays with a drop-in replacement, while still being able to hand the
raw array to legacy code.
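
A brief sketch of this functionality (the legacy functions are made up for illustration):

#include <memory>

// hypothetical legacy functions expecting raw pointers
void legacy_use(int*) {}
void legacy_fill(double*, int) {}

void example()
{
  std::unique_ptr<int> p(new int(5));

  if (p)                   // conversion to bool: pointer is valid
    legacy_use(p.get());   // raw pointer access, ownership stays with p

  p.reset(new int(7));     // frees the old int, takes ownership of the new one
  int* raw = p.release();  // gives up ownership, p becomes empty
  delete raw;              // now it's our responsibility again

  // array version: operator[] instead of * and ->
  std::unique_ptr<double[]> arr(new double[100]);
  arr[0] = 3.14;
  legacy_fill(arr.get(), 100);
} // array is freed with delete[] automatically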



84
                                                                           5.4. Smart Pointers


Shared Pointers




While unique pointers have exclusive ownership, the class shared_ptr (for shared pointer)
provides memory management with shared ownership, as the name implies. Shared pointers
can be used, e.g., to let several different objects share some common central resource or facility.
Reference counting guarantees that the managed object stays alive while there is at least one
shared pointer storing its address.

std::shared_ptr<int> p(new int);
*p = 5;
auto p2 = p;
*p2 = 7; // also changes contents of p

// allocated int stays around as long as either
// p or p2 point to it (or any other copy in
// some function or object)




Shared pointers provide the same methods as unique pointers, and they have an additional
 use_count() method that reports the total number of pointers sharing the resource. There
is also again an operator[] for the array case, but only from C++17 onwards.




Internally, shared pointers store two raw pointers:


   • a raw pointer to the stored object, so that dereferencing has basically the same cost as
     for raw pointers


   • a raw pointer to a manager object (control block), which is responsible for bookkeeping
     tasks (reference counting)


The manager object in turn also stores a pointer to the managed object, because it has to
manage its lifetime (and grants access to weak pointers, as we will see).



                                                                                             85
5. C++11 Features


             [Figure: three shared pointers, each pointing directly to the shared object
              and to a common control block (shared count: 3)]



     • Solid arrows represent pointers for direct access to the shared object
     • Dashed arrows represent pointers used in reference counting and management
     • The shared count is incremented when a shared pointer constructor is called
     • It is decremented when a shared pointer destructor is called
     • The shared object can be destroyed once the shared count reaches zero


Weak Pointers

The third kind of smart pointer is called weak_ptr , and strictly speaking these weak pointers
aren’t pointers at all: they are non-owning observers of shared resources, and can’t be deref-
erenced. Instead, if the shared object is still valid, i.e., there is at least one shared_ptr to
it, a lock() can be called to gain temporary access through a shared pointer.

std::shared_ptr<int> p(new int);
std::weak_ptr<int> w = p; // link w to p

// w is not expired, since p exists
if (! w.expired())
{
  // create shared pointer
  auto p2 = w.lock();
  // resource is kept alive through p2
  ...
}




86
                                                                          5.4. Smart Pointers


             [Figure: three shared pointers and two weak pointers referencing the same
              control block (shared count: 3, weak count: 2); only the shared pointers
              also point directly to the shared object]




   • The control block maintains a second counter, the weak count, which keeps track of the
     number of weak pointers

   • Locking a weak pointer increases the shared count, because the shared pointer constructor
     is called

   • The method expired() returns true when the shared count has reached zero, and
     false otherwise

   • The shared object is destroyed once the shared count reaches zero, while the control block
     is only destroyed once both counts reach zero

   • This allows weak pointers to still query the status after all shared pointers are gone


Smart Pointer Creation Functions

There is also a creation function make_shared , and from C++14 onward a companion function
 make_unique . These functions forward their arguments to the constructor of the stored
object, thereby eliminating the last bit of raw pointer handling by hiding the new call.

auto p = std::make_shared<double>(7.);



In contrast to the shared_ptr constructor call, this function allocates space for the stored
object and the manager object in one go. This means the two ways of constructing shared
pointers are not fully equivalent:

   • The function make_shared uses fewer allocations, and may therefore be faster than the
     constructor call.

   • For make_shared , the space occupied by the stored object is linked to that of the
     manager object due to the joint memory allocation.



                                                                                              87
5. C++11 Features


As a consequence of this joint allocation, the destructor of the managed object is called when
the last shared pointer vanishes, but the memory only becomes available again once all shared
and weak pointers are gone.


Smart Pointer Use Cases

Each type of pointer is for a specific set of use cases:

unique_ptr Pass around object that doesn’t have to be available at different locations si-
    multaneously, capture raw pointers from legacy code that have to be managed.

shared_ptr Pass around object that has to be available at different locations, share object
    ownership and manage lifetime collaboratively.

weak_ptr Non-owning, observe shared object and its lifetime, may lock if temporary access
    is required.

raw pointers For interfaces with legacy code, implementation details of classes that are safely
     encapsulated, and sometimes used in modern code to mark pointers as explicitly non-
     owned.

If explicitly managed raw pointers are completely avoided, then each type of pointer has a
clear purpose, and for each managed object there are two sets of pointers:

Owning Pointers Either a single unique_ptr or one or more shared_ptr , which own the
    object and manage its lifetime.

Non-owning Pointers An arbitrary number of weak_ptr and raw pointers, which don’t own
    the object. The former can check object validity, while it has to be guaranteed by the
    calling code for the latter. This also means that raw pointers should never be returned
    or stored, only used and passed along to other functions.


Pitfalls When Using Shared Pointers

With modern compilers, using shared pointers has surprisingly little overhead: dereferencing
them has basically the same cost as directly using pointers. The creation, destruction, and
storage of shared pointers, however, creates a certain overhead:

     • Each set of such pointers requires storage for two raw pointers per shared pointer, and
       additionally for a central control block

     • Creating and destroying shared / weak pointers is more expensive than using raw point-
       ers, due to reference counting

However, these overheads are usually not significant, and a small price for the guaranteed
proper memory management.



88
                                                                             5.4. Smart Pointers


             [Figure: four shared pointers (shared ptr 0 to shared ptr 3) forming a cycle,
              each keeping the next one alive]



While shared pointers are usually quite safe, subtle bugs may be introduced when loops are
created:
   • Such loops occur in ring buffers, doubly linked lists, and similar constructs
   • If all external shared pointers go out of scope, the loop becomes unreachable
   • But the control blocks keep the loop alive, because there is always at least one remaining
     shared pointer per manager object
   • One can replace one or all of the internal shared pointers by weak pointers to prevent
     this silent memory leak
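
A minimal sketch of such a loop and its fix, using a hypothetical two-node linked structure:

#include <memory>

struct Node
{
  std::shared_ptr<Node> next;  // owning link in one direction
  std::weak_ptr<Node>   prev;  // non-owning back link: breaks the cycle
};

int main()
{
  auto a = std::make_shared<Node>();
  auto b = std::make_shared<Node>();

  a->next = b;  // a owns b
  b->prev = a;  // b only observes a

  // if prev were a shared_ptr, a and b would keep each other alive
  // even after the external handles a and b go out of scope
}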
Another pitfall occurs when an object tries to return a shared pointer to itself, e.g., during the
construction of the loops we just discussed. One might be tempted to simply return a shared
pointer wrapping the this pointer:

return std::shared_ptr<Foo>(this);

However, this would create a separate set of shared pointers, and the two distinct control blocks
would result in calling delete twice. There is actually an official workaround for this: derive
from a special class, and then call a special method.

struct Foo : public std::enable_shared_from_this<Foo> // CRTP
{
  auto get_ptr() {return shared_from_this();}
};


The class enable_shared_from_this works like this:



                                                                                               89
5. C++11 Features


     • The shared pointer constructor is specialized for classes that inherit from this class, and
       places a weak_ptr in the base class object.

     • The method shared_from_this() accesses this weak pointer, locks it, and returns the
       resulting shared pointer.
     • The original shared pointer and the new one use the same manager object, therefore the
       problem is avoided.


Before C++17, calling shared_from_this() on an object that isn’t actually stored in a
shared pointer led to silent undefined behavior, but since C++17 this case thankfully throws
an appropriate exception.


5.5. Lambda Expressions (Closures)

We have already discussed functors: objects that provide an operator() method, which can
therefore be called like functions. The main benefit of functors over simple function pointers
is their flexibility; based on their constructors and internal attributes one may:
     • provide some form of parameterization (constructor arguments)
     • have internal state / function memory (data members)
     • grant access to local variables (by passing them to the constructor)


C++11 introduced a framework to generate such functors automatically from short descriptions
called lambda expressions. These expressions are treated as if the equivalent functor class had
already been provided, and are therefore a convenient way to provide ad-hoc function definitions
(closures).
A lambda expression consists of the following components:
     • a capture list in brackets, specifying which local variables (if any) should be made available
       within the expression
     • a list of function arguments in parentheses, as usual
     • a trailing return type, but without auto in front
     • a function body in braces, as usual

int a = 4;
int b = 5;

// store function object in local variable
auto f = [a,&b] (int x) -> int {b += a*x; return a-x;};




90
                                                                      5.5. Lambda Expressions (Closures)


std::cout << f(3) << std::endl; // 4 - 3 = 1
std::cout << b << std::endl;    // 5 + 4*3 = 17




The capture list may:

      • be empty, thereby defining an anonymous function that is not a closure (i.e., nothing is
        captured25 )

      • mention the variables that should be captured, with a & in front of those that should
        be captured by reference instead of value

      • start with a single = , followed by zero or more captures by reference

      • start with a single & , followed by zero or more captures by value

In the last two cases, any variable that is used in the expression is automatically captured
by value (for = ) or by reference (for & ), respectively.

int a = 1, b = 2, c = 3;
// captures b by-reference, a and c by-value (if actually used)
auto f = [=,&b] () -> void {...};
// captures a by-value, b and c by-reference (if actually used)
auto g = [&,a] () -> void {...};




Behind the scenes, each lambda expression defines a local functor class, with an automatically
generated, unknown name that is guaranteed to not clash with any names that are actually
present in the code.

      • Such local classes, i.e., classes defined within functions and other classes, are actually a
        general concept in C++. Typically, they are only used to provide private helper classes
        as implementation detail of some other class.

      • Local classes can access the static local variables of the surrounding scope, therefore
        variables marked constexpr and static are always available.

      • Anything else that is needed inside the expression has to be captured explicitly or with
        a capture default ( = or & ).

      • The constructor is auto-generated from the intersection of the capture list and the set of
        variables that are needed for the lambda body, while the body itself and the arguments
        are used for the definition of operator() .
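
For illustration, a hand-written functor that is roughly equivalent to the lambda
[a,&b] (int x) -> int {...} from above (the class name Lambda is made up; the
compiler-generated name is inaccessible):

class Lambda  // hypothetical name for the compiler-generated local class
{
  int  a;  // captured by value
  int& b;  // captured by reference

  public:
    Lambda(int a_, int& b_) : a(a_), b(b_) {}

    int operator()(int x) const  // by-value captures are const by default
    {
      b += a * x;  // modifying b is fine: it refers to the original variable
      return a - x;
    }
};

// auto f = [a,&b] (int x) -> int {b += a*x; return a-x;};
// behaves essentially like: Lambda f(a, b);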
25
     Technically, compile-time constant expressions are available, anyway.




                                                                                                     91
5. C++11 Features


Example from DUNE


Two vector norms, implemented as STL algorithms over some internal container, using lambda
expressions as operations / predicates:

typename Dune::template FieldTraits<E>::real_type one_norm() const
  {
    typedef typename Dune::template FieldTraits<E>::real_type Real;
    return std::accumulate(_container->begin(),_container->end(),Real(0),
                           [](const auto& n, const auto& e) {
                             using std::abs;
                             return n + abs(e);
                           });
  }

     typename Dune::template FieldTraits<E>::real_type infinity_norm() const
     {
       if (_container->size() == 0)
         return 0;
       using std::abs;
       typedef typename Dune::template FieldTraits<E>::real_type Real;
       return abs(*std::max_element(_container->begin(),_container->end(),
                                    [](const auto& a, const auto& b) {
                                      using std::abs;
                                      return abs(a) < abs(b);
                                    }));
     }




5.6. Variadic Templates

C++11 introduced template parameter packs, which are placeholders for an arbitrary number of
template parameters (type parameters, non-type parameters, or template template parameters,
but not mixed). In the case of type template parameters, these may be accompanied by a
function parameter pack containing the corresponding values.


Parameter packs may be used to create templates that accept an arbitrary number of template
parameters, so-called variadic templates:

// arbitrary number of int parameters
template<int... args>
  class Foo { ... };

// arbitrary number of type parameters, plus arguments
template<typename... Ts>
  void bar (Ts... args)
  { ... };



92
                                                                                5.6. Variadic Templates


We have already discussed such a variadic template, namely std::tuple .

The code of a variadic function template has to be written in a way that handles a variable
number of parameters; the standard way of achieving this is to specify the first parameter
(the “head”) separately, and then use recursion.


The remaining list of parameters / arguments (the “tail”) can be used within the function
body: one replaces

   • the parameter list typename... Ts with Ts... and

   • the argument list Ts... args with args... .

The ellipsis ... may also be placed after any expression containing these packs, instead of
directly behind them.


Every expression containing at least one such pack is subject to pack expansion: at instanti-
ation, the expression is replaced by a comma-separated list of copies of itself, with the pack
replaced by each one of its constituents in turn.

An implementation of Horner’s method, efficiently evaluating a given polynomial:

                   a_0 + a_1 · x + a_2 · x^2 + · · · = a_0 + x · (a_1 + x · (a_2 + x · (· · ·)))



// this is an overload, not a specialization!
template<typename T>
  double eval_poly(double x, T head)
  {
    return head;
  }

// general case: used when the pack is non-empty
template<typename T, typename... Ts>
  double eval_poly(double x, T head, Ts... tail)
  {
    // pack expansion: insert remaining coefficients
    return head + x * eval_poly(x,tail...);
  }

  // evaluates f(x) = 3 + 5x + 2x^2
  eval_poly(x,3,5,2);




For the sake of demonstrating a bit of the power of pack expansion, we may choose to multiply
each of the remaining coefficients with x instead. This makes the algorithm significantly less
efficient, of course: it implements the naive way of evaluating polynomials, after all.




                                                                                                    93
5. C++11 Features


template<typename T>
  double eval_poly(double x, T head)
  {
    return head;
  }

template<typename T, typename... Ts>
  double eval_poly(double x, T head, Ts... tail)
  {
    // pack expansion: remaining coefficients times x
    return head + eval_poly(x, x * tail...);
  }



One can even expand several packs in the same expression, leading to quite powerful constructs:


template<typename... Ts1>
  struct Left
  {
    template<typename... Ts2>
      struct Right
      {
        using Zip = std::tuple<std::pair<Ts1,Ts2>...>;
      };
  };

     // defines the pairing [(A,X),(B,Y),(C,Z)]
     using Zip = Left<A,B,C>::Right<X,Y,Z>::Zip;




5.7. Concurrency with Threads

Concurrency and Parallel Computing

Moore’s Law states that the number of transistors in integrated circuits doubles roughly every
two years (i.e., exponential growth). This typically translates to a similar increase in compu-
tational power and available memory. However, there are clear physical limits to this growth
(atom size, speed of light).

The computational power of single CPU cores has been stagnant for some years, and instead
multi-core systems have become dominant: most modern computers have CPUs with multiple



94
                                                               5.7. Concurrency with Threads


cores, not just compute clusters, but also personal laptops and desktops, and even smart
phones.


As a consequence, concurrency and parallel computing are no longer just means to accelerate
computations: nowadays, a solid understanding of parallel programming techniques is impor-
tant to make full use of the resources a computer provides.
There is a wide variety of compute architectures available today. A simplified categorization
might be:
   • Multi-core CPUs with associated main memory on a single machine, every core has full
     access to the system memory (uniform memory architecture, UMA), concurrent program-
     ming using threads (separate execution flows within one program)
   • Clusters, i.e., tightly interconnected computers as a unit, each with its own CPU and
     memory (non-uniform memory architecture, NUMA), parallel programming based on
     message passing to exchange memory contents and computation results
While message passing requires external libraries, concurrency using threads is provided na-
tively by C++11 and above.


Concurrency with Threads

C++ had thread support before C++11, of course, but the availability and characteristics of
these threads were dependent on the architecture and operating system, e.g., POSIX threads
(pthreads) under UNIX and Linux. C++11 provides a unified interface to thread creation and
management, thereby increasing interoperability and platform independence of the resulting
code.


Note that there is typically still the need to choose an appropriate thread backend, e.g., using
the -pthread compiler flag to use POSIX threads.


There are two different ways of writing such concurrent programs:
   • low-level code based on threads, using mutexes and locks to manually manage thread
     interactions and data exchange
   • higher-level code based on tasks, using futures and promises to abstract away most of the
     interdependencies
We have discussed smart pointers as handles for memory. A std::thread object is also a
handle, but one that represents an executing subprogram.
   • The first argument of its constructor is a function pointer, functor, lambda expression
     or similar. This is the function the thread will execute, and may be interpreted as the
     “main function” of the subprogram.



                                                                                             95
5. C++11 Features


     • The remaining arguments are passed to this function using perfect forwarding. There
       may be an arbitrary number of them, because the constructor is a variadic template.
     • Threads can’t be copied, only moved, because there may be only one representation of
       the running subprogram (compare unique_ptr ).

     • Calling the detach() method releases the thread, letting it run independently from the
       main program, while calling join() halts the main program until the thread is done
       and ready to be assimilated.
     • The destructor aborts the program via terminate() if neither join() nor detach()
       have been called beforehand.
Thread-based programs have access to a special namespace called std::this_thread , which
provides a number of functions for controlling the currently executing thread:
     • The function get_id() returns a number that identifies the thread itself. The thread
       handle std::thread has a method get_id with the same information, but this is
       inaccessible from within the running thread, of course.
     • The function yield() signals the operating system and suggests that other threads
       should be allowed to run, e.g., because the current thread would be mostly idle anyway.
     • The functions sleep_for and sleep_until can be used to let the thread wait for
       some time without resorting to resource-intensive busy-waiting.
The two sleep functions expect a duration or a time point, respectively, from the C++11
→ chrono date and time library.
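
A tiny sketch using these functions (how the output of the two threads interleaves depends on
the scheduler, of course):

#include <thread>
#include <chrono>
#include <iostream>

int main()
{
  std::thread t([](){
      // id of this thread, as seen from inside
      std::cout << std::this_thread::get_id() << std::endl;
      std::this_thread::yield();  // hint: let other threads run for now
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    });

  std::cout << t.get_id() << std::endl;  // same id, seen through the handle
  t.join();
}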

int val = 0;

// lambda as first argument, no explicit function arguments
// val captured by-reference through the lambda
std::thread t([&](){
    for (int i = 0; i < 10; i++)
    {
      std::cout << ++val << std::endl;
      // blocks thread execution for given duration
      std::this_thread::sleep_for(std::chrono::seconds(1));
    }
  });

// could do something else here in the meantime

// wait for thread to finish execution
t.join();




96
                                                                    5.7. Concurrency with Threads


Mutexes and Locks

If two or more threads share some resources, race conditions may occur: the result of read
and write operations on shared memory may depend on their exact order of execution. A
well-known example is a lost update:
        • Two threads read a value from memory, increment it, and store the result.
        • If these two sets of operations don’t overlap, then everything is fine: the value is incre-
          mented twice.
        • If there is an overlap, then only one of the increments is stored, and the program doesn’t
          produce the expected results.
This form of race condition can be prevented through mutual exclusion (mutexes): only one
thread at a time may enter the critical section where the shared resource is accessed, thereby
eliminating any potential for such overlaps.
Calling the lock() method on a mutex stops execution if another thread has currently locked
the same mutex, while unlock() releases the lock, allowing the next thread to enter the
critical section:

void count(int id, int steps, int& val, std::mutex& mut)
{
  for (int i = 0; i < steps; i++)
  {
    mut.lock();   // ensure exclusive access to val
    std::cout << id << ": " << ++val << std::endl;
    mut.unlock(); // allow other threads to access

        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

int val = 0; std::mutex mut;
// pass val and mut as references using perfect forwarding
std::thread t1(count, 0, 10, std::ref(val), std::ref(mut));
std::thread t2(count, 1, 10, std::ref(val), std::ref(mut));
t1.join(); t2.join();




Manually locking and unlocking mutexes isn’t exception-safe, so mutexes shouldn’t be used
directly. Instead, they are usually wrapped in a lock_guard object, which locks the mutex
in its constructor and unlocks it in its destructor, even if an exception is triggered:

void count(int id, int steps, int& val, std::mutex& mut)
{
  for (int i = 0; i < steps; i++)
  {
    // additional scope to keep lock_guard lifetime short
    // (should not stay around for the sleep period)
    {
      // constructor locks the mutex
      std::lock_guard<std::mutex> lock(mut);




                                                                                                  97
5. C++11 Features


           std::cout << id << ": " << ++val << std::endl;
         } // destructor unlocks the mutex

         std::this_thread::sleep_for(std::chrono::seconds(1));
     }
}



In short, lock guards use RAII to ensure exception-safe mutual exclusion.

There is a number of related methods and classes:

         • Mutexes have a non-blocking method try_lock() : the thread doesn’t wait, instead it
           can do other work and maybe retry later.

         • The class recursive_mutex can be locked multiple times by the same thread, e.g., when
           the same resource is needed simultaneously in a number of related functions / methods.

         • There are timed variants timed_mutex and recursive_timed_mutex , which are able
           to wait for a certain time instead of indefinitely.

         • The class unique_lock is a more flexible replacement for lock_guard .

         • Condition variables ( condition_variable ) can be used to wait on a lock, until another
           thread wakes one or all waiting threads using notify_one() or notify_all() . This
           can be used to create synchronization barriers and organize a structured transfer of data.
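
A minimal sketch of a condition variable organizing a structured handover of data between two
threads (names and values made up for illustration):

#include <condition_variable>
#include <mutex>
#include <thread>
#include <iostream>

std::mutex mut;
std::condition_variable cv;
bool ready = false;
int data = 0;

void consumer()
{
  std::unique_lock<std::mutex> lock(mut);
  cv.wait(lock, []{ return ready; });  // releases the lock while waiting
  std::cout << "received " << data << std::endl;
}

void producer()
{
  {
    std::lock_guard<std::mutex> lock(mut);
    data = 42;
    ready = true;
  }
  cv.notify_one();  // wake up the waiting consumer
}

int main()
{
  std::thread t1(consumer);
  std::thread t2(producer);
  t1.join(); t2.join();
}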


Atomics

An alternative to mutexes and locks is the use of atomics. Data types can be made atomic by
wrapping them in a std::atomic<T> , which provides special uninterruptible methods for the
usual operations. This solves the lost update problem, because read-modify-write cycles
are carried out as single operations.


Atomics can utilize highly efficient special CPU instructions on modern architectures, or, if
these aren’t available, fall back to using internal locks. In the latter case they are basically
wrappers that hide the explicit use of mutual exclusion. There are also special memory order
models that can guarantee a certain order of memory operations across different threads. These
are quite powerful, but using them is not as straightforward as the explicit use of flags and
condition variables.

Examples of atomic operations for std::atomic<T> :

         • Methods load and store for reading and writing. These have the conversion operator
           operator T and assignment operator= as aliases.

         • Methods compare_exchange_weak and compare_exchange_strong , which replace an
           atomic value if it is safe to do so.



      • If T is an arithmetic type, special atomic updates like fetch_add , fetch_sub , etc.,
        which have the aliases operator+= , operator-= , operator++ , and so on.
The methods are named after the special CPU instructions they represent, lending themselves
to a more explicit coding style, while the operator aliases make it possible to write code that
looks more or less like normal C++ code operating on thread-local variables.
The method compare_exchange_weak is used as follows:

bool replaced = atom_v.compare_exchange_weak(expected, desired);


If atom_v contains expected , it is replaced by desired , else expected is replaced by
the value stored in atom_v . This can be used in a loop, checking the value until the update
succeeds:

while (!atom_v.compare_exchange_weak(e,d))
{
  // react to fact that atom_v actually contains value in e
  ...
}

The method compare_exchange_strong works the same, but additionally guarantees that the
exchange only fails if the stored value actually differs from expected , while the weak version
may also fail spuriously26 .


Concurrency with Tasks

Many programming tasks allow a certain amount of concurrency on two different levels: fine-
grained interaction, and the separate handling of subtasks.


Explicit use of threads is often a good choice when:
      • there is extensive communication and exchange of data between the threads
      • the threads operate collaboratively, maybe even executing the same code on different sets
        of data, exchanging intermediate results
Task-based concurrency is an additional layer of abstraction for cases where:
      • the program can be decomposed into more-or-less independent sub-tasks
      • each such task has well-defined sets of prerequisites it needs and results it will produce
Of course, threads and tasks may also be combined, resulting in two simultaneous levels of
concurrency.
26 See https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange




[Figure: example task dependency graph, with a setup task A directly after the start node,
intermediate tasks B, C, D, E, F with a partial ordering between them, and an output/cleanup
task G directly before the end node.]


      • Most scientific programs have setup (A) and output/cleanup (G) phases which have to
        be executed in sequence.
      • Between these two phases, a certain number of tasks, here (B) to (F), have to be per-
        formed. Some of these have to be performed in order, but others may allow a certain
        degree of flexibility and parallel computation.
      • Tasks may be algorithms or sub-algorithms, like some matrix factorization, or the solution
        of some nonlinear system, but also, e.g., background I/O operations.


Promises and Futures

At the heart of task-based programming are promise—future pairs: a promise represents the
result some task will ultimately produce, while a future represents, well, the future inputs
another task is waiting for. As such, promises and futures serve a similar purpose to pipes and
data streams: unidirectional transport of data.


A std::promise<T> object has the following functionality:

      • Create an associated future with get_future

      • Store a result through method set_value , signaling task completion

      • Store an exception instead using set_exception , signaling task failure
      • Two methods ( set_value_at_thread_exit and set_exception_at_thread_exit ) that
        set a result or an exception at thread exit instead, so that local cleanup can be taken
        care of
A std::future<T> object provides the following:
      • Methods wait , wait_for , and wait_until , which block execution and wait for the
        result, either indefinitely or for a certain time



      • A method valid , which checks in a non-blocking way whether the future still refers to a
        shared state, i.e., whether get may still be called
      • A method get , which returns the stored value if the future is ready, or throws the stored
        exception if one was set
      • A method share , moving the internal shared state to a shared_future


If T is not a reference type, then get moves the stored value out of the internal shared state
and returns it. Consequently, the method get may be called exactly once in this case27 .


The class std::shared_future can be used to provide the stored value to multiple threads.
In contrast to a normal future, its get method returns const T& and keeps the internal
state valid.
A matrix-vector product, handed over to a new thread:

void matvec (const Matrix& mat, const Vector& vec,
             std::promise<Vector> promise)
{
  Vector res = mat * vec;
  promise.set_value(std::move(res));
}

std::promise<Vector> promise;
std::future<Vector> future = promise.get_future();
// hand over matrix-vector pair, promise must be moved
std::thread t(matvec,mat,vec,std::move(promise));

// could do something else here

future.wait(); t.join(); // wait, then clean up
Vector res = future.get();




Packaged Tasks

A packaged task ( std::packaged_task ) is a convenient wrapper around functions and function-
like objects, like functors and lambda expressions. Calling the packaged task as if it were a
function executes the wrapped callable; alternatively, the task can be handed over to a thread
and run concurrently. In both cases the return value is provided using a promise-future pair.

27 This makes it possible to return large objects from tasks using move semantics.




// packaged task with trivial job, defined through a lambda
std::packaged_task<int(int,int)>
        task([](int a, int b){return a*b;});
// future object that will contain return value
std::future<int> future = task.get_future();
// execute task, i.e., run function
task(3,4);

// could do something else here

future.wait();          // wait for task to finish
int res = future.get(); // extract result




Asynchronous Execution

Packaged tasks may be run on the current thread or handed over to an explicitly created thread,
but in both cases they have to be started manually. There is an additional construct that is even more
abstract and high-level: std::async , a function providing asynchronous execution. The first
argument of this function is an execution policy:

std::launch::async : launch a new thread that executes the task, similar to packaged_task

std::launch::deferred : execute the task on the current thread, but defer it until the result
    is actually needed for the first time (lazy evaluation)

default (if omitted): bitwise OR of async and deferred , which enables the async call to
     choose between the two options, i.e., start a new thread if that is convenient at some
     point in time, or wait until the result is needed

The std::async function returns a future that will contain the result:

std::future<Foo> future = std::async(expensiveFunction,args);

// do something else here

// wait for thread if one was spawned (asynchronous)
// or compute result here if none was created (deferred)
Foo result = future.get();


Futures from std::async are special: their destructor blocks until the associated task has
finished, even if get was never called. For tasks started with the async policy this means
that functions called purely for their side effects (e.g., background I/O) are guaranteed to
actually run, and to be finished when the future is destroyed28 .


5.8. Assorted New Features and Libraries

Types of Initialization

Before C++11, C++ provided several different initialization mechanisms with somewhat in-
consistent notation:
Value: written T(); , default constructor for nameless objects, componentwise value init. for
     array types, zero-initialization otherwise
Direct: written T t(args...); , constructor call for classes, conversion operator or narrowing
     conversion otherwise
Copy: written T t = other; or T t(other); , copy constructor, etc.

Aggregate: written T t = {args...}; , direct assignment of array entries, or of struct com-
     ponents if struct is simple enough29
Reference: written T& ref = t; , creation of reference

Importantly, default construction is written T t; , not T t(); (the latter declares a function),
and for built-in types T t; leaves the variable uninitialized!
These forms of initialization remain valid in C++11 and above, but there is also a new mech-
anism that has a more consistent notation and is more flexible: initialization using braced lists
 T t{args...}; . The braces can contain zero, one or more than one argument.

Zero arguments:
Value initialization, using the default constructor for class types. Consistent notation for all
types, and fundamental types are zero-initialized.

One Argument, same or derived type:
Direct initialization, possibly with slicing. No narrowing conversions allowed for fundamental
types except for literals that are representable, i.e., no silent conversion from double to int
or similar.

A different argument, or more than one:
Aggregate initialization for arrays and simple structs, else forwarded to a constructor tak-
ing a std::initializer_list<T2> if available, e.g., to mimic aggregate initialization, else
forwarded to a matching constructor. No narrowing conversions allowed for arguments.
28 A void future std::future<void> can be used in this case.
29 See https://en.cppreference.com/w/cpp/language/aggregate_initialization for the exact specification.




int a;         // usual notation leaves a uninitialized
int b = int(); // value-init. (zero); note that int b(); would declare a function
int c{};       // zero-init., notation consistent with classes

std::vector<int> v(5,2); // vector [2,2,2,2,2]
std::vector<int> w{5,2}; // vector [5,2] (initializer_list)




Range-Based Loops

C++11 introduced range-based for loops as they exist in other high-level languages:

std::array<int,5> arr = {1,2,4,8,16};
for (const int& m : arr)
  std::cout << m << std::endl;


This is a more concise replacement for, and in fact built upon, the for loop using iterators:


for (typename std::array<int,5>::const_iterator it = arr.begin();
        it != arr.end(); ++it)
  std::cout << *it << std::endl;



Range-based for loops can also be used with user-defined classes. One just has to define
three things:
      • An iterator type, the one that would be used in the old-fashioned loop
      • A begin method / free function for the start of the range

      • An end method / free function denoting the end of the range
This means that anything that supports loops based on iterators also works with range-based
loops.

One can implement the usual begin / end container methods, or use the free functions
instead, e.g., if the class can’t be modified for some reason. The iterator type has to support the
expected operations, of course: pre-increment, dereference operator * , not-equal-to operator
 != , and a public destructor, just like for the explicitly iterator-based loop.
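
A minimal sketch of such a user-defined range, here a half-open integer range (the names
 IntRange and Iterator are purely illustrative; printing assumes <iostream> ):

class IntRange
{
  int first, last;

public:
  IntRange(int first, int last) : first(first), last(last) {}

  // iterator that simply wraps the current integer value
  class Iterator
  {
    int i;
  public:
    explicit Iterator(int i) : i(i) {}
    int operator*() const {return i;}
    Iterator& operator++() {++i; return *this;}
    bool operator!=(const Iterator& other) const {return i != other.i;}
  };

  Iterator begin() const {return Iterator(first);}
  Iterator end()   const {return Iterator(last);}
};

// prints 0 1 2 3 4
for (int i : IntRange(0,5))
  std::cout << i << " ";
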
The type definition in range-based for loops is an ideal candidate for the auto keyword,
since the type of the elements is usually clear from context. This leads to two standard versions
of such loops, one for const and one for non-const access:



// const version
for (const auto& element : range)
{ ... }

// non-const version
for (auto&& element : range)
{ ... }

As discussed, this auto&& is a forwarding reference: it decays to a normal lvalue reference
when accessing the contents of some container, but also accepts values that are generated
on-the-fly, e.g., through some generator function.
A braced initialization list can be assigned to std::initializer_list<T> if and only if all
of its entries have the same type (which is then used as T ), and such an initializer list can
serve as a range. In fact, iterating over such a list is how the corresponding constructors are
implemented.

This means we can use braced lists not just for initialization, but also for range-based loops:


// iterate over some powers of two
for (const auto& i : {1,2,4,8,16})
{ ... }

// iterate over the vowels
for (const auto& c : {'a','e','i','o','u'})
{ ... }




Type Aliases and Alias Templates

In C++11 and above, the keyword using provides an alternative to typedefs, which is more
general and can also be used to give names to partial template specializations / instantiations.


// type alias instead of classical typedef
using Number = int; // replaces ``typedef int Number''

// partial specialization or instantiation
template<typename T>
  using Vector3D = Vector<T,3>; // defines ``Vector3D<T>''

// alias template
template<typename U, typename V>
  using Matrix = Long::Namespace::SpecialMatrix<U,V>;




// variadic alias template
// (can also be used to just omit the arguments!)
template<typename... T>
  using Foo = Bar<T...>; // Foo is alias of Bar




Of course, simply importing names like in using std::cout; or similar is still possible as
well.


Initializers and Inheriting Constructors

C++11 simplifies writing classes:
      • Initializers may now also be used for normal (i.e., non- static ) attributes. The assigned
        value is used if the constructor doesn’t explicitly define the member.
      • The constructors of base classes may now be imported with a using directive, just
        like one could import normal methods before. This automatically produces forwarding
        constructors for the derived class.

struct Base {Base(int a, double b, bool c);};

struct Derived : public Base
{
  int i = 0; // used if constructor doesn't assign anything

     // defines Derived(int,double,bool) and potentially others
     using Base::Base;
};




Scoped Enums

Classic C enums are simply names for integer constants. This means they can be used where
      • some simple integer expression or
      • another completely unrelated enum
is expected, potentially leading to subtle bugs.


C++11 introduced scoped enums, which are type-safe and can only interact with enums of the
same type:




enum OldColor {RED, BLUE, GREEN}; // old C-style enum
int i = RED; // perfectly fine assignment

enum class Color {red, blue, green}; // scoped enum
Color col = Color::red; // no implicit conversion to int
using Color::green; // local shorthand (allowed since C++20)



Exception-Free Functions

Previous versions of C++ included a facility for specifying what kind of exceptions a func-
tion might throw. This was deemed too complicated and error-prone, and therefore C++11
switched to specifying that a function will never throw exceptions, used for optimization pur-
poses. There are two versions:
   • A simple noexcept qualifier, stating that exceptions are never thrown

   • A noexcept qualifier with Boolean argument, typically using an operator with the same
     name

// never throws an exception
int five() noexcept {return 5;}

// doesn't throw if both addition and scaling of type T don't
template<typename T>
T scale_and_add(const T& t1, double a, const T& t2)
        noexcept(noexcept(a * t1) && noexcept(t1 + t2))
{
  return a * t1 + t2;
}



Type Traits

As seen in our discussion of template metaprogramming and SFINAE, it can be quite handy
to have certain properties of data types available at compile time.

C++11 and above provide a set of such type traits in header <type_traits> :

   • Type properties (e.g., is_integral , is_floating_point , is_class , is_const ,. . . )

   • Type relationships (e.g., is_same , is_convertible )
   • Type modifications (e.g., remove_const , remove_pointer )

   • Transformations (e.g., enable_if , conditional , common_type )



      • Constants ( integral_constant , true_type , false_type )

These have been defined using the same template tricks we have discussed.
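
A few small illustrations of how these traits are used (C++11 syntax):

// query properties and relationships at compile time
static_assert(std::is_integral<int>::value, "int is an integral type");
static_assert(std::is_same<std::remove_const<const double>::type, double>::value,
              "remove_const strips the const qualifier");

// select one of two types depending on a compile-time condition
using Wide = std::conditional<(sizeof(long) >= 8), long, long long>::type;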


For most of these traits, C++17 introduces an equivalent → variable template, for example
 is_integral_v , and the → concepts introduced by C++20 are a major use case.


Time Measurements

The chrono date and time library is a replacement for the old C-style date and time utilities,
with proper type support and automatic unit conversions. The chrono library has its own
namespace std::chrono , where it provides:

      • A system_clock class for wall time: equivalent to C-style time, usually Unix Time
        (UTC), can be nonmonotonic due to, e.g., NTP synchronization.

      • A steady_clock class for time measurements: monotonic (time always moves forward),
        e.g., time since last system reboot.

      • A time_point class template, representing time points for each of the clocks (template
        parameter).

      • A duration class template, representing durations (time point differences).


Time measurements should always use steady_clock , because the system clock may perform
jumps backward in time (e.g., during synchronization) or into the future (e.g., when waking
from suspend state).

Time measurements using the library tend to be a bit verbose, because durations have to be
explicitly cast to time units:

auto start = std::chrono::steady_clock::now();

// perform some computations, measure elapsed time

auto end = std::chrono::steady_clock::now();
auto ms = std::chrono::duration_cast<std::chrono::milliseconds>
          (end - start).count();
std::cout << "duration in ms: " << ms << std::endl;
std::cout << "duration in s: " << ms/1000. << std::endl;


The cast converts the internal unit of the clock (ticks) to one of the usual units, e.g., milliseconds
or seconds. The units provided by C++11 range from nanoseconds to hours, while days, weeks,
months and years are introduced by C++20.



Random Number Generation



The legacy C pseudo-random number generator (PRNG) rand() can be of rather poor quality.
Before C++11, this meant that people had to either make do with subpar random numbers,
or rely on some external library. C++11 and above provide a set of very well-known PRNGs
in header <random> , e.g.:

   • The Mersenne Twister (Matsumoto and Nishimura)

   • “Minimal standard” linear congruential engine (Park et al.)

   • RANLUX generator (Lüscher and James)



The generation of random numbers is divided into three distinct steps:

   • Seeding the aforementioned engines, using true random numbers (entropy) or simply
     system time if the latter is sufficient

   • Generation of uniformly distributed pseudo-random integers using the engines

   • Transformation of these values into discrete or continuous target distributions

std::random_device rd;             // entropy source
std::mt19937 gen{rd()};            // seed Mersenne Twister
std::normal_distribution<> dist{}; // standard normal dist

std::vector<double> vals(1000);
for (auto& e : vals)
  e = dist(gen); // draw value using generator


Predefined distributions include:

   • uniform distributions (integer and real)

   • Bernoulli distributions (Bernoulli, binomial, etc.)

   • Poisson distributions (Poisson, exponential, Gamma, etc.)

   • normal distributions (normal, lognormal, chi-squared, etc.)



Custom Literals



C++ provides a large number of predefined literals, e.g.:

integers binary (prefix 0b ), octal (prefix 0 ), decimal or hex (prefix 0x ), postfix u for
     unsigned, l for long, ll for long long

floating-point number followed by optional exponent separated with e , postfix f for float,
       l for long double, double if omitted

auto a = 0b100101; // binary encoded integer
auto b = 123u;     // unsigned int
auto c = 0xCAFEul; // unsigned long, hex encoded

auto x = 3.141f;   // single-precision float
auto y = 1e10;     // double-precision with exponent


All letters in these literals are case insensitive. There are also literals for different types of
characters and strings (UTF-8, UTF-16, . . . ).

Since C++11, custom literals can be defined, based on two ingredients:

      • A user-defined suffix, which has to start with an underscore

      • A literal operator for this suffix, defining how a given numerical value or character se-
        quence should be interpreted

// compute in meters, allow specifications in foot and inch
constexpr long double operator"" _inch (long double x)
{return 0.0254 * x;}

constexpr long double operator"" _ft (long double x)
{return 0.3048 * x;}

// compute in radian, allow specifications in degrees
constexpr long double operator"" _degs (long double x)
{return M_PIl / 180 * x;} // long double pi constant

// floating-point literals are required, since the operators take long double
auto x = 47.0_inch; auto y = 3.0_ft; auto z = 90.0_degs;




Raw Literals and Regular Expressions

A bit less relevant for scientific computing, but still quite useful, are the new text manipulation
capabilities introduced by C++11. Raw string literals can be used to specify arbitrary texts
without the need to escape any characters, like quotation marks or newline characters:

// delimiter "foo" is arbitrary, needed to allow ")" in text
// can use any symbol instead, e.g., "|", "!", or "@"
auto raw_string = R"foo(This is a
multi-line string
containing "quotation marks"
and \literal\ backslashes)foo";

Such raw literals can be particularly useful when writing regular expressions in one of the
pattern matching grammars provided by the regex library30 .
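
As a small, hedged sketch using the <regex> header (the pattern below is illustrative and by
no means a complete grammar for floating-point literals), a raw string literal keeps the
backslashes in a regular expression readable:

std::regex pattern{R"([+-]?\d+\.\d*([eE][+-]?\d+)?)"};
bool matches = std::regex_match("3.14e-2", pattern);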




30 See https://en.cppreference.com/w/cpp/regex.




6. C++14 Features

C++14 was a comparatively minor update, but it nevertheless brought several improvements
to the language, mainly regarding constructs introduced in C++11:

      • improvements to constant expressions

      • generic lambda expressions

      • variable templates

      • return type deduction

      • assorted other improvements and additions


6.1. Improvements to Constant Expressions

Except for possibly some checks ( static_assert ) and local definitions ( using / typedef ),
C++11 constexpr functions are basically a single return statement.


C++14 lifts many restrictions and consequently makes constant expressions much more pow-
erful. The body of a constexpr function may now contain normal C++ code, as long as:

      • Every variable has literal type, i.e., it is either built-in or has a matching constexpr
        constructor and destructor.

      • Every such variable is immediately initialized.

      • All functions that are called are themselves marked constexpr .

      • The code doesn’t contain try / catch blocks (exception handling).

The second and fourth restriction are lifted by C++20, which allows uninitialized local variables
and try / catch blocks in constexpr functions (actually throwing an exception at compile time
remains an error, however).

Most importantly, constexpr functions may now contain branches and loops:

constexpr int factorial(int n)
{
  int out = 1;
  for (int i = 2; i <= n; i++)
    out *= i;
  return out;
}



Just like liberal use of the const keyword prevents bugs and helps the compiler during
optimization, large parts of programs may now be marked constexpr . These parts will
then be evaluated at compile-time where appropriate.


This means we can write whole subprograms that are run at compile-time, instead of relying
on complex template metaprograms for such tasks.


6.2. Generic Lambda Expressions

C++14 introduced generic lambda expressions: lambda function arguments may now be de-
clared auto , as in:

auto add = [] (auto x, auto y) {return x+y;};


The anonymous functor introduced by this lambda expression doesn’t provide a single function
operator() , instead it defines an appropriate operator() function template:

   • Each occurring auto introduces an (anonymous) template parameter, say, T1 , T2 , . . .

   • These parameters are independent: there is no mechanism to specify that two types are
     the same or have some other relationship

   • Any time the lambda is called with a certain argument type combination for the first
     time, the operator() function template is instantiated with these types.
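
A rough, hand-written equivalent of the functor generated for the add lambda above might
look like this (sketch only, the actual compiler-generated type is unnamed):

struct AnonymousAdd
{
  template<typename T1, typename T2>
  auto operator()(T1 x, T2 y) const {return x + y;}
};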


6.3. Variable Templates

Before C++14, templates came in two different flavors, function templates introducing a pa-
rameterized family of functions, and class templates introducing a parameterized family of
classes or structs.


C++14 introduced variable templates, i.e., variables that may have several variants existing
at the same time, parameterized by some types or values. Variable templates can have the
same template parameters as other templates, including template parameter packs, but they
themselves can’t be used as template template parameters.

// no longer a struct with internal enum or similar
template<int N>
  constexpr int factorial = factorial<N-1> * N;

template<>
  constexpr int factorial<0> = 1;





Potential use cases of variable templates:

      • Cleaner representation of template metaprogramming, as seen in the example.

      • Provide different versions of some generator object, e.g., for pseudo-random numbers,
        one for each data type.

      • Make named constants (e.g., from C enums and macros) type-aware and usable as tem-
        plates.

The Message Passing Interface (MPI) specifies data types through certain predefined constants;
a variable template can make those available to the C++ type system:

template<typename T> constexpr MPI_Datatype MPIType = MPI_DATATYPE_NULL; // fallback
template<> constexpr MPI_Datatype MPIType<int>    = MPI_INT;
template<> constexpr MPI_Datatype MPIType<float>  = MPI_FLOAT;
template<> constexpr MPI_Datatype MPIType<double> = MPI_DOUBLE;
...

This way, generic code can pick the right MPI data type automatically.


6.4. Return Type Deduction

In many cases the trailing return type of lambda expressions is optional, and can be deduced
from the return statement instead:

auto f = [] (int i) {return 2*i;}; // returns int


C++14 extends this mechanism from lambdas to normal functions:

      • The return type of lambdas may be omitted, and the return type of normal functions
        can be “ auto ” without trailing return type.

      • In both cases the return type is deduced using the return statement. Multiple return
        statements with conflicting types cause a compilation error.

      • The deduced type is always a value type, like with auto in general. Reference types
        can be specified using auto& , const auto& , etc. (as a trailing return type in the case
        of lambdas).

This leads to two convenient ways to write functions returning tuples or structs. We don’t
have to repeat the tuple information several times, doing so once in the return type or return
statement is enough:




// type deduction: don't specify return type,
// has to be fixed somewhere in the function body
auto createTuple()
{
  return std::tuple<int,double,bool>{7,3.2,true};
}

// new init version: specify return type,
// can return simple braced list
std::tuple<double,char,int> buildTuple()
{
  return {5.6,'c',27};
}




6.5. Other Improvements and Additions

Deprecated Attribute

Sometimes legacy code has to be kept around for backwards compatibility, but shouldn’t
be used anymore for various reasons. Such code can be marked with the [[deprecated]]
attribute, optionally including some reason:

[[deprecated("Use more efficient newFoo() function instead.")]]
void foo() { ... }

The attribute is placed:

   • before function declarations / definitions (see above)

   • before the class or struct name: struct [[deprecated]] Foo { ... };

   • before variable declarations: [[deprecated]] int i;

Templates, individual template specializations, and whole namespaces can be marked as dep-
recated in similar fashion.


Type Traits: Helper Types

Template metaprograms based on type traits can become quite verbose: each such traits class
introduces a dependent type named type or a Boolean value named value , and these have
to be written out to specify the resulting type or value.



C++14 introduced helper types like enable_if_t :

template< bool B, class T = void>
  using enable_if_t = typename enable_if<B,T>::type;

This can help make SFINAE easier to read, for example. Additionally, the Boolean traits now
provide a constexpr operator() :

typename std::enable_if<std::is_same<T1,T2>::value>::type // C++11 version
std::enable_if_t<std::is_same<T1,T2>{}()>                 // C++14 version

They could also have introduced helper variables for the Boolean values using variable tem-
plates, but that didn’t happen until C++17.




7. C++17 Features

C++17 is again a rather small update. The following points are noteworthy additions to the
language:
   • Guaranteed copy elision
   • Compile-time branches
   • Structured bindings
   • Additional utility classes
   • Fold expressions
   • Class template argument deduction
   • Mathematical special functions
   • The filesystems library


7.1. Guaranteed Copy Elision

In C++, exchanging data with a function through arguments and return values typically
requires copies:
   • copies of the arguments are created in the scope of the function that is called
   • a copy of the return value is created in the scope of the calling function (a nameless
     temporary), and then a second copy is necessary during assignment
Such copies can be memory-intensive and time-consuming, and therefore we often avoid them
using references:
   • Arguments are passed as const references to avoid copies.
   • Return values are replaced by non- const references as arguments: instead of returning
     something, the function simply modifies its argument.

// computes out = mat * vec
void foo(const Mat& mat, const Vec& vec, Vec& out)
{ ... }



The modification of reference-type arguments is efficient, but it makes it harder to see intent
and parse the signature: the return values are effectively hidden among the arguments.


Before C++17, the compiler was allowed to eliminate copies that happened when function
values were returned, but there was no guarantee for this to happen. This was one of the main



reasons for using reference-type arguments to communicate results: there is no need to trust
the optimization phase of the compiler. There were two different kinds of possible optimization,
called NRVO and RVO.


Named Return Value Optimization (NRVO):

This is the elimination of copies when a named, local object is returned by the function. In
cases where this is possible, the compiler removes any copy and move operations and instead
constructs the object directly into the storage where it would end up after copying or moving
anyway.


Temporaries and Return Value Optimization (RVO):

This is the elimination of copies when a nameless temporary is used in an assignment. The
creation of the temporary is essentially skipped, and the storage of the assignee is used instead.
The compiler may perform this optimization in several different contexts, and its application
to unnamed return values is called RVO.


To summarize, returning a local variable or temporary and assigning its value to another
variable caused up to two copies before C++17, which could be optimized away under certain
circumstances:

named return value: NRVO for the return value, then elimination of the temporary that is
    used in the assignment

nameless return value: elimination of the two temporaries inside and outside of the function
    (with the former being RVO)

The C++17 standard contains a fundamental change that has significant consequences but is,
at the same time, almost invisible:

      • C++17 introduced the notion of temporaries as values that only exist within the technical
        specification, and are not actually represented in memory.

      • As a consequence, the temporaries we discussed are never created, and there is no need
        to optimize them away. This is called guaranteed copy elision.

      • This eliminates both copies for unnamed return values, and at least one of those for
        named return values.


In C++17 and above, using return values should almost always be preferred over modifying
reference arguments. This holds even for multiple return values, thanks to → structured bind-
ings.
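
A small sketch using the Mat and Vec types from above: with guaranteed copy elision, the
matrix-vector product can simply be returned by value:

// computes and returns mat * vec
Vec foo(const Mat& mat, const Vec& vec)
{
  return mat * vec; // nameless temporary: constructed directly in the caller's storage
}

Vec out = foo(mat, vec); // no copy, no move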



7.2. Compile-Time Branches

C++17 introduced compile-time branches using if constexpr :

   • The branch condition is evaluated at compile time when the constexpr keyword is
     used. Therefore it must be a constant expression, of course.
   • The non-matching branch is discarded (if present), and will not be compiled. It will not
     cause any compilation errors, even if its content doesn’t make sense for a given set of
     template parameters.
   • Non-matching branches don’t take part in return type deduction, i.e., different branches
     may return different types.

// C++17 Boolean helper variable template
if constexpr (std::is_same_v<T1,T2>)
{ ... } // code for case where types coincide
else
{ ... } // code for when T1 and T2 differ



Compile-time branches can replace template specializations and function template overloads,
and in particular:
   • create recursive template metaprograms without base case specializations
   • eliminate a subset of SFINAE applications, providing a much more readable replacement
// replacement for SFINAE:
// * select function body based on type T
// * deduce return type from chosen branch

template<typename T>
auto foo(const T& t)
{
  // integer case
  if constexpr (std::is_integral_v<T>)
  { ... }
  // floating point case
  else if constexpr (std::is_floating_point_v<T>)
  { ... }
  // default case
  else
    // have to make the assert depend on T
    // here, else it would fire in any case
    static_assert(not std::is_same_v<T,T>,
      "only integers and floating point!");
}




An implementation of nested loops of depth N , with N being a compile-time constant, based
on if constexpr :



template<std::size_t N, typename Callable, std::size_t K = N-1>
  void metaFor(std::array<size_t,N>& indices,
               const std::array<size_t,N>& bounds, Callable&& callable)
  {
    static_assert(K < N, "K must be smaller than N");
    if constexpr (K == 0)
      for (std::size_t i = 0; i < bounds[0]; i++)
      {
        indices[0] = i;
        callable(indices);
      }
    else
      for (std::size_t i = 0; i < bounds[K]; i++)
      {
        indices[K] = i;
        metaFor<N,Callable,K-1>(indices,bounds,
                                std::forward<Callable>(callable));
      }
  }




7.3. Structured Bindings

C++17 added two new constructs, structured bindings and a version of if with initializer,
similar to that of for loops.


Structured bindings make multiple return values possible by assigning names to components of
an object:

auto [x,y,z] = f(); // f returns object with three components


This works with C arrays, C-style structs, std::array , std::pair and std::tuple . x ,
y and z are then references of the entries / data members.


Such bindings can be convenient in range-based for loops over maps:

for (auto&& [first,second] : map)
{...}



The if with initializer works similar to a for loop:

if (auto [x,y,z] = f(); x.isValid())
{...}
else
{...}



This avoids polluting the surrounding scope, just like the first entry of the for loop declara-
tion.


C++20 introduces a similar initializer for range-based for loops. It may be used to, e.g.,
initialize the range of the loop, or provide a local index counter variable.


7.4. Utility Classes

The two Standard Library class templates std::pair and std::tuple are now accompanied
by std::optional , std::variant , and std::any .

std::optional<T> :

A type-safe wrapper that may or may not contain a value of type T . It follows value semantics,
but access nevertheless uses the * and -> operators.


std::variant<T...> :
A type-safe replacement for union . Provides several safe ways to check / access the currently
held value, e.g., std::get like std::tuple .


std::any :

An implementation of type erasure, std::any can store objects of arbitrary types. Basically
a type-safe, resource owning replacement for void* .
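
A small, hedged sketch of the three classes (assuming headers <optional> , <variant> , <any> ,
 <cmath> and <string> ; the function name maybeRoot is purely illustrative):

std::optional<double> maybeRoot(double x)
{
  if (x < 0)
    return std::nullopt;   // no value
  return std::sqrt(x);     // value present, access via * or value()
}

std::variant<int,double> v = 3.14;  // currently holds a double
double d = std::get<double>(v);     // throws std::bad_variant_access for the wrong type

std::any a = std::string("hello");
auto s = std::any_cast<std::string>(a); // type-checked access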

Comparison of Standard Library utility classes:

 Class                 Semantics    Copies         Validity       Types
 std::tuple            value        copyable       always valid   fixed set
 std::shared_ptr       reference    ref.-counted   nullable       fixed type
 std::unique_ptr       reference    move only      nullable       fixed type
 std::optional         value        copyable       nullable       fixed type
 std::variant          value        copyable       always valid   fixed set
 std::any              value        copyable       always valid   arbitrary


   • All the classes own and manage their data
   • Two pointer replacements (shared vs. non-shared)
   • Two aggregate types (Cartesian product vs. union set)
   • Two relaxations on the requirements on C++ variables (optional value vs. arbitrary type)



7.5. Fold Expressions

C++17 introduces fold expressions, which automatically expand a parameter pack (the
argument with ellipsis in variadic templates) and apply an operator in between.

template<typename... Args>
int sum(Args&&... args)
{
  return (args + ... + 0);
}

template<typename... Args>
void print(Args&&... args)
{
  (std::cout << ... << args) << std::endl;
}


This can be used with 32 predefined binary operators, including all arithmetic and logic oper-
ators.
Call a function for each argument, despite not knowing the number of arguments beforehand
(folds using the comma operator):

template<typename Func, typename... Args>
void callForAll(const Func& f, Args&&... args)
{
  ( f(args), ...);
}



Folds come in two flavors: right folds (ellipsis on the right of the pack) group towards the last
argument, as in a1 + (a2 + (... + aN)), while left folds (ellipsis on the left) group towards the
first argument, as in ((a1 + a2) + ...) + aN. Each version can have an initial value (binary
fold) or not (unary fold).


The expansion is purely syntactic: the operator that is written once in the fold expression
appears once per argument in the resulting nested expression, and in a binary fold the
explicitly written initial value ends up in the innermost subexpression.


7.6. Improvements for Templates

C++17 brings some improvements for templates:
      • class template argument deduction
      • type deduction for non-type template parameters



Before C++17, only the parameters of function templates could be deduced, those of class
templates had to be written out. From C++17 onwards, class template arguments can be
omitted as long as they can be deduced from the constructor function call:

std::pair   p{1,7.3};     // deduces std::pair<int,double>
std::tuple t{true,'c',2}; // deduces std::tuple<bool,char,int>
std::vector v{1,2,3,4};   // deduces std::vector<int>



Quite often, templates require both a type parameter and one or more non-type parameters
of that type. Examples are:
   • Constants for template metaprogramming across different types
   • A type as parameter, with associated default values, initial values, etc.


In C++17 and above, type deduction can be used in this context:

template<typename T, T t> // redundancy
  struct OldConstant {enum {value = t};};

template<auto t>          // type deduction
  struct NewConstant {enum {value = t};};

This makes it possible to omit the redundant type information in instantiations, for example
NewConstant<5> . The type of the parameter is available using decltype , should it be
needed.


7.7. Mathematical Special Functions

C++ provides a number of mathematical functions, which are of course quite relevant in
Scientific Computing:
exponential: exponential function and logarithms, e.g., ln
power: square and cubic root, arbitrary powers
trigonometric: sine, cosine, tangent, etc.
hyperbolic: hyperbolic sine (sinh) and cosine (cosh), etc.
others: e.g., gamma function (generalized factorial)


C++17 added special functions for, e.g., cylindrical and spherical harmonics:
   • Orthogonal polynomial bases (Legendre, Hermite and Laguerre)



      • Cylindrical and spherical Bessel functions
      • Cylindrical and spherical Neumann functions
      • Beta function and Riemann zeta function
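
A small illustration (these functions live in <cmath> ; the chosen arguments are arbitrary):

double p = std::legendre(2, 0.5);     // Legendre polynomial P_2(0.5) = -0.125
double j = std::cyl_bessel_j(0, 1.0); // cylindrical Bessel function J_0(1)
double z = std::riemann_zeta(2.0);    // zeta(2) = pi^2/6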


7.8. Filesystems Library

The filesystems library31 introduced capabilities that C++ was sorely missing, because they
were either nonexistent or non-portable:
      • Representation of directory paths, files, and file types
      • Queries (file existence, file size, permissions, etc.)
      • Handling of directories and files
      • Creation of hard links and symbolic links
      • Access to current directory
      • Special path for temporary files

The library provides a unified framework that is portable across operating systems. Some
filesystems don't provide certain features, e.g., the old FAT filesystem doesn't support symbolic
links. In these cases appropriate exceptions are thrown.
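
A hedged sketch of typical usage (assuming headers <filesystem> and <iostream> ; the paths
are illustrative):

namespace fs = std::filesystem;

fs::path p = fs::current_path() / "data" / "results.txt";
if (fs::exists(p))
  std::cout << "size: " << fs::file_size(p) << " bytes" << std::endl;

fs::create_directories("output/run1"); // creates missing intermediate directories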




31 See https://en.cppreference.com/w/cpp/filesystem.




8. C++20 Features

C++20, the upcoming new standard, is a major update similar to C++11, and will drastically
change the way modern C++ is written. Additions to the language include:

   • Module support

   • Concepts

   • Ranges as entities

   • Coroutines

   • Mathematical constants

   • A modern text formatting framework


8.1. Modules

The traditional C++ compilation process is based on that from C, and inherits the following
drawbacks:

   • Header files are recursively combined into one large text document.

   • Header guards are necessary to prevent redefinitions and conflicts.

   • Preprocessor macros remain active across header boundaries.

   • Consequently, the order of header includes may be important.


C++20 introduces modules, which are separate units that can be used to divide code bases
into logical parts. The result of importing such a module is well-defined, and macro definitions
are kept within their module and are unable to influence the content of other modules.

A module is a source file, just like those including the main function of programs, which
contains a number of export and import statements:

export module MyModule; // module declaration
import <iostream>;      // import statements:
import <vector>;        //        replacement for includes

// not visible outside the module
void myInternalFunction() { ... };

// can be made available by importing the module
export void myFunction() { ... };



Modules are a replacement for the practice of using header files to represent libraries, which
then include all the headers the library requires or provides. Namespaces are orthogonal to
modules, and can be used in conjunction with modules or across them.


8.2. Concepts

C++20 introduces concepts, which are named requirements for template parameters. Their
definition is very similar to that of constexpr Boolean variable templates:

template<typename T>
  concept Number = std::is_integral_v<T>
                   || std::is_floating_point_v<T>;


But in contrast to simple variable templates, one can also require that certain methods and
operators exist and produce a certain type:

template<typename T>
  concept EqualityComparable = requires(T a, T b)
  {
    { a == b } -> std::same_as<bool>;
    { a != b } -> std::same_as<bool>;
  };




The main benefit of concepts is the fact that requirements for template parameters can be
checked early, instead of triggering convoluted error messages somewhere within the instanti-
ation process.


Ideally, a C++ concept should express the identity and essence of a certain category of types,
e.g., the mathematical concepts of vectors and matrices, etc. Unfortunately, most such prop-
erties are hard to write down and can’t be checked by the automatic compilation process:

      • existence of neutral elements

      • commutative, associative, and distributive properties

      • ...

This means that one can often only check the interface and existence of operations in practice.

Templates can specify restrictions in several ways:




// verbose form, can be used for ad-hoc (i.e., unnamed) concepts
template<typename T>
  auto twice(T i) requires std::integral<T>
  {return 2*i;}

// shorter version, concept has to be defined beforehand
template<std::integral T>
  auto square(T i)
  {return i*i;}




The type deduction for function arguments of generic lambda expressions is now also available
for normal functions:

// usual template declaration
template<typename T1, typename T2>
  auto foo(const T1& a, const T2& b);

// equivalent declaration using auto keyword
auto bar(const auto& a, const auto& b);




As with generic lambdas, each use of the auto keyword introduces an independent, new
template parameter. But type relationships between the arguments can also be modelled in
this form: simply specify an appropriate concept based on decltype .

This new way of writing templates can be combined with concepts, and the result is an elegant
and easy to read way of function template selection:

// used for types that represent integers
void foo(const std::integral auto& t)
{ ... }

// used for types that represent real numbers
void foo(const std::floating_point auto& t)
{ ... }

// default: used for any other type
void foo(const auto& t)
{ ... }


This is a good replacement for many SFINAE constructs.



8.3. Ranges

A direct application of C++20 concepts is the range concept, which represents sequences.
This includes the usual container classes, but also, e.g., infinite sequences based on generator
functions (using lazy evaluation).


Ranges can be filtered and transformed, producing new ranges in the process:

auto even = [](int i){ return i % 2 == 0; }; // predicate used by the filter
for (const auto& e : v | std::views::filter(even))
{ ... } // do something for each element that is even


The STL algorithms can now operate on ranges instead of iterator pairs:

// old version, kept for backward compatibility
int n = std::count(v.begin(), v.end(), 7);

// new version based on the range concept
int m = std::ranges::count(v, 5);




8.4. Concurrency with Coroutines

C++20 adds coroutines to the language, which are special functions used for cooperative (non-
preemptive) multitasking. Coroutines are subprograms, like normal functions and threads, but
with significant differences:

      • Functions are subprograms that are always executed in one go from start to finish, and
        that don’t store any information between invocations. Exactly one function is executed
        at any given time, and functions waiting for other functions to finish form a call stack,
        with the main function at the bottom.

      • Threads are subprograms that are run independently. They are also executed as a unit,
        but may be interrupted at any time (preemptive multitasking). Execution continues
        where it was interrupted, and synchronization using, e.g., mutexes and locks is necessary
        because there is no well-defined execution order.

Coroutines differ from functions and threads in the following ways:

      • A coroutine may voluntarily yield at any time, handing the execution flow over to another
        coroutine.

      • Coroutine instances can have internal state, similar to functors, and this state is preserved
        when they yield.



      • They can have multiple reentry points, i.e., the execution of a coroutine doesn’t have to
        continue at the point where it was stopped.
      • There is no need for synchronization, because coroutines are collaborative and only one
        of them runs at any given time.

The support for coroutines in C++20 isn’t as fleshed out as that for threads: there are no
predefined coroutines yet, only the facilities that are needed for their implementation.
Implementing a coroutine is somewhat similar to writing a functor class or using tasks with
their promise-future pairs, yet significantly more verbose, at least for now. Let's consider a
small generator semicoroutine, i.e., a simple coroutine that can only yield to its caller.

First, we need a promise object, similar to those we know from task-based programming, then
a handle, comparable to a future object, and finally a function definition:
template<typename T>
  struct GeneratorPromise
  {
    using Handle = std::coroutine_handle<GeneratorPromise<T>>;

    auto initial_suspend()          {return std::suspend_always{};}
    auto final_suspend() noexcept   {return std::suspend_always{};}
    auto get_return_object()        {return Handle::from_promise(*this);}
    auto yield_value(const T value) {current_value = value; return std::suspend_always{};}
    void unhandled_exception()      {std::abort();}

    T current_value;
  };




template<typename T>
  struct Generator
  {
    using Handle = std::coroutine_handle<GeneratorPromise<T>>;
    using promise_type = GeneratorPromise<T>;

       // appropriate constructors and destructor
       // to forbid copies but handle moves cleanly
       ...

       T operator()() {return handle.promise().current_value;}
       bool next()    {handle.resume(); return !handle.done();}

       Handle handle;
     };

Generator<int> sequence(int start = 0, int step = 1) noexcept
{
  auto value = start;
  for (;;)
  {
    co_yield value;
    value += step;
  }
}



        • The generator is used by storing the return value of the function and calling its next()
          method to obtain numbers.

        • The two structs can be reused to define other generators, simply by writing another
          function that also returns a generator object.


8.5. Mathematical Constants

While C++17 added special functions that are often needed in Scientific Computing, C++20
introduces variable templates for mathematical constants. Before C++20, well-known constants
such as π or e were usually only available as C-style macros like M_PI . The variable
templates serve as a modern replacement that is part of the standard.


The predefined constants include:

e_v<T> : Euler’s number e ≈ 2.7182818 . . .

pi_v<T> : the circle constant π ≈ 3.1415926 . . .

egamma_v<T> : the Euler-Mascheroni constant γ ≈ 0.57721566 . . .

phi_v<T> : the golden ratio constant φ ≈ 1.6180339 . . .

Additionally, a small number of square roots and logarithms are available, like ln(2) and √2.
Specializations for float , double and long double are provided for each variable
template.
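
A small sketch (the constants live in the header <numbers> , inside namespace std::numbers ):

double circumference(double r)
{
  return 2.0 * std::numbers::pi_v<double> * r; // or simply std::numbers::pi
}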


8.6. Text Formatting

Before C++20, there are two separate ways to format text output:

        • The well-known I/O streams. These provide a safe and extensible interface, but format
          strings and arguments are intertwined, which. . .

            – makes automatic generation / modification difficult (e.g., localization)

            – makes code with many formatting commands hard to read

      • The old printf() formatted output function from C. While this clearly separates format
        and arguments, this function is neither safe nor extensible and should be avoided.



C++20 introduces modern text formatting capabilities to C++ in the form of its format
library33 :
      • Based on templates and therefore extensible
      • Performant implementation available
      • Format specifiers consistent with those from other languages
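
A small, hedged sketch (assuming header <format> ; the format string is illustrative):

std::string s = std::format("step {:3d}: residual = {:.2e}", 17, 1.5e-8);
// s == "step  17: residual = 1.50e-08"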




33 https://en.cppreference.com/w/cpp/utility/format/formatter#Standard_format_specification




A. Exercises

This appendix contains the exercises accompanying the lecture. Some of them may reference
“the lecture” or previous “sheets”, but it should be clear from context which section or sub-
section of this document is meant by that. One of the exercises mentions “an implementation
on the website”. This code is not provided within this document, but is just a sample solution
to one of the previous exercises anyway.
Several of the exercises (the quizzes) link to Anders Schau Knatten’s website cppquiz.org. As
with any other hyperlink, the referenced contents may vanish without notice, but hopefully
the links will stay valid for quite some time.


A.1. Fundamentals and Standard Library

C++ Quiz

On https://cppquiz.org you can find a quiz with C++ specific questions. In this exercise,
answer the following questions:
Question 1: https://cppquiz.org/quiz/question/197 (variable lifetime)
Question 2: https://cppquiz.org/quiz/question/161 (Duff’s Device)
Question 3: https://cppquiz.org/quiz/question/9 (reference arguments)
Question 4: https://cppquiz.org/quiz/question/113 (overload resolution)
Question 5: https://cppquiz.org/quiz/question/5 (initialization order)
The questions are sorted (more or less) according to the structure of the lecture. For questions
1, 3, 4, and 5, write a short statement what information the question and its solution are trying
to convey. Regarding question 2: inform yourself about the construct that is used. What is
its purpose? Would you suggest using this in real-world code? Why, or why not?


Rational Numbers

Write a class for rational numbers. The number should always be represented as a fully reduced
fraction of the form numerator/denominator with denominator > 0.

  1. What is an appropriate data structure for rational numbers?
  2. Start by writing a function int gcd(int,int) (greatest common divisor); you will
     need it to reduce fractions.
        • You can use the Euclidean algorithm to determine the greatest common divisor.



        • For an algorithm see https://en.wikipedia.org/wiki/Greatest_common_divisor
        • Implement this scheme as a recursive function.
  3. Write a class Rational , which represents a rational number. The constructor should
     have the numerator and the denominator as arguments. Be sure to check for valid input.
     In addition, the class has two functions numerator() and denominator() that return
     the values of the numerator and denominator. The class should have three constructors:
        • a default constructor that initializes the fraction with 1,
        • a constructor that initializes the fraction with a given numerator and denominator,
          and
        • a constructor that initializes the fraction with a given whole number.
  4. Supplement the class with operators for *= += -= /= and == .

  5. Use the newly implemented methods to implement free operators * + - / .
  6. Check your implementation using various test cases. Initialize three fractions
                              f1 = −3/12,     f2 = 4/3,     f3 = 0/1.

      Test the operators with the following examples:

                f3 = f1 + f2 ,    f3 = f1 · f2 ,    f3 = 4 + f2 ,    f3 = f2 + 5,
                f3 = 12 · f1 ,    f3 = f1 · 6,      f3 = f1 / f2 .
      Print the result after each operation. The corresponding solutions are:
                13/12,    −1/3,    16/3,    19/3,    −3/1,    −3/2,    −3/16.
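
A minimal recursive sketch of the gcd function from step 2, based on the Euclidean algorithm
(assuming non-negative arguments):

    #include <iostream>

    // Euclidean algorithm: gcd(a, 0) = a, otherwise recurse on (b, a mod b).
    int gcd (int a, int b)
    {
      return b == 0 ? a : gcd(b, a % b);
    }

    int main ()
    {
      std::cout << gcd(12, 18) << std::endl;   // prints 6
    }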

Farey Sequences

A Farey sequence F_N of degree N (or: the Farey fractions of degree N) is an ordered set of
reduced fractions

                    p_i / q_i     with     p_i ≤ q_i ≤ N     and     0 ≤ i < |F_N|

and

                    p_i / q_i  <  p_j / q_j     for all 0 ≤ i < j < |F_N|.

Use the class Rational from the previous exercise to write a function

                                         void Farey(int N)





which calculates the Farey fractions up to degree N and prints all resulting Farey sequences
on the screen.
Algorithm: The sequences can be computed recursively. The first sequence is given by

                                  F_1 = { 0/1 , 1/1 }

For a known sequence F_N one can get F_{N+1} by inserting an additional fraction
(p_i + p_{i+1}) / (q_i + q_{i+1}) between two consecutive entries p_i / q_i and p_{i+1} / q_{i+1}
if q_i + q_{i+1} = N + 1 holds for the sum of denominators.
Example: Determining F_7 from F_6 results in the following construction:

    F_6 = { 0/1 , 1/6 , 1/5 , 1/4 , 1/3 , 2/5 , 1/2 , 3/5 , 2/3 , 3/4 , 4/5 , 5/6 , 1/1 }

Here the gaps between neighbors whose denominators sum to 7 receive the new fractions
1/7, 2/7, 3/7, 4/7, 5/7 and 6/7.


The new elements are:
    (0+1)/(1+6) = 1/7 ;   (1+1)/(4+3) = 2/7 ;   (2+1)/(5+2) = 3/7 ;
    (1+3)/(2+5) = 4/7 ;   (2+3)/(3+4) = 5/7 ;   (5+1)/(6+1) = 6/7

The sorted sequence then is:

    F_7 = { 0/1 , 1/7 , 1/6 , 1/5 , 1/4 , 2/7 , 1/3 , 2/5 , 3/7 , 1/2 , 4/7 , 3/5 , 2/3 , 5/7 , 3/4 , 4/5 , 5/6 , 6/7 , 1/1 }


For checking:
The Farey sequences up to degree 6 are

    F_1 = { 0/1 , 1/1 }
    F_2 = { 0/1 , 1/2 , 1/1 }
    F_3 = { 0/1 , 1/3 , 1/2 , 2/3 , 1/1 }
    F_4 = { 0/1 , 1/4 , 1/3 , 1/2 , 2/3 , 3/4 , 1/1 }
    F_5 = { 0/1 , 1/5 , 1/4 , 1/3 , 2/5 , 1/2 , 3/5 , 2/3 , 3/4 , 4/5 , 1/1 }
    F_6 = { 0/1 , 1/6 , 1/5 , 1/4 , 1/3 , 2/5 , 1/2 , 3/5 , 2/3 , 3/4 , 4/5 , 5/6 , 1/1 }.

There is a beautiful illustration of these fractions, the Ford circles [34].
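
A small self-contained sketch of the recursive construction described above, using plain integer
pairs (numerator, denominator) instead of the Rational class the exercise asks for:

    #include <cstddef>
    #include <iostream>
    #include <utility>
    #include <vector>

    // Builds F_N from F_1 by mediant insertion, as described in the algorithm above.
    std::vector<std::pair<int, int>> farey (int N)
    {
      std::vector<std::pair<int, int>> F = {{0, 1}, {1, 1}};   // F_1
      for (int n = 2; n <= N; ++n)
      {
        std::vector<std::pair<int, int>> next;
        for (std::size_t i = 0; i + 1 < F.size(); ++i)
        {
          next.push_back(F[i]);
          if (F[i].second + F[i + 1].second == n)              // q_i + q_{i+1} = n
            next.push_back({F[i].first + F[i + 1].first,       // insert the mediant
                            F[i].second + F[i + 1].second});
        }
        next.push_back(F.back());
        F = next;
      }
      return F;
    }

    int main ()
    {
      for (const auto& f : farey(6))
        std::cout << f.first << "/" << f.second << " ";
      std::cout << std::endl;
    }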


Matrix Class Implementation

Note: this exercise will form the foundation for several of the upcoming exercises, so if you
want to work on only a subset of the sheet, it might be a good idea to start on this one.
In Scientific Computing the concept and use of matrices is crucial. Regardless of the field of
expertise, be it optimization, statistics, artificial intelligence, or the solution of partial
differential equations, we need matrices and solutions of linear systems of equations in nearly
all applications.
[34] See https://en.wikipedia.org/wiki/Ford_circle






In this exercise, we will implement a class Matrix in analogy to the class Vector from
exercise sheet 1.
Take care of the following points:

  1. Class Matrix should have all functionality that class Vector has (constructors, meth-
     ods, etc.).
  2. The entries should be stored in a container of type std::vector<std::vector<double>> .
  3. Instead of the number of elements int N in class Vector , class Matrix should contain
     the two numbers int numRows and int numCols that represent the number of rows
     and the number of columns respectively. Use numRows and numCols in all places where
     the member functions of class Vector used N , and re-implement the functionality
     adapted to the use of a class that represents matrix objects.
  4. Use the member function double& operator()(int i, int j) for accessing the (i, j)-
     th element of objects of class Matrix .

  5. Use the member function std::vector<double>& operator[](int i) to return the
     i-th row of a matrix object.
  6. Class Matrix should have an additional constructor that constructs square matrices.
     In addition to the member functions mentioned above, implement free functions that
     provide
  7. the addition of two matrices,
  8. the multiplication of a matrix with a scalar,
  9. the multiplication of a scalar with a matrix,
 10. a matrix-vector multiplication, where vectors are of type std::vector<double> ,
 11. a matrix-vector multiplication, where vectors are of type Vector .

Write a test program that tests all functionality of class Matrix (construction, the different
kinds of multiplication, element access).
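
One possible skeleton for the interface described above (a sketch only; the names numRows ,
 numCols and the storage layout are taken from the exercise text):

    #include <vector>

    class Matrix
    {
      public:
        Matrix (int rows, int cols, double value = 0.0)
          : numRows(rows), numCols(cols),
            entries(rows, std::vector<double>(cols, value)) {}

        explicit Matrix (int n) : Matrix(n, n) {}   // additional square-matrix constructor

        double& operator()(int i, int j) { return entries[i][j]; }     // element access
        std::vector<double>& operator[](int i) { return entries[i]; }  // row access

      private:
        int numRows, numCols;
        std::vector<std::vector<double>> entries;
    };

    int main ()
    {
      Matrix A(2, 3);
      A(0, 1) = 4.2;
      A[1][2] = 1.0;
    }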


Linked List

Using the simple example of a linked list we will practice the interaction of constructors,
destructors and pointers.
We want to program a linked list which can store an arbitrary number of values of type int .
Such a list consists of an object of class List , which refers to a sequence of objects of class
Node . The list elements are stored in a component int value within each node, and a
pointer Node* next points to the next node. The end of the list is designated by the pointer
next having the value nullptr .





  1. What is special about a pointer having the value nullptr ?

  2. Implement the class Node . Make sure that all member variables are always initialized,
     especially the next pointer.

  3. Implement the class List with the following methods:

      class List
      {
        public:

            List ();                                  //   create an empty list
            ~List ();                                 //   clean up list and all nodes
            Node* first () const;                     //   return pointer to first entry
            Node* next (const Node* n) const;         //   return pointer to node after n
            void append (int i);                      //   append a value to the end
            void insert (Node* n, int i);             //   insert a value before n
            void erase (Node* n);                     //   remove n from the list
      };


        List must also store the beginning of the list; where would you place it in the class decla-
       ration? The next pointer of class Node should be private to ensure that the list struc-
       ture isn’t accidentally changed outside of class List . The member value is public
       to allow read and write access from outside the class. The line friend class List;
       has to be inserted into the declaration of the class Node to give the List class access
       to the next pointer (a minimal Node sketch is shown after this exercise). Additionally,
       make sure that the destructor deletes all allocated Node objects.

  4. Test your implementation with the following program:

      int main ()
      {
        List list;
        list.append(2);
        list.append(3);
        list.insert(list.first(), 1);

           for (Node* n = list.first(); n != 0; n = list.next(n))
             std::cout << n->value << std::endl;
      }



  5. What happens if one copies the list? And what happens if both lists are deleted?






          int main ()
          {
            List list;
            list.append(2);
            ...
            List list2 = list;
          }
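
For orientation, a minimal Node declaration matching the constraints described in step 3 could
look like this (a sketch, not a complete solution):

    class Node
    {
      public:
        int value = 0;               // stored list element, freely accessible

      private:
        Node* next = nullptr;        // always initialized; nullptr marks the end of the list
        friend class List;           // only List may manipulate the list structure
    };

    int main ()
    {
      Node n;
      n.value = 42;
    }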



C++ Quiz 2

Here are some additional questions from https://cppquiz.org for you to answer:
Question 1: https://cppquiz.org/quiz/question/124 (template template parameters)
Question 2: https://cppquiz.org/quiz/question/32 (constructor calls)
Question 3: https://cppquiz.org/quiz/question/125 (static in templates)
Question 4: https://cppquiz.org/quiz/question/17 (con-/destructors and inheritance)
Question 5: https://cppquiz.org/quiz/question/208 (inserting elements into maps)
Questions 2 and 4 discuss important points about constructors and destructors, while questions
1 and 3 are about minor details that might be interesting. What would you say about question
5?


Linked List (Const Correctness)

On the previous sheet you programmed a linked list to practice the interaction of construc-
tors, destructors and pointers. We now want to extend this implementation, practicing const
correctness and clean encapsulation.

      1. The const keyword has already been added to some of the methods and their argu-
         ments. Check if these qualifiers make sense and whether some are missing elsewhere,
         discuss:
            • Which methods should or should not be const , and why?
            • What is a good choice for the parameters and return values?
            • What about data members of List , and what about the Node class?

      2. Add a const method max() that returns the maximum of the stored values, as long as
         the list is non-empty [35]. For efficiency, this value should be cached and only recomputed
         when it is requested and has become invalid in the meantime.
[35] Throwing an exception for empty lists would be good style, but here you might just return zero instead.






        • What effect does the keyword mutable have and why is it needed here?
        • What would be inefficient about updating the stored value every time the list is
          modified in some way?
  3. A function for printing the list could look as follows:

      void printList (const List& list)
      {
        for (const Node* n = list.first(); n != 0; n = list.next(n))
          std::cout << n->value << std::endl;
        std::cout << "max: " << list.max() << std::endl;
      }

        • What exactly do the two instances of const refer to, especially the second one?
        • Test your implementation with this function.


STL Container Types

In the lecture the different types of containers of the Standard Library have been discussed.
Each type offers certain guarantees and is ideally suited for specific use cases. Here is a list of
scenarios, what type of container would you use and why?

  1. You are writing a finite difference discretization of the Laplace equation, and you are
     using a structured grid. For each grid node, the solution at this point is stored. All nodes
     of the grid are numbered consecutively and their number is known a priori.
      Which type of container is suitable for storing the node values?
  2. You are writing a Quicksort sorting algorithm, and you want to store the pivot elements
     in a stack structure to avoid recursive function calls.
      Which type of container is well suited to save the pivot elements and which to store the
      data itself ?
  3. When implementing direct solvers for sparse matrices, it is common to first minimize
     the bandwidth of the matrix in order to reduce the storage requirements of the resulting
     LU decomposition. One method for this is the Cuthill-McKee algorithm. In the course of
     this algorithm elements have to be buffered in a FIFO (first-in, first-out) data structure.
     The order of the elements in this FIFO does not change.
      What type of container would you use as a FIFO?
  4. In your finite difference program, the solution is defined a priori on a part of the bound-
     ary. Nodes in this part of the boundary (Dirichlet nodes) aren’t real degrees of freedom
     and must be treated differently when assembling the finite difference matrix. The number
     of these nodes is small compared to the total number of nodes.





     You want to dynamically read the list of Dirichlet nodes and their solution values from
      a configuration file and make them accessible node by node. When solving linear PDEs
     the limiting factor is typically the memory, as you want to solve very big problems.

      In which container would you store the indices of Dirichlet nodes and their values?


Pointer Puzzle

Remark: you may actually try this with your compiler.

Look at the following program:

void foo ( const int** );

int main()
{
  int** v = new int* [10];
  foo(v);
}



The compiler will exit with an error message, because you make a const int** out of the
int** , something like this:

g++ test.cc -o test
test.cc: In function 'int main()':
test.cc:6: error: invalid conversion from 'int**' to 'const int**'
test.cc:6: error:   initializing argument 1 of 'void foo(const int**)'

Actually, shouldn’t it always be possible to convert from non- const to const . . . ? Why
doesn’t this apply here?

Tip: It’s clear why the following program doesn’t compile:

const int* bar ();

int main()
{
  int** v = new int* [10];
  v[0] = bar();
}



What is the relation between this program and the one above?





A.2. Advanced Topics

Matrices and Templates

In the lecture templates were presented as a technique of generic programming. They allow
the development of algorithms independent of underlying data structures.
In this exercise we will extend an existing implementation of a matrix class to a template class.
Such matrices are needed again and again in numerical software. On a previous sheet you had
to implement matrices for double values. Depending on the application, however,
it may be useful to have matrices based on other types of numbers, for example complex ,
 float or int . Templates may reduce code redundancy, so that one has an implementation
of Matrix that can be used for all the different types.
As a reminder, here is the functionality you had to implement:
      • Matrix entries are stored as a vector of vectors, i.e., std::vector<std::vector<double>> .
      • The matrix dimension is handled by two private int values, numRows and numCols .
      • The parenthesis operator allows access to individual entries, while the bracket operator
        gives access to a whole row:

        double& operator()(int i, int j);
        std::vector<double>& operator[](int i);


      • Free functions provide functionality for the multiplication of matrices with vectors and
        scalars, and for the addition of two matrices.
For your convenience, an implementation of such a matrix class can be found on the lecture
website, based on the following files:
      • matrix.hh
      • matrix.cc
      • testmatrix.cc
The first two contain the definition and implementation of Matrix for double , while the
third file contains a test program. You may compile it like this:
g++ -std=c++11 -Og -g -o testmatrix matrix.cc testmatrix.cc
In addition to the methods specified in the previous exercise, this version also provides const
versions of the parenthesis and bracket operators for read-only access, implements the free
functions based on assignment operators += and *= , provides resize methods, and in turn
drops the second matrix-vector multiplication method.
You are free to use either your own version or the provided one as a basis for the following
tasks. Do the following:





  1. Change the implementation of Matrix to that of a template class, so that it can be
     used for different number types.
  2. Turn numRows and numCols into template parameters. Remove any function that
     doesn’t make sense under this change (e.g., resize if you use the provided version),
     adapt the constructors accordingly, and remove any runtime bounds checks that become
      unnecessary.
  3. Change the free functions into template functions that match the aforementioned tem-
     plate parameters. You may assume that scalars, vectors and matrices all use the same
     element type. Replace std::vector by a template template parameter, so that the
     function would also work for, e.g., std::array (see the sketch after this list).
  4. Change the main function of the test program, so that your template variants are
     tested. The program should use all data types mentioned above, with complex referring
     to std::complex<double> and the required header being <complex> . Also test the
     matrix with your Rational class. For the print functionality you have to define a
     free function

      std::ostream& operator<< (std::ostream& str, const Rational& r)


     that prints the rational number by first printing its numerator, then a “ / ” and then its
     denominator.
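
As an illustration of the template template parameter required in step 3, here is a small
self-contained sketch (using a free sum() function over std::vector and std::list rather
than the actual matrix operations):

    #include <iostream>
    #include <list>
    #include <vector>

    // Container is a template template parameter; the trailing parameter pack Rest
    // absorbs additional arguments such as the allocator of the standard containers.
    template<template<typename...> class Container, typename T, typename... Rest>
    T sum (const Container<T, Rest...>& c)
    {
      T result{};
      for (const T& x : c)
        result += x;
      return result;
    }

    int main ()
    {
      std::vector<double> v{1.0, 2.0, 3.0};
      std::list<int> l{1, 2, 3};
      std::cout << sum(v) << " " << sum(l) << std::endl;   // prints 6 6
    }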


Matrices and Templates (II)

This exercise is an extension of the templatization exercise above.

  1. Split the templated class Matrix into two classes:
        • A class SimpleMatrix , without numerical operators. This represents a matrix
          containing an arbitrary data type and has the following methods, all taken from
           Matrix :
            – some constructors
            – operator()(int i, int j) for access to individual entries

            – operator[](int i) for access to whole rows

            – a print() method

        • A numerical matrix class NumMatrix derived from SimpleMatrix . The numerical
          matrix class additionally provides the arithmetic operations functionality mentioned
          above.
     Also modify the free functions accordingly.





  2. Write a test program that covers the two classes SimpleMatrix and NumMatrix . Your
     test program does not need to test each number type again, an arithmetic test with
      double is enough. Instead, test your SimpleMatrix with string s as data, and check
     whether you can create and use objects of type SimpleMatrix<NumMatrix<...>,...>
     and NumMatrix<NumMatrix<...>,...> . Which methods are you able to use? What
     happens when you try to call the print method?

      The latter matrix type is known as a block matrix, a type of matrix that arises in
      numerical discretizations and can be used to design efficient algorithms if it has additional
      structure.

  3. Write template specializations of SimpleMatrix so that print functionality is restored
      for block matrices. Block matrices should be printed as one big matrix, with “ | ” as
      horizontal separators between matrix blocks and rows of “ --- ” to separate the blocks
      vertically (it is okay if it isn’t the most beautiful output). Note: it might be helpful
      to assign the dimensional template parameters to public enums, so that you have the
      dimensions available during printing. What do you observe when implementing this?

      This would be a rather good opportunity to use SFINAE to provide two different print
      methods and choose the right one based on the template parameters, instead of having
      to provide several different base classes.

  4. The provided matrix class (and most likely also your own) simply aborts the program
     when matrix-vector products with mismatching dimensions are attempted. Replace the
     relevant checks with adequate exception handling. Derive your own custom exception
     from std::exception , or pick a fitting one from the predefined ones if applicable.
     Add a try/catch block to your test that triggers an exception and prints an explanatory
     message.


C++ Quiz 3

Here are some additional questions from https://cppquiz.org for you to answer:

Question 1: https://cppquiz.org/quiz/question/162 (unqualified name lookup)

Question 2: https://cppquiz.org/quiz/question/44 (dynamic polymorphism)

Question 3: https://cppquiz.org/quiz/question/29 (virtual during con-/destruction)

Question 4: https://cppquiz.org/quiz/question/224 (abstract base classes)

Question 5: https://cppquiz.org/quiz/question/133 (diamond pattern)

Question 1 is about name lookup rules, and questions 2 to 5 have a look at dynamic polymor-
phism, virtual methods, and virtual inheritance. Do the results match with your expectations?
Discuss.





Interfaces: Integration

In the lecture the concept of interfaces was introduced, using virtual methods and abstract
base classes. This exercise considers numerical integration as an example application.
The goal is to write a code library that determines, via numerical integration, the integral of
an arbitrary function f (t) on a given interval t ∈ [A, B] (A, B ∈ R). The interval [A, B] is
divided into N subintervals of uniform length ∆t:

            ∆t = (B − A) / N ,       t_i = A + i · ∆t ,       i = 0, . . . , N
On each of the subintervals the integral can be estimated using an appropriate quadrature
rule. Summing up these estimates produces a composite quadrature rule:
            ∫_A^B f (t) dt  =  Σ_{i=0}^{N−1} [ Q_{t_i}^{t_{i+1}}(f ) + E_{t_i}^{t_{i+1}}(f ) ]


Here Qba refers to the result of the quadrature rule on the interval [a, b] and Eab to the error.
Quadrature rules are normally based on polynomial interpolation. This means it is possible to
specify the maximum degree of polynomials that are integrated exactly and the order of the
error in the case of functions that can’t be integrated exactly.
In this exercise, we want to limit ourselves to two quadrature rules:
   • The trapezoidal rule:
                  Q^{Trapez}_a^b (f )  =  (b − a)/2 · ( f (a) + f (b) )
      It is exact for polynomials up to first order, and the error of the resulting composite rule
      is O(∆t2 ).
   • The Simpson rule:

                  Q^{Simpson}_a^b (f )  =  (b − a)/6 · ( f (a) + 4 f ( (a + b)/2 ) + f (b) )
      It is exact for polynomials up to degree two, and the error of the resulting composite rule
      is O(∆t4 ).
A good test for such a numerical integration is to study the order of convergence. For a problem
with a known solution, calculate the error E_N for N subintervals and for 2N subintervals.
Doubling the number of subintervals cuts the interval width ∆t in half, reducing the error.
The quotient

                         EOC  =  log( E_N / E_{2N} ) / log(2)
is an indication of the order of convergence of the error (e.g., for the trapezoidal rule you should
observe order 2).

  1. Design interfaces for the different concepts. To this end, proceed as follows:





        • First, identify the abstract concepts in our problem.

            – We want to evaluate integrals of various functions; so we have the function as
              an abstract concept.

            – An integral is represented as a sum of integrals of subintervals.

            – On each subinterval a suitable quadrature formula is applied.

            – The quadrature formula evaluates the function to be integrated at certain points
              in order to determine an approximate solution.

            – A composite quadrature rule sums the results.

            – A suitable test problem would contain

                ∗ a function to integrate

                ∗ a choice of quadrature rule and composite rule

                ∗ the interval bounds

                ∗ the true resulting integral value

        • How are these abstract concepts mapped to interfaces? Usually, one introduces a
          base class as an interface description for each concept.

            – What is a suitable interface for a function?

            – A quadrature rule evaluates a function on a given interval; what would a suitable
              virtual method look like?

            – Each quadrature rule can also report the degree of exactly integrable polyno-
              mials, and report the order of convergence of the integration error that results.

            – The actual integration (i.e., composite quadrature rule) is parameterized with
              a quadrature rule and a function. It evaluates the integral of the function on
              an interval that has to be specified, dividing the interval into N subintervals.
              In addition, the quadrature formula that is to be used must be specified.

            – Apart from the equidistant composite quadrature used here, there may be other
              approaches (see second exercise). In anticipation of extensions, also introduce
              the abstract concept of a composite quadrature rule, although right now there
              is only one actual implementation.

            – What would be a suitable interface for test problems?

  2. Describe your interfaces and explain the design decisions. What are the reasons for your
     specific choice of interfaces?





   3. Implement the interfaces using abstract base classes and dynamic polymorphism. Write
      implementations for all of the above variants of abstract concepts (trapezoidal rule, Simp-
      son’s rule, equidistant composite rule). Your functions and classes should be as general
       as possible, i.e., use interface classes for arguments and return values. Make use of the
       best practices introduced in the lecture (a minimal interface sketch follows after this exercise).
   4. Test your implementation with the integration of 2t² + 5 on the interval [−3, 13] and of
      (t/π) · sin(t) on the interval [0, 2π]. To do this, implement appropriate test problems for each
      combination of test function and quadrature rule. Write a free function that expects a
      test problem and a number of steps to perform, and then
         • reports the true integral value and expected convergence order
         • starts with one big interval without subdivisions
         • doubles the number of subintervals after each step
         • applies the composite rule in each step
         • reports in each step:
             – the current estimated value
             – the resulting error
             – the estimated order of convergence (EOC)
             – the deviation from the expected order
      Discuss the results.
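
One conceivable set of interfaces for step 1, sketched with abstract base classes and a single
concrete rule (your own design may of course look different):

    #include <iostream>

    // Abstract interface for the function to be integrated.
    struct Function
    {
      virtual ~Function () = default;
      virtual double operator()(double t) const = 0;
    };

    // Abstract interface for a quadrature rule on a single interval [a, b].
    struct QuadratureRule
    {
      virtual ~QuadratureRule () = default;
      virtual double apply (const Function& f, double a, double b) const = 0;
      virtual int order () const = 0;   // expected convergence order of the composite rule
    };

    // Concrete rule: the trapezoidal rule.
    struct TrapezoidalRule : QuadratureRule
    {
      double apply (const Function& f, double a, double b) const override
      { return 0.5 * (b - a) * (f(a) + f(b)); }
      int order () const override { return 2; }
    };

    // Example test function 2t^2 + 5 from step 4.
    struct Parabola : Function
    {
      double operator()(double t) const override { return 2.0 * t * t + 5.0; }
    };

    int main ()
    {
      Parabola f;
      TrapezoidalRule rule;
      std::cout << rule.apply(f, 0.0, 1.0) << std::endl;
    }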


Adaptive Integration

An improvement over the integration using equidistant intervals is adaptive integration. Here
one tries to only use small intervals ∆t where the error is actually large.
If you are in need of additional points or not challenged enough ;-) , you may write a class for
adaptive integration that complements the equidistant one from the previous exercise. This
new version should use the same interface for functions and quadrature rules and fit into the
test infrastructure you have written.
One can estimate the error in one of the subintervals by comparing the result of quadrature
with the result of an interval that has been refined once, i.e., the local error is estimated through
hierarchical refinement. The goal is to have the “error density”, i.e., error per unit length of
interval, to be about the same in all subintervals. Subintervals with large errors are divided
again, iteratively or recursively. One starts with a small number of subintervals and repeats
the error estimation and refinement until the desired number of subintervals is reached.
To avoid having to constantly reevaluate the integrals of the individual subintervals, it is helpful
to store the subintervals, integrals and estimated errors in appropriate data structures: note
that the refinement used to estimate the local error will immediately become part of the set of





subintervals if this error proves to be large. The containers of the C++ Standard Library, e.g.,
 std::map and std::deque , might prove useful. The same holds for function evaluations:
note that one half resp. one third of the function evaluations in the previous exercise were
actually unnecessary and quite wasteful, because interval bounds were generally evaluated
twice.

You are free to design more or less efficient algorithms for the local error estimation and for
deciding which intervals should be bisected and in which order. Your primary concern is a
program that produces good approximations w.r.t. the number of intervals used, thoughts
about runtime efficiency are a good idea but not mandatory.

The points of this exercise don’t count towards the total number of points that can be achieved
during the semester. Consequently, you can do this exercise instead of another one later on or
in addition to the other exercises to achieve a higher point percentage.


Template Metaprogramming: Number Sequences

Template metaprogramming might be an unusual programming technique, but it is Turing
complete: this can be shown by simply implementing a Turing machine in C++ templates
(you can find some online if you are interested). This shows that anything you can write as a
normal program could, at least in principle, be achieved using template metaprogramming (see
the Church-Turing conjecture [36]). The main obstacles are the finite number of recursive template
instantiation levels the compiler allows, the fact that dynamic interaction with the user is impossible, the fact
that floating-point numbers can’t be template parameters, and of course the slightly awkward
way things have to be written. The only way to provide some form of user input is through
preprocessor macros, since the usual command line arguments and queries using std::cin
are not known at compile time.

Solve the following two tasks with template metaprogramming:

     1. Fibonacci Numbers. Write both a template metaprogram and a normal function that
        compute the k-th element of the Fibonacci sequence for given k ∈ N:

            a_k := 1                      for k = 0,
            a_k := 1                      for k = 1,
            a_k := a_{k−2} + a_{k−1}      otherwise.

        Base your two implementations on the mathematical definition above, i.e., directly im-
        plement the given recursion formula (a minimal metaprogram sketch is shown at the end
        of this exercise). Then write a main function that calls both versions
        for the argument “ INDEX ”, and prints the results. Here, INDEX is a preprocessor macro,
        which we use instead of command line arguments or user input to control which index k
        is used. You may define this index during the compilation process, e.g., for GCC:

        g++ -D INDEX=<your value> -o main main.cc
[36] https://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis






        Remember that you have to recompile your program if you want to change this index.
        Measure both the compilation time and runtime of your program from k = 5 to k = 50
        in steps of 5 (under Linux you can use the “time” command for this task). What do
        you observe? Plot both times over the index k and discuss your observations. What
        is the explanation for the asymptotics? There are significantly faster algorithms for
        the computation of Fibonacci numbers37 . Would you suggest using one of these other
        algorithms for the template metaprogram, the function, or both?
     2. Prime Numbers. Write a second template metaprogram that prints the N -th prime
        number. Remember that a prime number is a natural number larger than one, that
        is only divisible by one and itself. Here are some steps that you may follow in your
        implementation:
           • Start with a struct is_prime<int P> that exports a const static bool that is
             true for prime numbers and false for all other numbers. A simple implementation
             uses the signature

              template<int P, int K = P - 1> struct is_prime;


              that checks whether P is not divisible by any number smaller than or equal to K (except
              for one, of course). A recursive definition is straightforward, using
              the modulo operator and simple Boolean algebra. The default case then provides
             the desired struct.
           • Then write a second struct next_prime<int N> that exports the smallest prime
             P with P ≥ N . If N is prime, then this is simply N itself, else we can use recursion
             again by exporting the smallest prime P with P ≥ (N + 1). Why is this recursion
             guaranteed to terminate? The recursion needs to branch on whether N itself is prime
             or not, but both conditionals ( if/else ) and the ternary operator ( X ? Y : Z )
             would always instantiate the next recursion level, whether X is true or false, leading
             to infinite recursion. From C++17 onwards one could use the constexpr if
             construct to only instantiate one of the two branches, thereby solving the problem,
             but your code should be valid C++98/03. Once again, this can be solved with a
             slightly extended signature:

              template<int P, bool B = is_prime<P>::value> struct next_prime;


             How can you use this second (auto-evaluated) parameter to get the desired branching
             behavior?
           • Finally, write a struct prime<int N> that exports the N -th prime number, making
             use of the other structs you have defined. This definition is also recursive. What
             is the base case? How can you compute the correct prime number for all the other
             cases recursively?
[37] https://www.nayuki.io/page/fast-fibonacci-algorithms






      Use the aforementioned preprocessor macro trick to evaluate your template metaprogram
      for different values of N , and check that the right sequence is created.
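
For orientation, a minimal metaprogram for the Fibonacci recursion of task 1 might look as
follows (a sketch; it uses the INDEX macro convention described above, with a fallback value
so that the snippet also compiles without -D INDEX=<value> ):

    #include <iostream>

    // Compile-time Fibonacci numbers via recursive template instantiation.
    template<int K>
    struct Fibonacci
    {
      static const long value = Fibonacci<K - 2>::value + Fibonacci<K - 1>::value;
    };

    template<> struct Fibonacci<0> { static const long value = 1; };
    template<> struct Fibonacci<1> { static const long value = 1; };

    #ifndef INDEX
    #define INDEX 10   // fallback if no -D INDEX=<value> is given
    #endif

    int main ()
    {
      std::cout << Fibonacci<INDEX>::value << std::endl;
    }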


Matrices: An Application of SFINAE

One of the major drawbacks of class template specializations is the need to reimplement the
whole class template, even if only a single method has to be modified. This leads to significant
code duplication, especially for classes that provide a large number of methods, and inconsis-
tencies between the different specializations may occur after some time if updates aren’t copied
to all the different specializations.
SFINAE can be used to solve this issue: there is no explicit specialization of the class template,
instead there are simply two (or more) different versions of the one method that needs to be
changed, with exactly one of them being instantiable for any parameterization of the class
template. This technique can be used for both normal methods and methods that are function
templates themselves: if the method is a template, then it simply gains an additional template
parameter for the std::enable_if , and if it is an ordinary function it becomes a template
with a single parameter instead.
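
A small self-contained illustration of this pattern (hypothetical Wrapper class; the exercise
applies the same idea to the print() method of the matrix class):

    #include <iostream>
    #include <string>
    #include <type_traits>

    template<typename T>
    struct Wrapper
    {
      T value;

      // instantiated only when T is an arithmetic type
      template<typename U = T>
      typename std::enable_if<std::is_arithmetic<U>::value>::type
      print () const { std::cout << "number: " << value << std::endl; }

      // instantiated for all other types
      template<typename U = T>
      typename std::enable_if<!std::is_arithmetic<U>::value>::type
      print () const { std::cout << "something else" << std::endl; }
    };

    int main ()
    {
      Wrapper<int> a{3};
      Wrapper<std::string> b{"text"};
      a.print();   // number: 3
      b.print();   // something else
    }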

  1. Apply SFINAE to the print() method of your matrix implementation. As discussed in
     a previous exercise, the default implementation is only correct if the elements of a given
     matrix are actual numbers, and doesn’t work anymore if the matrix is actually a block
     matrix, i.e., a matrix with entries that are themselves matrices. Instead of redefining the
     whole matrix class for such cases, introduce two versions of the print() method, one
     that is instantiated when the entries are scalars, and one that is instantiated when the
     entries are matrices.
      You can reuse the function bodies if you have completed the previous exercise, else you
      will have to implement one of them. To determine whether a given matrix is a block
      matrix or not, check the following properties of the entries:
        • Does the type of the entries export a number of rows? (You may have to introduce
          a public enum in your matrix class to make this available if you haven’t already
          done so.)
        • Does the type of the entries export a number of columns? (With an enum as above.)
      You may assume that anything exporting these two constants is actually a matrix (and
      not, say, a higher-dimensional tensor or similar). Make sure that exactly one of the
      conditions in the std::enable_if clauses is true for any potential entry data type.
  2. Assume for a moment that there are both matrix classes with public enums as above,
     and matrices that make the number of rows / columns available via methods (which is
     what a matrix implementation with dynamic size would do). How would your SFINAE
     have to change if it should work with
        • normal entries such as numbers ( double , int , . . . )





       • matrices that export enums as treated by your implementation
       • matrices that provide this information via methods instead
     You don’t have to actually implement the functionality needed for the checks: assume
     that all required structs are already provided, and write down what the signatures of the
     three implementations would look like.
  3. SFINAE can also be used to provide two or more competing (essentially non-specialized)
     templates, which can eliminate the need for several different function overloads. Consider
     the multiplication operators; there are usually three different types:
       • with a scalar
       • with a vector
       • with another matrix
     The first two of these have already been implemented: for concrete types, or as templates
     parameterized by container types. However, the code for these three types of multiplica-
     tion is usually quite generic, and would work the same way for several different types of
     scalars, vectors, and matrices. This makes the implementation via completely abstract
     function templates attractive, i.e., with the type of the second argument as template
     parameter. The problem: all three types of multiplication would have the same function
     signature.
     Use SFINAE to work around this problem:
       • The existence resp. nonexistence of which methods can be used to characterize
         scalars, vectors, and matrices?
       • Provide traits classes is_vector<T> and is_matrix<T> that can detect whether
         a given type matches the vector resp. matrix interface.
       • Implement all three types of multiplication as free function templates, parameterized
         by the type of the second argument, and use SFINAE to select the correct template
         version for a given type of operand.
     Modify one of your existing test cases to make sure that your SFINAE construct works
     and picks the right template in each case.


A.3. Modern C++ Features

C++ Quiz 4

Here are some additional questions from https://cppquiz.org for you to answer:
Question 1: https://cppquiz.org/quiz/question/163 (automatic type deduction)
Question 2: https://cppquiz.org/quiz/question/236 (operator auto)





Question 3: https://cppquiz.org/quiz/question/226 (move semantics II)
Question 4: https://cppquiz.org/quiz/question/116 (perfect forwarding)
Question 5: https://cppquiz.org/quiz/question/112 (move semantics)
Questions 1 and 2 are about automatic type deduction, with the latter being more of a pecu-
liarity. The other three questions are about rvalue references, forwarding references, and move
semantics: what one might expect to happen, and what actually happens according to the
standard.


Automatic Type Deduction

Take your most recent matrix class implementation (it doesn’t actually matter which one as
long as it is templatized). We would like to make use of automatic type deduction at several
points within the matrix class, resp. classes.

  1. Replace the return type of all the methods with auto . At least one method should use
     each of the following forms (see the sketch after this list):
        • trailing return type with explicit specification of the resulting type
        • trailing return type, declaring the return type with decltype and the function
          arguments
        • no trailing return type, simply deduce the resulting type from the return statements
          (requires compiling with C++14 or above)
  2. Check the attributes of the class: is it possible to replace some of their types with auto ?
     If yes, which ones? If not, why not?
  3. Check the local variables of the methods, and replace the explicit type with auto
     wherever possible.
  4. After having converted all possible locations to auto : which instances of using auto
     are actually a good idea, if any? Are there locations where this modification doesn’t add
     any value, in your opinion? Discuss.
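
A short sketch of the three forms from step 1 (hypothetical methods, not the full matrix class;
compile with -std=c++14 or later because of the last variant):

    #include <cstddef>
    #include <vector>

    struct Example
    {
      std::vector<double> data{1.0, 2.0, 3.0};

      // trailing return type with an explicit type
      auto size () const -> std::size_t { return data.size(); }

      // trailing return type declared via decltype and the function arguments
      template<typename A, typename B>
      auto add (A a, B b) const -> decltype(a + b) { return a + b; }

      // no trailing return type: deduced from the return statement (C++14)
      auto first () const { return data.front(); }
    };

    int main ()
    {
      Example e;
      auto n = e.size();
      auto s = e.add(2, 3.5);
      auto f = e.first();
      (void)n; (void)s; (void)f;   // suppress unused-variable warnings
    }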


Constant Expressions

In a previous exercise you were tasked with writing template metaprograms to compute two
number sequences, namely the Fibonacci sequence and the prime number sequence. In this
exercise we will reimplement these two sequences, but with constant expressions instead of
templates.
Solve the following two tasks using constexpr functions. Note that these functions should be
valid in C++11, i.e., each function must consist of a single return statement. This means that
 if / else branches can't be used; you will have to rely on the ternary operator ( x ? y : z )
instead. The value of this expression is y if the condition x is true, else z.
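
As an illustration of this single-return-statement style, here is a tiny sketch using the factorial
(deliberately not one of the two sequences asked for below):

    #include <iostream>

    // C++11 constexpr: one return statement, branching via the ternary operator.
    constexpr long factorial (int n)
    {
      return n <= 1 ? 1 : n * factorial(n - 1);
    }

    int main ()
    {
      constexpr long f = factorial(10);   // evaluated at compile time
      std::cout << f << std::endl;
    }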





  1. Fibonacci Numbers. If you solved the previous exercise, you should already have a
      normal function that computes the Fibonacci sequence. Modify this function so that it
     becomes a valid C++11 constexpr function (the extent of modifications depends on
     your chosen implementation, of course). Simply implement the function directly if you
     didn’t do the previous exercise. Here is a quick reminder of the mathematical definition
      of the k-th element, k ∈ N:

          a_k := 1                      for k = 0,
          a_k := 1                      for k = 1,
          a_k := a_{k−2} + a_{k−1}      otherwise.


  2. Prime Numbers. Reimplement the prime number sequence using constexpr func-
     tions. You can follow the outline from the previous exercise, if you like:
        • Create a constexpr function is_prime (int p, int k = p-1) that checks whether
          p is not divisible by any number larger than one and smaller than or equal to k (re-
          cursive function).
        • Write constexpr functions next_prime(int n) and prime(int n) that return
          the next prime after n and the n-th prime, respectively.
        • Note that there is a fundamental difference to the previous exercise: the ternary op-
          erator wasn’t suitable for the template metaprogram, since both “branches” would
          be instantiated in any case (infinite recursion), but it is actually guaranteed that
          only one of them will be executed in the end. This means you can use the ternary
          operator for this version, as mentioned above.
  3. Comparisons. Compare the implementation effort and code complexity of the tem-
     plate metaprogramming and constexpr versions for both sequences. Which version
     would you prefer? Measure the compilation times of all four codes (Fibonacci and prime
     numbers, template metaprogramming and constexpr ) for different inputs. What do
     you observe? Discuss.


Linked List with Smart Pointers

Revisit the linked list exercise: we used raw pointers back then, but we might just as well
use smart pointers instead. The behavior of the linked list class will change depending on the
actual choice of smart pointer. Note that one doesn’t assign nullptr to an invalid smart
pointer, or compare against this value to check validity. Instead, invalid smart pointers are
simply those that don’t currently store a raw pointer and evaluate to false when used as a
Boolean expression.

  1. Try replacing the raw pointers with std::unique_ptr<Node> . Is this straight-forward?
     If not, what exactly is problematic about the chosen interface of the linked list? Remem-
     ber what happened when the list was copied. What happens / would happen, if unique
     pointers are used instead?





     2. Replace the raw pointers with std::shared_ptr<Node> . What happens in this case if
        we copy the list, and then modify one of the copies? And what happens if one of them
        or both go out of scope?

     3. Turn the shared pointer version of the linked list into a doubly linked list: introduce an
        additional method previous that returns the previous node instead of the next one,
        and modify the Node class accordingly. Note that simply using shared pointers for this
        would produce a silent memory leak, since the resulting loops of shared pointers would
        artificially keep the list alive after it goes out of scope. This means you have to use a
          std::weak_ptr<Node> instead (see the sketch after this exercise).

     4. Discuss the benefits and drawbacks of smart pointers for this application. How would
        you personally implement a doubly linked list, and what would be your design decisions?
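
A minimal sketch of the node layout described in step 3 (hypothetical member names):

    #include <memory>

    struct Node
    {
      int value = 0;
      std::shared_ptr<Node> next;   // owning link to the following node
      std::weak_ptr<Node>   prev;   // non-owning link back, avoids shared_ptr cycles
    };

    int main ()
    {
      auto a = std::make_shared<Node>();
      auto b = std::make_shared<Node>();
      a->next = b;
      b->prev = a;   // no cycle of shared_ptr, so both nodes are released correctly
    }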


Lambda Expressions for Integration

Introduced in C++11, lambda expressions are a very convenient way to define function objects
(functors). We will have a second look at numerical integration, so that you can decide for
yourself whether using lambdas makes the resulting code more readable or not.

In a previous exercise you had to implement several functors to write a program for numerical
integration (quadrature):

      • scalar real functions (polynomials and trigonometric functions)

      • quadrature rules (trapezoidal rule and Simpson’s rule)

      • composite rules (only the equidistant one in our case)

      • ...

Each of these functors was derived from some abstract base class that defined its interface.

Your task is the modification of the program you already have, so that it uses lambda expres-
sions instead of the classes you created (or to write the program from scratch if you skipped
that exercise). Proceed as follows:

     1.     Remove all abstract base classes. Lambda expressions introduce anonymous auto-
           generated classes that are not derived from anything [38], so these base classes become
          irrelevant. The usual replacement for these abstract base classes would be templates, to
          model the polymorphism at compile-time. However, you need to pass lambdas as argu-
          ments to lambdas, i.e., these lambda expressions themselves would need to be templates.
          This isn’t possible in C++11, since a lambda expression defines a functor class, not a
          functor class template, and the type of function arguments is therefore fixed. C++14
          introduces generic lambdas, as we will see, but for now another solution has to be used.
[38] https://en.cppreference.com/w/cpp/language/lambda






      2. C++11 provides a class template called std::function (in header <functional> ).
         This template is an instance of type erasure: you can assign any type of function or func-
         tion object to it, and it will forward function calls to the stored object, while its own type
         is independent of the exact type of its contents. This means you can capture lambdas in
          std::function objects and ignore the fact that each lambda defines its own type. Inform
         yourself how this class is used and what its drawbacks are [39] (a small sketch follows after
         this list).

     3. Replace each functor object with an equivalent lambda expression, using std::function
        wherever you have to pass a lambda to another lambda. Note how the std::function
        argument serves as documentation of the expected function interface. Wherever appro-
        priate, capture local variables instead of handing them over as arguments, and make sure
        that the right capture type is used (by-value vs. by-reference).

     4. Run the test problems as described in the previous exercise, and make sure that your
        new version produces the same results. Use the time command to compare the compile
        time for the two versions, as well as the runtime for a large number of subintervals.
        Which version is faster, if any, and which is easier to read resp. implement? Discuss.
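
A small sketch of the std::function mechanism mentioned in step 2: a lambda is stored in
a std::function object and then passed to another lambda (here a trapezoidal-rule lambda,
as a hypothetical stand-in for the actual composite rule):

    #include <functional>
    #include <iostream>

    int main ()
    {
      // the function to integrate, captured in a std::function (type erasure)
      std::function<double(double)> f = [](double t) { return 2.0 * t * t + 5.0; };

      // a quadrature lambda that receives the function via std::function
      auto trapezoidal = [](const std::function<double(double)>& g, double a, double b)
      {
        return 0.5 * (b - a) * (g(a) + g(b));
      };

      std::cout << trapezoidal(f, 0.0, 1.0) << std::endl;
    }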


Variadic Templates: Type-Safe Printing

The concept of variadic functions is nothing new: C has them as well, e.g., the printf
function that creates formatted output. However, these C-style variadic functions [40] have two
drawbacks:

      • They need special macros, like va_start , va_arg , va_end , etc., which are not very
        intuitive to use and incur a certain runtime cost.

      • They are not type-safe, since there is no way to know what data types are going to be
        passed as arguments.

Variadic functions based on the variadic templates introduced in C++11, however, have full
access to the type information of their arguments. Use this to write a flexible print()
function:

     1. Start by writing several variants of this function for a single argument, using a combi-
        nation of, e.g., SFINAE and simple function overloads:

           • Any class that has a print() method should be printed by simply calling this
             method. Create a simple numerical vector class with such a method that prints its
             components as comma-separated values enclosed by brackets, e.g., [2.72,3.12,6.59] ,
             and test your function with it. There is no need to implement any of the usual meth-
             ods of vector classes, just an appropriate constructor and the print() method.
[39] https://en.cppreference.com/w/cpp/utility/functional/function
[40] https://en.cppreference.com/w/cpp/utility/variadic






           • Strings ( std::string ) and string literals ( const char* ) should be placed be-
             tween quotation marks (you can ignore the fact that internal quotation marks in
             the strings would have to be escaped in practice).
           • Floating-point numbers should be printed with six-digit precision in scientific nota-
              tion. Use the appropriate I/O manipulator [41] and the precision() method of the
             stream object, and don’t forget to restore the original precision after printing.
           • Any other single argument should simply be printed by passing it to std::cout .

     2. Write a variadic print() function that is able to print an arbitrary number of argu-
        ments using the rules above. Base your implementation on the single-argument versions
        you have already produced, and note that the order of function definitions is important.
        Test your function with some sequences of mixed arguments. There is no need to specify
        any types, like you would for printf : the compiler can deduce those types from the
        function argument. Note that this also means that there is no way to accidentally specify
        the wrong type, of course.
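
A minimal sketch of the recursive pattern for step 2 (without the SFINAE-based single-argument
overloads the exercise asks for in step 1):

    #include <iostream>

    // base case: a single remaining argument
    template<typename T>
    void print (const T& t)
    {
      std::cout << t << std::endl;
    }

    // recursive case: print the first argument, recurse on the rest
    template<typename T, typename... Args>
    void print (const T& t, const Args&... args)
    {
      std::cout << t << ' ';
      print(args...);
    }

    int main ()
    {
      print(42, "hello", 3.14);
    }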


Variadic Templates: Tuples

The lecture discussed a custom implementation of arrays using recursive data types:
      • a zero-element array is simply an empty struct
      • an (N +1)-element array is an N -element array (public inheritance) storing an additional
        element
      • the number of elements N and the type of elements T are template parameters
      • a templatized entry method returns the M -th entry, 0 ≤ M < N , of the array
Back then we noted that tuples could also be implemented this way, but only if variadic
templates are available for the arbitrarily long list of stored types. Now that variadic templates
have been introduced in the lecture, you are tasked with implementing such a recursive
variadic class template that defines tuples.
Proceed as follows:
     1. Define a class template Tuple that accepts an arbitrary number of types, but don’t
        provide an implementation. Every possible case is handled through a specialization, but
        you will see that you need this general declaration, because otherwise you will not be
        able to match the template parameter signatures.
     2. A tuple containing N + 1 variables of up to N + 1 different types consists of a variable
        of the first type and a tuple of the remaining N variables with associated types (again
        via public inheritance). Create a template specialization for the case N > 0, with an
        appropriate data member, base class specification, and constructor.
41 https://en.cppreference.com/w/cpp/io/manip






  3. Write a method function template entry<int M>() that returns the M-th entry (0 ≤
     M < N ). Two cases are possible:
        • M = 0: The zeroth entry is the data member, so we can simply return that.
         • M ≠ 0: The requested entry is part of the base class, so we forward the call and
          return what the base class gives us.
     Use SFINAE to differentiate between these two cases, and handle them separately. Note
     that we don’t actually know the returned type in the second case. Even knowing the
     return type of the direct base class is not enough, because the actual data member might
     be in some base class of the base class. One possible solution would be an internal
     template metaprogram that extracts the correct type. Is there a simpler solution?
  4. The class std::tuple has two different access methods: via index as above, or via
     requested type, if the latter is unique across the tuple. Provide this functionality for the
     custom tuple class, i.e., write a method function template entry<U>() that returns the
     entry of type U . There are two possibilities:
        • The data member has type U and that type doesn’t appear in the base class: simply
          return the data member as above.
        • The data member doesn’t have type U : hand the request over to the base class.
      Use SFINAE to differentiate between these two cases, and handle them separately. Note
      that we don’t explicitly cover the case where U appears more than once; it’s okay
      if this just results in a compilation error. You will need information about contained
     types: write an internal class template struct contains<U> that exports true if the
     tuple or its base classes contain a data member of type U , else false . Use normal
     logic operators ( && and || ) in the SFINAE and internal struct, or the C++17 class
     templates std::conjunction and std::disjunction if you want.
  5. Provide a base case, which is an empty tuple. Just as with the custom arrays, this is
     essentially an empty struct, but you will have to provide a base case for the internal
      contains struct.
The complete implementation contains four different entry methods, and the SFINAEs make
sure that exactly one of them matches in any situation, whether an index is passed as parameter
or a type. Note that in contrast to our custom implementation, the class std::tuple uses
a free function template named std::get for access. The main reason is that our version
becomes slightly awkward to use within templates, where one has to specify that the method
is, indeed, also a template: t.template entry<2>() or similar. A minimal sketch of the
index-based part of such a tuple is given below.
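The recursive structure for steps 1 to 3 might look roughly as follows. This sketch uses C++14
return type deduction for the forwarding case, which is one possible answer to the question
about a simpler solution; the type-based access and the contains helper of steps 4 and 5 are
omitted.

#include <iostream>
#include <type_traits>

// general declaration without definition: every concrete case is a specialization
template <typename... Ts>
class Tuple;

// base case: the empty tuple
template <>
class Tuple<>
{
};

// recursive case: one data member plus, via public inheritance, a tuple of the rest
template <typename T, typename... Ts>
class Tuple<T, Ts...> : public Tuple<Ts...>
{
public:
  Tuple(T t, Ts... ts) : Tuple<Ts...>(ts...), data(t) {}

  // M == 0: the requested entry is the data member of this level
  template <int M, std::enable_if_t<M == 0, int> = 0>
  T& entry()
  {
    return data;
  }

  // M != 0: forward the request to the base class; the deduced return type
  // sidesteps having to name the correct type explicitly
  template <int M, std::enable_if_t<M != 0, int> = 0>
  auto& entry()
  {
    return Tuple<Ts...>::template entry<M - 1>();
  }

private:
  T data;
};

int main()
{
  Tuple<int, double, char> t(42, 3.14, 'x');
  std::cout << t.entry<1>() << std::endl; // prints 3.14
}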


Concurrency with Threads

Use threads to implement a parallelized scalar product of two vectors. You may use any
vector class: the numerical ones of the lecture, a std::vector , or even a plain old C-style





array. Alternatively, you may provide a function template to handle all these separate cases
simultaneously.
Create the following four versions, each operating on a subset of the vector components:
  1. A version using mutexes and locks, directly adding the occurring products to the result.
  2. A second version using mutexes and locks, computing the local scalar product of the
      indices belonging to the thread, and then adding those local products to the result (a
      minimal sketch of this variant is given after the exercise).
  3. A variant of the first version, using atomics instead of mutexes.
  4. A variant of the second version, using atomics instead of mutexes.
       The main program should divide any given pair of vectors into segments of more or
       less equal size and hand them to a matching number of worker threads. Test your
       four versions on large vectors, and measure the required time. What happens for larger
       numbers of threads, or what would you expect to happen if the number of parallel
       threads you can start is very limited?
       Assume for a moment that the number of threads is so large that even the second and
       the fourth version suffer from congestion (this is a real problem, albeit in the context
       of message passing on very large compute clusters). What could be done to alleviate the
       problem? You don’t need to implement the solution.
      In a real numerical program, the scalar product would be used for subsequent compu-
      tations. An example is the Conjugate Gradients method, where the expression for the
      step direction of the scheme contains a scalar product. In a parallelized version of such
      a program, the threads would not just compute the scalar product, but perform other
      operations before and after. It is obviously very important to make sure that the scalar
      product has been fully computed before using the result in other computations.
  5. Create a synchronization point (barrier) for the first two versions, e.g., using counters
     and condition variables or something similar. After this point, have each thread print the
     scalar product as a stand-in for further computations, and check that all threads report
     the same value.
  6. Inform yourself about memory order models and their consequences, and try to create
     a similar barrier for the atomics versions, e.g., using atomic counter variables and a
     Boolean flag that uses load and store . Print and check the results as above.
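A minimal sketch of the second version (local partial products, one short critical section per
thread) is given below; the vector contents, thread count, and names are placeholders. The
atomic variants would replace the mutex with an atomic accumulator, e.g., via a compare-exchange
loop, since fetch_add for floating-point atomics only became standard in C++20.

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main()
{
  const std::size_t n = 1 << 20;
  std::vector<double> x(n, 1.0), y(n, 2.0);

  const unsigned numThreads = 4;
  double result = 0.;
  std::mutex resultMutex;

  auto worker = [&](std::size_t begin, std::size_t end) {
    // purely local computation, no synchronization needed here
    double local = 0.;
    for (std::size_t i = begin; i < end; ++i)
      local += x[i] * y[i];
    // one short critical section per thread instead of one per index
    std::lock_guard<std::mutex> guard(resultMutex);
    result += local;
  };

  std::vector<std::thread> threads;
  const std::size_t chunk = n / numThreads;
  for (unsigned t = 0; t < numThreads; ++t)
    threads.emplace_back(worker, t * chunk,
                         (t + 1 == numThreads) ? n : (t + 1) * chunk);
  for (auto& t : threads)
    t.join();

  std::cout << "scalar product: " << result << std::endl;
}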


C++ Quiz 5

Here are some additional questions from https://cppquiz.org for you to answer:
Question 1: https://cppquiz.org/quiz/question/219 (variadic function template)
Question 2: https://cppquiz.org/quiz/question/131 (types of initialization)
Question 3: https://cppquiz.org/quiz/question/109 (most vexing parse)





Question 4: https://cppquiz.org/quiz/question/42 (initializer list constructor)
Question 5: https://cppquiz.org/quiz/question/48 (async and futures)
Question 1 revisits variadic function templates. Questions 2 through 4 are about the different
kinds of initialization and construction, and certain details that might be surprising. Question
5 is about the futures returned by std::async , and how their behavior differs from that of
other futures.


Message Passing Interface (MPI)

In this exercise we will practice using external libraries and performing parallel computations.
Use the Message Passing Interface (MPI)42 to calculate matrix-vector products in parallel.
We will use OpenMPI, but you are free to use any other available MPI software package you
like. If you do not have MPI already installed on your machine, you can:
      • install it using your package manager (it should be called openmpi or similar)
      • manually download OpenMPI from https://www.open-mpi.org/software/ompi/v4.1/
In the latter case, extracting and installing is done by the commands:
shell$     tar -xvzf openmpi-X.Y.Z.tar.gz
shell$     cd openmpi-X.Y.Z
shell$     mkdir build
shell$     cd build
shell$     ../configure --prefix=/where/to/install
shell$     make all install
For more details about the installation process, you can have a look at the provided INSTALL
file or online at https://www.open-mpi.org/faq/.
If you have never used MPI before, it might be helpful to compile and run a hello world program
first, like this C-style example on https://mpitutorial.com/tutorials/mpi-hello-world/.
MPI-based programs have to include a header file called mpi.h . They also have to initialize
the MPI framework before it can be used, and shut it down before the program finishes. The
MPI interface is C code and may therefore look unfamiliar to you. There are named constants,
written all in uppercase, and functions with an uppercase initial letter (a minimal sketch of
the startup and shutdown calls follows the list below):
      • Startup/shutdown functions: MPI_Init , MPI_Finalize

      • Communication functions: MPI_Send , MPI_Recv , MPI_Alltoall , MPI_Barrier ,
        ...
      • Communicators: MPI_COMM_WORLD , MPI_COMM_SELF
42 https://en.wikipedia.org/wiki/Message_Passing_Interface






      • Data types: MPI_INT , MPI_FLOAT , MPI_DOUBLE , . . .

      • Reduction operations: MPI_SUM , MPI_PROD , MPI_MAX , . . .
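A minimal sketch of these framework calls (essentially the hello-world program linked above):

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);                 // start up the MPI framework

  int rank = 0, size = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // id of this process
  MPI_Comm_size(MPI_COMM_WORLD, &size);   // total number of processes

  std::cout << "Hello from process " << rank << " of " << size << std::endl;

  MPI_Finalize();                         // shut MPI down again
}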
Here are some resources you might find helpful: an example program, a condensed overview of
commands, and the documentation page:
      • https://en.wikipedia.org/wiki/Message_Passing_Interface#Example_program
      • https://www.boost.org/doc/libs/1_75_0/doc/html/mpi/c_mapping.html
      • https://www.mpi-forum.org/
To compile your code use
mpic++ -o application application.cc
This is a wrapper around the C++ compiler that takes care of MPI-related details. All flags you
would otherwise pass to your C++ compiler (like warnings or optimization) can be passed to
mpic++ as well.
To run your code use
mpirun -n 4 ./application
where the number 4 stands for the number of processes you would like to start. Take care that
you don’t accidentally crash your system by invoking too many processes.
In thread-based applications, all the threads share the same memory space, and the main issue
is mutual exclusion. In applications based on message passing, however, each process has its
own memory space. This means values that are needed for some local computation may need
to be sent by one process and received by another.
Write one of the following parallel matrix-vector products, assuming an n × n matrix with
n divisible by the number of processes, and the input and result vectors each distributed in
chunks of equal size:

  1. Horizontally distributed matrix (whole rows). The i-th entry of the result vector is
     on the same process as the i-th row of the matrix, so assembling the output is easy,
     but the required components of the input vector are mostly non-local and have to be
     communicated.
  2. Vertically distributed matrix (whole columns). The j-th entry of the input vector is on
     the same process as the j-th column of the matrix, so local multiplication is easy, but
     the components of the output vector have to be accumulated from across all processes.

Which of these two versions is easier to implement? Which, do you think, is more efficient?
Choose carefully. Note that there is no need to communicate individual entries when you
want to send a whole subvector: the memory layout of std::vector is contiguous, and there
is even a data method that provides access to the underlying raw memory. Make use of
that. Receive anything you communicate in a local scope and discard it after using it: in real





applications you often don’t have the space to keep all intermediate data around (that’s one
of the reasons to parallelize, after all). A short sketch of sending and receiving such a chunk
is given below.
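The following sketch shows one way such a chunk could be sent from process 0 to process 1; the
chunk size, tag, and function name are arbitrary placeholders, and the received buffer lives
only in the local scope.

#include <mpi.h>
#include <vector>

void exchangeChunk(int rank)
{
  const int chunkSize = 256;
  if (rank == 0)
  {
    std::vector<double> chunk(chunkSize, 1.0);
    // data() exposes the contiguous memory of the vector
    MPI_Send(chunk.data(), chunkSize, MPI_DOUBLE, /*dest*/ 1, /*tag*/ 0,
             MPI_COMM_WORLD);
  }
  else if (rank == 1)
  {
    std::vector<double> buffer(chunkSize);
    MPI_Recv(buffer.data(), chunkSize, MPI_DOUBLE, /*source*/ 0, /*tag*/ 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    // ... use buffer for the local part of the product, then let it go out of scope ...
  }
}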


Message Passing, Part II

Implement the version of parallel matrix-vector multiplication you did not choose in the previ-
ous exercise. Determine which of your implementations is faster (for large dimension n), and
how the required time scales with the dimension in each version.


Variadic Templates: Matrix Polynomials

In the lecture we discussed the evaluation of polynomials using variadic templates. The re-
cursive function definition allowed using a different type for each of the coefficients, but was
restricted to double as the type of the variable x and the resulting function value p(x). We
would like to make this function more general step by step, until we are able to evaluate matrix
polynomials and polynomial matrices.
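For reference, such a recursive definition might look roughly like the following sketch (the
exact form used in the lecture may differ; the name polynomial is a placeholder):

// base case: a polynomial with a single coefficient is a constant
template <typename T>
double polynomial(double x, T a0)
{
  return a0;
}

// recursive case, Horner's method: p(x) = a0 + x * (a1 + x * (a2 + ...))
template <typename T, typename... Ts>
double polynomial(double x, T a0, Ts... as)
{
  return a0 + x * polynomial(x, as...);
}

// example: polynomial(2.0, 1, 0.5, 3) evaluates 1 + 0.5*2 + 3*4 = 14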
A matrix polynomial 43 is a polynomial where the scalar variable x is replaced with a square
matrix A:
                           p(A) = a_0 I + a_1 A + a_2 A^2 + a_3 A^3 + ...
Any square matrix A commutes with itself, of course, so that integer matrix powers are well-
defined. The neutral element of multiplication I (the identity matrix) has to be specified
explicitly in this case, while it is normally omitted in ordinary, scalar-valued polynomials.
A polynomial matrix 44 is a matrix with polynomials as entries, which can also be written as a
(scalar) polynomial with matrices A_0, A_1, ... as coefficients:

                               p(x) = A_0 + A_1 x + A_2 x^2 + A_3 x^3 + ...

These matrices don’t have to be square, but they need to be of compatible dimensions, i.e.,
each an n × m matrix, with the same n and m across all the matrices.
Proceed as follows:

     1. Take the function definition of the lecture and introduce a template parameter X for
        the type of the first argument x. The resulting return type is now a function of X , T ,
        and the remaining types Ts... , and is not known a priori: it is certainly not simply
         X , because a polynomial with integer argument and real coefficients has to produce real
        function values. How can you specify the correct return type? (One possible direction
        is sketched after this list.)
     2. The function should now be able to evaluate polynomial matrices. Test your imple-
        mentation with n × n matrices of small size, say n = 3, and make sure that the results
        are what you expect. Each entry of the resulting matrix is the evaluation of a separate
        ordinary, scalar-valued polynomial, so this should be straightforward.
43 https://en.wikipedia.org/wiki/Matrix_polynomial
44 https://en.wikipedia.org/wiki/Polynomial_matrix






  3. Expand the implementation so that it can also handle matrix polynomials. In this
     case the function has to compute matrix powers, and it becomes especially important to
      use Horner’s method. Handling matrices as arguments requires the explicit creation of
      identity matrices; see the definition above. How can you make sure that your function
     remains usable for ordinary polynomials with real x? Use a matrix class where the
     dimensions are defined through template parameters, so that you don’t have to consider
     the correct size when creating identity matrices.

  4. Test your implementation on matrix polynomials to make sure that everything works
     correctly. How does the computation time depend on the size of the matrices, and how on
     the number of coefficients? Create a variant that uses the less efficient evaluation strategy
     from the lecture. How does that influence the time that is needed for the computations?
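One possible direction for the first step is sketched below, under the assumption that C++14
return type deduction is acceptable (a trailing decltype return type is an alternative, but is
awkward for recursive calls) and that the matrix class provides the needed operator+ and scalar
multiplication. This covers steps 1 and 2; matrix polynomials (step 3) additionally require the
explicit identity matrices discussed above.

// base case: a constant polynomial; the cast ensures that, e.g., an integer
// argument combined with real coefficients still yields a real-valued result
template <typename X, typename T>
auto polynomial(X x, T a0)
{
  return static_cast<decltype(x * a0)>(a0);
}

// recursive case: the return type is deduced from the Horner expression itself
template <typename X, typename T, typename... Ts>
auto polynomial(X x, T a0, Ts... as)
{
  return a0 + x * polynomial(x, as...);
}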



Custom Range-Based Loops


Create your own range-based for loops for numerical vectors. Take one of our vector classes,
and write code that provides the additional functionality. The vector class itself should not be
modified (treat it as if it were from some external library). Instead, use the public interface of
the class.

  1. Write an iterator class with the following functionality:

        • A constructor taking a vector object. The resulting iterator belongs to this object
          and iterates over it.

        • Pre-increment and pre-decrement operators ( operator++ and operator-- ). These
          shift the iterator by one position.

        • A dereference operator operator* , which returns the entry the iterator refers to.

        • A not-equal-to operator operator!= , for comparison against the end of the vector.

       The iterator has to store its position; the current index is a natural fit. You also need
       to encode iterators with an invalid state, e.g., by using the length of the vector as the
       stored index. (A minimal sketch of such an iterator is given after this exercise.)

  2. Provide free begin and end functions that receive a vector object and return an
     iterator object, pointing to the first entry for the former, and having an invalid state for
     the latter.

  3. Test your implementation to make sure that it is working, e.g., by printing the vector
     components in order. Then, provide a const version of the class and functions, so that
      const vectors work as well. Use the appropriate standard version of the range-based
     for loop in each case.
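A minimal sketch of such an iterator and the accompanying free functions, written against a
hypothetical NumVector class with operator[] and size() (the const variants are analogous):

#include <cstddef>
#include <iostream>
#include <vector>

// stand-in for one of the lecture's numerical vector classes
class NumVector
{
public:
  explicit NumVector(std::size_t n) : data_(n, 0.) {}
  double& operator[](std::size_t i) { return data_[i]; }
  std::size_t size() const { return data_.size(); }
private:
  std::vector<double> data_;
};

// iterator over a NumVector; an index equal to the length encodes the "end" state
class NumVectorIterator
{
public:
  NumVectorIterator(NumVector& v, std::size_t pos) : v_(v), pos_(pos) {}

  NumVectorIterator& operator++() { ++pos_; return *this; }  // pre-increment
  NumVectorIterator& operator--() { --pos_; return *this; }  // pre-decrement
  double& operator*() { return v_[pos_]; }                   // dereference
  bool operator!=(const NumVectorIterator& other) const { return pos_ != other.pos_; }

private:
  NumVector& v_;
  std::size_t pos_;
};

// free begin/end functions, found by the range-based for loop via argument-dependent lookup
NumVectorIterator begin(NumVector& v) { return NumVectorIterator(v, 0); }
NumVectorIterator end(NumVector& v)   { return NumVectorIterator(v, v.size()); }

int main()
{
  NumVector v(3);
  for (double& x : v) x = 1.5;              // writing through the iterator
  for (double x : v) std::cout << x << ' '; // reading: prints 1.5 1.5 1.5
  std::cout << std::endl;
}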





C++ Quiz 6

Here are some additional questions from https://cppquiz.org for you to answer:
Question 1: https://cppquiz.org/quiz/question/126 (unqualified lookup)
Question 2: https://cppquiz.org/quiz/question/184 (method shadowing)
Question 3: https://cppquiz.org/quiz/question/18 (private virtual function)
Question 4: https://cppquiz.org/quiz/question/264 (consequences of defaulted constr.)
Question 5: https://cppquiz.org/quiz/question/112 (default arg of virtual function)
Question 6: https://cppquiz.org/quiz/question/250 (overloaded variadic functions)
Question 7: https://cppquiz.org/quiz/question/109 (deduced and nondeduced context)
These are questions that shine some light on the hidden depths of C++. Neither the questions
nor their answers are in any way relevant for the exam, and the correct answer may range from
surprising to confusing. This means you don’t have to look at them, unless you are interested,
of course. Here be dragons.



