
Safer Linux Kernel Modules Using the D Programming Language

Authors Alexandru Militaru Constantin Eduard Staniloiu Razvan Deaconescu Razvan Nitu

License CC-BY-4.0

Received 1 November 2022, accepted 9 December 2022, date of publication 15 December 2022,
date of current version 29 December 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3229461




Safer Linux Kernel Modules Using
the D Programming Language
CONSTANTIN EDUARD STANILOIU, ALEXANDRU MILITARU,
RAZVAN NITU, AND RAZVAN DEACONESCU
Faculty of Automatic Control and Computers, University POLITEHNICA of Bucharest, RO-060042 Bucharest, Romania
Corresponding author: Razvan Nitu (razvan.nitu1305@upb.ro)



  ABSTRACT Since its creation, the Linux kernel has gained international recognition and has been deployed
  on a wide range of devices: servers, supercomputers, smart devices and embedded systems. Given its
  popularity, the security of the kernel has become a critical research topic. As a consequence, a wide
  range of third-party tools have been created to detect bugs in its implementation. However, new vulnerabilities
  are discovered and exploited every year. The explanation for this phenomenon lies in the fact that C, the
  programming language used to implement the kernel, is designed to allow unsafe memory
  operations. In this paper, we show that it is possible to incrementally transition the kernel code from
  C to a memory-safe programming language, D, by porting and integrating a device driver. In addition,
  we propose a series of code transformations that allow the D compiler to reason about the safety of certain
  memory operations. Our implementation increases the security guarantees of the kernel without incurring
  any performance penalties.


  INDEX TERMS Memory safety, Linux kernel, driver development, security, D programming language.


I. INTRODUCTION
One of the most popular operating system kernels, Linux,
is used on a wide range of hardware, from supercomputers to
IoT devices. While Microsoft Windows dominates the desktop
market, Linux is the most popular operating system on
supercomputers [29], in the server market [31], on handheld
devices as part of the Android operating system [27], and in the
embedded world [1].
   Like all operating system kernels, Linux runs in a
privileged processor mode (called kernel mode or supervisor
mode) with complete access to system memory and devices.
A successful attack on Linux gives the attacker full control
of the entire system, making it a sought-after target, and such
attacks are common. Figure 1 highlights the number of
vulnerabilities discovered, based on the Common Vulnerabilities
and Exposures (CVE) reports [12]. The trend appears to be
slightly decreasing; however, it still amounts to an average of
roughly 250 reports per year. This number is extremely large,
considering the years of manpower invested in securing the
kernel. In addition, there is no way of knowing how many
undiscovered vulnerabilities exist and are being actively exploited.

FIGURE 1. Number of Common Vulnerabilities and Exposures (CVE) reports.

   To protect itself from potential security attacks, the Linux
kernel employs a variety of self-protection mechanisms [10],
[17], such as Kernel Address Space Layout Randomization
(KASLR), Kernel Page Table Isolation (KPTI) and the stack
protector.

   The associate editor coordinating the review of this manuscript and
approving it for publication was Alba Amato.

                     This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
134502                                                                                                                                                                VOLUME 10, 2022
C. E. Staniloiu et al.: Safer Linux Kernel Modules Using the D Programming Language




   Kernel self-protection mechanisms usually rely on
enabling specific configuration parameters and adding
runtime checks to prevent exploitation of code vulnerabilities.
Vulnerabilities arise from a combination of programmer
mistakes and a lack of safety support from the programming
language. The Linux kernel is mostly written in C, a fast
programming language but one with minimal safety features.
C gives easy access to program memory through liberal use of
pointers, weak typing, no bounds checking for arrays etc.
While these give flexibility to the programmer, they are also
the main source of vulnerabilities: buffer overflows, pointers
to expired data, pointers to uninitialized memory etc.
   In this paper we propose a complementary approach to
securing the Linux kernel: the use of a safe programming
language, i.e. a language with features that assist the
developer in writing secure code.
   Our choice is the D programming language [5], which has a
syntax similar to C/C++ and provides modern programming
and safety features. D aims to provide as many of the
performance benefits of the C programming language, with
as few of its security downsides, as possible.
   With the goal of porting a Linux kernel module to
the D programming language, we answer the overarching
research question: Can critical software components
(operating system drivers) be rewritten in a safe programming
language with reasonable effort while maintaining performance?
   Rewriting a software component from an older language
to a newer one offers the possibility to use more modern
programming features. In our case there are safety benefits
such as array bounds checking, immutable variables, safe
functions and guaranteed initialization. At the same time, the
translation process poses multiple challenges. Firstly, each
feature in the initial programming language has to be
available in the new programming language; if not, it has to be
adapted. Secondly, the newly rewritten software component
has to be built and linked against the main program: symbol
names, calling conventions and memory references have to be
compatible. Thirdly, dependencies of the newly rewritten
software component, such as its runtime library, have to be
added to the new program or need to be disabled.
   Additionally, the Linux kernel adds its own challenges.
Certain features, such as a standard C library or the use of
floating point, are missing. Memory allocations are typically
resident in the Linux kernel, i.e. kernel memory is not swapped
out. The stack size is limited.
   While we also considered Rust and Go as programming
languages for the Linux kernel port, we ultimately chose D.
Our choice of D was based on three criteria: syntax similarity
to the C programming language, interoperability with C
programs and high-performance generated code. D fitted
these criteria, with its close syntax to C, its reasonably easy
interoperability¹ with other languages and its proven track
record of generating code that is on par with C's performance.

  ¹ The reference D compiler (DMD) has a switch called -betterC
that restricts D to a subset that does not depend on the D runtime
library, so that D code using some D features can be linked into C programs.

   We selected a Linux kernel driver (virtio_net) and successfully
ported it to the D programming language. The ported driver
benefited from the safety features of the D programming language,
improving its security: bounds checking, safe functions and
templates. The performance costs were negligible.
   In summary, in this paper we make the following
contributions:
   •   We demonstrate the feasibility of using a modern programming
       language in the Linux kernel by successfully porting a
       Linux kernel module to the D programming language. We
       ported virtio_net, the network driver of the virtio
       framework [11].
   •   We design and implement techniques that rely on
       specific D language features in order to improve
       the Linux kernel drivers. The performance costs are
       negligible, while the security benefits are provided by
       the D programming language.
   •   We provide a methodology for porting Linux kernel
       modules to the D programming language. As demonstrated
       by our successful port, the methodology can be used to
       port other Linux kernel modules.
   The rest of the paper proceeds as follows. Section II
details the D programming language and Linux kernel
specifics. Section III presents the methodology employed for
porting Linux kernel modules to D. Section IV presents the
concrete steps and challenges in porting the virtio_net Linux
kernel module. Section V evaluates the security benefits and
performance costs for the ported module. Section VI presents
related work. Section VII concludes.

II. BACKGROUND
A. LINUX KERNEL MODULES
Linux source code consists of the kernel proper and a plethora
of device drivers and configurable components. Loading a
Linux kernel image with all the device drivers included would
result in unnecessary memory consumption and an increase
of the attack surface. For this reason, similarly to other modern
operating systems, Linux uses loadable kernel modules, i.e.
object files that can be added to the kernel at runtime to extend
its functionality. Kernel modules can be loaded or unloaded
upon request, without the need to reboot the system or to
recompile the kernel.
   Device drivers are typically implemented as kernel modules.
On a given system, only drivers for its particular set of
hardware devices will be loaded in the kernel. The loading of
these specific device drivers usually takes place at startup.
   Past studies have shown that device drivers host security
vulnerabilities. Johnson et al. found that 9 of 11 vulnerabilities
in the Linux kernel were located in device drivers [9].
An investigation using Parfait, a C/C++ static analyzer, found
that 81% of the bugs are located in device driver code [4].
A two-year investigation revealed that 85% of Android kernel
bugs are found in vendor drivers. As such, this paper focuses
on securing a device driver by porting it to the D programming
language. We use the virtio_net Linux kernel module as a
proof-of-concept of our approach.





B. THE D PROGRAMMING LANGUAGE
D is a general-purpose, statically typed, systems programming
language. It has a syntax similar to that of the C programming
language and it compiles to native code, i.e. it is neither
interpreted nor run on a virtual machine. D supports both
automatic and manual memory management: one can rely
on the garbage collector (GC) for memory management or
directly use the malloc and free functions for manual
allocation and deallocation of memory, similarly to C.
   D is designed as a more feature-rich and safer alternative to
the C programming language. It aims to create programs with
performance comparable to those written in C but without
C's safety issues. D provides a set of features aimed at
reducing the likelihood of memory issues and vulnerabilities
typically found in C programs.
   D implements bounds checking for both static and dynamic
arrays. To address the C design flaw of conflating pointers
with arrays and losing the length information, D implements
two separate types for pointers and arrays. While plain
pointers have the same implementation as in C, arrays
are implemented as fat pointers: the pointer representation is
extended to a structure that includes the length information used
in bounds checking.
   In D, the type system is more stringent and void pointers
are not implicitly converted to other pointer types. Moreover,
references held by local variables marked with the scope
keyword cannot escape the function scope, reducing the
presence of dangling pointers.
   Besides plain pointers such as those found in C,
D provides a memory-safe option called slices. A slice acts
as a ‘‘view’’ of a precise segment of an array. It tracks both
the pointer and the length of the segment. Instead of referring to
an array through a pointer that may cause an out-of-bounds
memory access, one can use a bounded slice.
   D offers the @safe annotation for functions. This enables
the compiler to statically check the body of annotated functions
for operations that could lead to memory corruption, such as
pointer arithmetic and casts. By default, D relies on
the GC to safely manage the lifetime of objects. Although
the GC has proven to aid productivity and memory safety,
its use is incompatible with performance-critical or real-time
applications such as the Linux kernel.
   As a consequence, an advanced user has the possibility of
opting out of the GC and using a different approach
for lifetime management. Among the possible alternatives
are reference counting and the Resource Acquisition Is
Initialization (RAII) technique. As an alternative to reference
counting [16], the language maintainers have added support
for an ownership/borrowing system [7] that can be mechanically
checked, similar to Rust's borrow checker. At the time
of this writing, October 2022, D's ownership system is not on
par with Rust's, but it is under active development.
   We note that the garbage collector is not involved in any
of the safety checks that the compiler employs, apart from
lifetime management. Array bounds checking, compile-time
safety checks and scope analysis are performed even when
the GC is turned off.
   Another important part of the D language is its
metaprogramming facilities. Template metaprogramming is
a technique that allows the user to make decisions based
on the properties of the types a template is instantiated with.
This technique makes generic programming even more powerful,
allowing generic types to be more flexible based on the types
that they are instantiated with. We have used metaprogramming
to employ compile-time polymorphism inside the Linux kernel
in order to replace uses of void*, and casts to and from it,
with concrete types.

C. INTERFACING C WITH D
Regarding interoperability with C, the D programming
language was designed to match most of the C data types, data
structure memory layout and calling convention. Moreover,
the compatibility extends to the format of the object files.
D and C use the same application binary interface (ABI)
and the same linkers. D permits access to the C standard
library through bindings in the D runtime library and the
D standard library; similarly, C programs can access D
functions. Due to name mangling, C functions called from D
need to be declared with the appropriate linkage attribute
(extern(C)); similarly, D functions called from C code must be
declared with the same linkage attribute. This is similar to
the integration of C++ functions in C code and vice versa.
   Linking D code to a C program relies on restricting D
objects to depend only on the C standard library. D-generated
object files can be linked with C-generated object files by
restricting D code to a subset that does not rely on the D
runtime library. This is achieved through the -betterC compiler
switch, which limits the language to a specific subset that meets
the foregoing requirement. This subset, called BetterC, results
from removing or altering certain features of the language
that rely on the runtime library. While some important
functionalities, such as garbage collection, are removed, most
relevant memory-safety features are preserved. Array bounds
checking and slicing, metaprogramming facilities, automatic
initialization of local variables and function safety are part
of the BetterC subset.

D. INTEGRATING D CODE IN THE LINUX KERNEL
We ported virtio_net, the network driver of the virtio
framework.
   While C and D integration of user space applications is a
well-documented process, integrating D code in the Linux
kernel poses its own set of challenges. To the best of our
knowledge, we are the first to have successfully integrated
a D software component in the Linux kernel.
   In the next sections we highlight the challenges, methodology
and outcomes of integrating D code in the Linux kernel.

III. METHODOLOGY FOR PORTING AND ENHANCING
KERNEL MODULES USING D
A. INTRODUCING D CODE IN THE LINUX KERNEL
There are two ways of adding new functionalities to the Linux
kernel: (1) statically linking the new object file directly with





the core kernel or (2) compiling the code as a loadable module
and linking it into the kernel on demand.
   A general rule of thumb is to add new functionalities as a
loadable module. This practice has the advantage of keeping
the kernel code as clean as possible and easier to maintain.
It also permits customization to a greater extent, as necessary
functionalities can be loaded and unloaded on demand.
Moreover, it keeps the Trusted Computing Base (TCB) small
and reduces the overall susceptibility to compromise, thus
increasing security.
   Regardless of the type of module that has to be built, the
kernel build system assumes the source files are written in
C. As such, a source file written in another programming
language won't compile and the build will fail.
This is also the case for the D language. At the same time,
the module entry point and exit point functions must be in
C, so that the kernel can reach them. In summary, porting a
module to the D programming language requires:

   •   writing the corresponding source code in D
   •   providing module entry points as C interface functions
   •   updating the build system files to link the new module

   For the 2nd requirement, a C interface must be implemented
between the kernel and the D-written module. This
C interface should contain only the entry point functions and
bindings to macros and functions that cannot be ported to D.
This interface implies that new features will require at least
two source code files: one in C and one or more in D. Therefore,
for the 3rd requirement, the directives in the Linux kernel
build file must be written accordingly.
   The kernel build system assumes that it is dealing with C
source files and it tries to build the object files accordingly.
Fortunately, the build system also accepts pre-built object
binaries as dependencies, which it links with the object files
it built in order to create the kernel module. This is done by
changing the name of the dependency from module-file.o to
module-file.o_shipped. To link D object files into a kernel
module, the D source files must be compiled beforehand
and the resulting object files named with the suffix .o_shipped.
The source files are compiled by a D compiler with the -betterC
switch. One can choose between the LLVM-based D
Compiler (LDC) and the GCC-based D Compiler (GDC).
After they are compiled independently, the objects are shipped
to the kernel build system to be linked together with the other
C objects.

B. PORTING A KERNEL MODULE
Porting the kernel module, we followed 5 steps, including
testing and benchmarking:

   •   Port the data structures used inside the module. Ensure
       the size and layout of each new ported structure is
       identical to the size and layout of the original one.
   •   Port the module implementation one function at a
       time. Check module functionality after each new ported
       function.
   •   Conduct the first set of benchmarks: assess the module
       behaviour. Compare the D and the C versions of the
       module.
   •   Introduce idiomatic D constructs and features into the
       code. Add bounds checking, replace macros and casts
       with metaprogramming, add @safe, @trusted and
       other useful features.
   •   Perform the second set of benchmarks: assess the effect
       of the idiomatic code added. Compare the idiomatic D
       and the rough D versions of the module. Compare the D
       and the C versions of the module.

   The first step, porting the data structures, is the most
complex one. In a kernel module, some structures are defined
inside the code of the module, while others come from
different header files. To be able to generate an object that can
pass structures to and receive them from a C program, a D compiler
(like any other compiler) must know the memory layout of
those C structures. This means porting them to D.
   This porting can be done using dpp [28], ‘‘a compiler
wrapper that will parse a D source file with the .dpp extension
and expand in place any #include directives it encounters,
translating all of the C or C++ symbols to D, and then
pass the result to a D compiler’’. However, a high level of
branching in header files or recursive inclusions may make
it impossible to use dpp. In this case, one has two
alternatives: (1) port the data structures by hand or (2) make
dpp work with the Linux kernel headers. We chose the former.
   Regardless of the porting method, the size and layout of
each new structure ported to D should be compared with the
size and layout of the original one from C. In the case of
a size or layout mismatch, the bug can be easily detected
by comparing the offsets of the fields from D with the
offsets of their C counterparts. In D, the offset of a field can
be obtained using the .offsetof field property. In the Linux
kernel, it can be obtained using the offsetof(TYPE, MEMBER)
macro.
   A difference to consider is the size of an empty structure:
in the GNU C dialect used by the kernel, an empty structure has
size 0, while in D such a structure has a size of 1 byte. We used
D's powerful compile-time introspection to solve this issue. Also,
one should consider the fact that the D language does not
natively support bitfields. However, the same functionality
can be achieved using the std.bitmanip.bitfields template
from the D standard library.
   While porting the implementation, the D functions called
from C must be annotated with the extern(C) linkage attribute.
The attribute instructs the compiler to use the C naming and
calling convention instead of the D one. The same must be
done when declaring, in the D header, a function that is
implemented in C.
   In D, non-immutable global variables are placed in
thread-local storage (TLS), while in C they are placed
in global storage. To achieve functional parity, one must
annotate D global variables with the __gshared attribute.
Also, the const qualifier is transitive in D, meaning that it
applies recursively to every subcomponent of the type that
it is applied to.
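The fat-pointer representation of D arrays and slices described in Section II-B can be approximated in plain C, which makes explicit the bookkeeping that the D compiler performs on the programmer's behalf. The following is a minimal sketch under our own assumptions: the type int_slice and the helpers int_slice_of and int_slice_get are hypothetical names, not part of the D runtime or of the ported driver, and where D inserts the check automatically on every indexed access, this C model reports a failed check to the caller instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C model of a D slice of ints: a pointer plus a length. */
struct int_slice {
    int *ptr;
    size_t len;
};

/* Create a view of elements [lo, hi) of an array holding n elements.
 * Ranges that do not fit inside the array are rejected. */
static bool int_slice_of(int *arr, size_t n, size_t lo, size_t hi,
                         struct int_slice *out)
{
    if (lo > hi || hi > n)
        return false;
    out->ptr = arr + lo;
    out->len = hi - lo;
    return true;
}

/* Bounds-checked read: an index outside the view is refused
 * instead of reading memory past the end of the segment. */
static bool int_slice_get(struct int_slice s, size_t i, int *out)
{
    if (i >= s.len)
        return false;
    *out = s.ptr[i];
    return true;
}
```

In D the same view is written a[1 .. 4], and the length travels with it, so no separate size argument has to be passed around or kept consistent by hand.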

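The replacement of void* and unchecked casts by compile-time polymorphism (Section II-B) is easier to appreciate against a concrete C idiom. The sketch below uses a simplified container_of-style macro, named simple_container_of here to make clear that it is not the kernel's exact definition, together with hypothetical list_node and packet types. The macro accepts any pointer and any target type, so a mistake compiles silently; the per-type wrapper packet_from_node pins both types down, and it is the kind of typed function that a D template could produce for each instantiation instead of it being written by hand.

```c
#include <stddef.h>

/* Hypothetical intrusive list node and the structure that embeds it. */
struct list_node {
    struct list_node *next;
};

struct packet {
    int len;
    struct list_node node;
};

/* Simplified container_of-style macro: recovers the enclosing structure
 * from a pointer to one of its members. The cast to (type *) is not
 * checked, so naming the wrong type or member still compiles. */
#define simple_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Typed wrapper: the argument must be a struct list_node * and the
 * result is always a struct packet *, so the compiler checks both. */
static struct packet *packet_from_node(struct list_node *n)
{
    return simple_container_of(n, struct packet, node);
}
```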

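The C interface of a ported module, which Section III-A limits to the entry points and to bindings for macros and functions that cannot be ported, can be illustrated with a small binding. In this sketch RING_NEXT stands in for a kernel macro that is impractical to translate, and dmod_ring_next is a hypothetical binding that simply expands it; both names are ours. The binding gives the D object a real symbol with C linkage, which the D side would declare with extern(C) as shown in the comment and then call like any other function.

```c
#include <stddef.h>

/* Stand-in for a function-like kernel macro that is impractical to port
 * (hypothetical). Advances a ring index; size must be a power of two. */
#define RING_NEXT(idx, size) (((idx) + 1) & ((size) - 1))

/* C binding kept in the module's C interface file. It only expands the
 * macro, so that D code has an ordinary function to call. On the D side
 * it would be declared as:
 *     extern(C) size_t dmod_ring_next(size_t idx, size_t size);
 */
size_t dmod_ring_next(size_t idx, size_t size)
{
    return RING_NEXT(idx, size);
}
```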

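The size and offset comparison recommended for every ported structure can also be automated on the C side, so that a mismatch stops the build instead of corrupting memory at run time. The sketch below assumes a hypothetical structure rx_stats and expected values that would be copied from its D port (for instance from RxStats.sizeof and RxStats.drops.offsetof); the numbers are illustrative and hold on common ABIs. Inside the kernel, the BUILD_BUG_ON() macro would play the role of _Static_assert.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C structure shared between the C interface and D code. */
struct rx_stats {
    uint64_t packets;
    uint64_t bytes;
    uint32_t drops;
    uint8_t  flags;
};

/* Layout reported by the D port of the same structure (illustrative). */
enum {
    D_RX_STATS_SIZE = 24,
    D_RX_STATS_DROPS_OFFSET = 16,
    D_RX_STATS_FLAGS_OFFSET = 20,
};

/* Any layout disagreement between the two languages fails compilation. */
_Static_assert(sizeof(struct rx_stats) == D_RX_STATS_SIZE,
               "rx_stats: size differs from the D port");
_Static_assert(offsetof(struct rx_stats, drops) == D_RX_STATS_DROPS_OFFSET,
               "rx_stats: offset of drops differs from the D port");
_Static_assert(offsetof(struct rx_stats, flags) == D_RX_STATS_FLAGS_OFFSET,
               "rx_stats: offset of flags differs from the D port");
```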

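Because D has no native bitfields, a C bitfield in a ported structure has to be reproduced with explicit shifts and masks over a backing integer, which is essentially what the accessors generated by std.bitmanip.bitfields do. The sketch below shows that shift-and-mask scheme in C for a hypothetical one-byte header holding a 1-bit flag and a 3-bit type; the field names and bit positions are our assumptions. Since the allocation order of C bitfields is implementation-defined, the positions chosen on the D side must be checked against the compiler and ABI actually used to build the kernel.

```c
#include <stdint.h>

/* Hypothetical packed header byte:
 *   bit 0      csum_ok   (1 bit)
 *   bits 1..3  gso_type  (3 bits)
 *   bits 4..7  reserved
 */
#define HDR_CSUM_OK_SHIFT  0u
#define HDR_CSUM_OK_MASK   0x1u
#define HDR_GSO_TYPE_SHIFT 1u
#define HDR_GSO_TYPE_MASK  0x7u

/* Pack both fields, truncating each value to its field width. */
static uint8_t hdr_pack(unsigned csum_ok, unsigned gso_type)
{
    return (uint8_t)(((csum_ok & HDR_CSUM_OK_MASK) << HDR_CSUM_OK_SHIFT) |
                     ((gso_type & HDR_GSO_TYPE_MASK) << HDR_GSO_TYPE_SHIFT));
}

/* Field getters: shift the field down, then mask off its neighbours. */
static unsigned hdr_csum_ok(uint8_t h)
{
    return (h >> HDR_CSUM_OK_SHIFT) & HDR_CSUM_OK_MASK;
}

static unsigned hdr_gso_type(uint8_t h)
{
    return (h >> HDR_GSO_TYPE_SHIFT) & HDR_GSO_TYPE_MASK;
}
```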

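The transitivity of const is a porting pitfall in its own right: C code may legitimately write through a pointer reached from a const object, which a direct D translation rejects. The C sketch below, built around a hypothetical struct buf, compiles because C's const is shallow. In D, the corresponding parameter const(Buf)* b would make b.data[0] const as well, so the same increment would be a compile-time error, and the port would have to change the signature or restructure the code.

```c
#include <stddef.h>

/* Hypothetical buffer descriptor holding a pointer to its storage. */
struct buf {
    int *data;
    size_t len;
};

/* C's const is shallow: through a pointer to const struct buf the
 * fields themselves are read-only, but the ints they point to are not. */
static void bump_first(const struct buf *b)
{
    if (b->len > 0)
        b->data[0] += 1;   /* accepted in C; rejected in D, where const is transitive */
    /* b->len = 0;            rejected in C as well: b->len is const */
}
```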
   Primitive data type equivalence can be problematic too.
The equivalence between basic C and D types is described
in [6].
   Not all the functionalities that are used or implemented in
a kernel module are worth porting. This is the case for
certain macros, which in turn call other macros and so
on and are very deeply rooted in the kernel code. It is also
the case for certain kernel functions that use GCC extensions
to the standard C language, which may not be implemented
in the D compiler. A way to avoid porting these macros or
functions is to create C bindings (functions that only call
other functions), which can be exposed to a D object and
called from there. These bindings should be created in the
C interface of the module.
   After each new ported function, a functionality test suite
should be run. If bugs were introduced, there is only one
function to debug. The process of porting should be more
syntax-oriented in the first two steps of the methodology.
One straightforward way of solving syntax-related issues
is to follow and solve the errors that are issued by the
compiler. On the other hand, step 4 should be more oriented
towards functionality, and one should use all the features that
the BetterC subset retains in order to improve the safety
and the performance of the module. Several techniques for
enhancing the safety of a module are presented in the next
subsection.
   The benchmarks (steps 3 and 5) should be designed according
to the module functionality. As a rule of thumb, a benchmark
should be run after the module is ported (step 3) to assess
whether the D version of the module can ‘‘keep up’’ with the C
version. Then, one should take into account that memory
safety features can lead to further performance penalties,
since safety checks are likely to introduce additional overhead.
The second benchmark (step 5) should be run to assess whether
the addition of idiomatic code and safety features is worthwhile
from a performance perspective.

C. SAFETY ENHANCEMENTS
These are some of the security enhancements provided by the
D programming language. They are used to implement and
build the new kernel module in D.

1) VARIABLES
are initialized to the default value of their type, removing
initialization bugs.

2) IMPLICIT CONVERSIONS
of void pointers to any other pointer types are not permitted.
D requires an explicit cast for converting pointers of different

3) STATIC ARRAYS
are bounds-checked by default.

4) SLICES
specify a part of an array, via a reference and length
information. They are used to bounds-check dynamically
allocated arrays. Note that this requires knowledge of the
initial size of the dynamically allocated arrays.

5) TEMPLATES
can be used as a replacement for C void pointers and macro
definitions for generic programming, thus enabling type
system checks.

6) SAFE FUNCTIONS
(annotated with @safe) are statically verified against cases
of undefined behavior. Within safe functions, several
language features cannot be used, such as casts that break
the type system or pointer arithmetic.
   Scope, return ref and return scope function parameters
are used to ensure that parameters do not escape their scope,
do not outlive their matching parameter lifetime and are
correctly tracked even through pointer indirections.

7) TRUSTED FUNCTIONS
(annotated with @trusted) are meant to provide the same
guarantees as a safe function, but the compiler does not check
their bodies; the checks must be done by the programmer.

8) SAFE FUNCTIONS
can only call other safe functions and trusted functions.

IV. METHODOLOGY IN ACTION. THE virtio_net DRIVER
Given the steps described above, the goal was to select and
port a Linux kernel driver from C to D. This was an iterative
process, with the methodology being updated based on feedback
from the porting process.
   To select a target driver we considered the following
criteria:
   •   The driver is in the Linux kernel mainline and it is
       maintained, so it is relevant for the kernel community.
   •   The driver is easy to test and benchmark: being a
       network driver, one can easily send and receive packets
       and measure what bandwidth is achieved.
   •   The driver should be medium-sized (thousands of lines
       of code). This is a good trade-off between feature
       complexity and porting effort.
   Based on these criteria, we selected the virtio_net
types.                                                               driver, part of the virtio framework [11]. As it’s name
   The C implicit switch fall-through behaviour is not               suggests, it is a virtual network device driver, used as a
permitted in D. D also uses the final switch statement               communication channel between the guest and the hypervisor
where the default case is not required nor permitted, useful         in a paravirtualized environment. It satisfies the three criteria:
when the default statement is useless. The final                     (1) it is actively maintained and used for virtualization use
switch statement is especially useful when it is applied             cases, (2) it can be easily tested with network tools: network
on an enum type, as it will enforce the use of all the enum          functionality and network metrics such as bandwidth and
members in the case statements.                                      latency can be part of a comparison evaluation process and
(3) it has roughly 3.3k lines of code, fitting into the
medium-size range we wanted.
   We used version 4.19.0 of the Linux kernel, together with the corresponding version of the virtio_net driver. For development, testing and evaluation we used a virtual machine (VM) based on QEMU.
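The first two safety enhancements listed in the Safety Enhancements subsection can be made concrete with a small D sketch. It covers four features:

- default initialization;
- the explicit cast required for void pointers;
- explicit (rather than implicit) switch fall-through;
- a final switch over an enum.

This is a hypothetical illustration, not code from the ported driver. The names RxStats, LinkState, firstWord, priority and describe are assumptions made for the example.

```d
// Hypothetical sketch: not taken from the ported virtio_net driver.

enum LinkState { down, up, unknown }

struct RxStats
{
    ulong packets;    // default-initialized to 0
    ubyte* lastBuf;   // default-initialized to null
    double ratio;     // floating point defaults to NaN, not 0
}

RxStats freshStats()
{
    RxStats s;        // every field receives its type's .init value
    return s;
}

int firstWord(void* raw)
{
    // int* p = raw;          // rejected: no implicit void* -> int* conversion
    int* p = cast(int*) raw;  // the conversion must be spelled out
    return *p;
}

int priority(int level)
{
    int weight = 0;
    switch (level)
    {
        case 2:
            weight += 10;
            goto case;        // explicit fall-through into the next case
        case 1:
            weight += 1;
            break;            // implicit fall-through here would be rejected
        default:
            break;
    }
    return weight;
}

string describe(LinkState s)
{
    string name;
    final switch (s)          // every enum member must be handled; no default
    {
        case LinkState.down:    name = "down";    break;
        case LinkState.up:      name = "up";      break;
        case LinkState.unknown: name = "unknown"; break;
    }
    return name;
}
```

If a member is later added to LinkState, describe stops compiling until the new member is handled. This is the property that makes final switch attractive for state machines in drivers.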
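Items 6) to 8) of the Safety Enhancements subsection can likewise be illustrated with a hypothetical D sketch. The function names are illustrative only. The scope attribute is only fully enforced when the compiler's DIP1000 checking is enabled. viewRaw is marked @trusted under an assumption the compiler cannot verify: callers must pass a buffer of at least len bytes.

```d
// Hypothetical sketch; function names are illustrative only.

// 6) @safe: statically checked. Pointer arithmetic or type-breaking casts
// (e.g. `header.ptr + 1`, `cast(int*) header.ptr`) do not compile here.
@safe int sumHeader(const(ubyte)[] header)
{
    int total = 0;
    foreach (b; header)   // iterating a slice never leaves its bounds
        total += b;
    return total;
}

// scope: `buf` must not escape this function (enforced with DIP1000 checks).
@safe size_t countNonZero(scope const(ubyte)[] buf)
{
    size_t n = 0;
    foreach (b; buf)
        if (b != 0)
            ++n;
    return n;
}

// return ref: the returned reference may not outlive the argument `arr`.
@safe ref int first(return ref int[4] arr)
{
    return arr[0];
}

// 7) @trusted: may use operations forbidden in @safe code (here, slicing a
// raw pointer). The compiler does not check the body; the programmer must
// guarantee that `raw` points to at least `len` readable bytes.
@trusted const(ubyte)[] viewRaw(const(ubyte)* raw, size_t len)
{
    return raw[0 .. len];
}

// 8) A @safe function may call @safe and @trusted functions, but not
// @system ones (the default for unannotated functions).
@safe int checksum(const(ubyte)* raw, size_t len)
{
    const(ubyte)[] view = viewRaw(raw, len);
    return sumHeader(view) + cast(int) countNonZero(view);
}
```

The design keeps the unchecked operation confined to one small @trusted function with a documented precondition. All surrounding logic stays in compiler-checked @safe code.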
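Section V-C evaluates two transformations:

- using bounds-checked slices in place of bare C array pointers;
- using templates in place of void pointers.

Both can be sketched in D. This is a hypothetical illustration with our own names (RxRing, readSlot, forEach), not the transformation actually applied to virtio_net. The C fragments in the comments are likewise simplified.

```d
// Hypothetical sketch; names are illustrative, not taken from virtio_net.

// 1) Bounds checking through slices.
// C keeps only a bare pointer, so the length needed for a check is lost:
//     struct rx_ring { u8 *slots; };   /* ring->slots[i] is unchecked */
struct RxRing
{
    ubyte[] slots;   // a slice: pointer plus length (a "fat pointer")
}

// Indexing the slice inserts a run-time check that i < slots.length, so an
// out-of-bounds index stops execution instead of reading foreign memory.
// In @safe code this check stays enabled by default even in release builds.
@safe ubyte readSlot(ref RxRing ring, size_t i)
{
    return ring.slots[i];
}

// 2) Replacing void* with a template.
// C style: the callback receives untyped pointers and must cast them back:
//     void for_each(void **items, int n, void (*fn)(void *));
// D style: the element type T is fixed at compile time, so a callback that
// expects the wrong type is rejected by the compiler instead of silently
// reinterpreting memory at run time.
void forEach(alias fn, T)(T[] items)
{
    foreach (ref item; items)
        fn(item);
}
```

For example, forEach!((ref int x) { x *= 2; })(nums) doubles every element of an int[]. Passing a callback that takes ref float for the same array fails to compile. Because a slice carries its length next to the pointer, it is also a fat pointer, which is relevant to the shared-memory discussion in Section VI.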

V. EVALUATION
To validate our approach we show that:
   1) The D code has the exact same behavior as the C code
      that it replaces.
   2) The safety mechanisms we inserted successfully prevent memory corruption bugs.
   3) The performance of the replacement software does not degrade with regard to its predecessor.
   We created a setup in which both implementations of the virtio_net driver (C and D) are available and ran similar scenarios to compare functionality, safety and performance.

A. EXPERIMENTAL SETUP
We created a virtual machine image with version 4.19.0 of the Linux kernel. The virtual machine is run as two instances: one running the C version of the virtio_net driver, and the other running the D version. We refer to the virtual machines as guests and to the physical system as the host.
   We compiled the D source files of the module using the GDC compiler, version 10.3.0, with the following flags: -fno-druntime -mcmodel=kernel -O2 -c.
   For evaluation, we focused on functional correctness (parity), safety and performance.

B. FUNCTIONAL CORRECTNESS
We ran network tools in each virtual machine to check for functional parity: for example, ping to validate connectivity and wget to download files from the Internet. Additionally, we checked whether each transferred file is the correct one by comparing its MD5 hash with the expected one.

C. SAFETY
To enhance the safety of the ported driver code, we modified it to use several D language features: array bounds checking, @safe functions and templates.

1) ARRAY BOUNDS CHECKING
The virtio_net driver uses both statically and dynamically allocated arrays. For static arrays defined inside the driver, the D compiler has sufficient information at compile time to insert bounds checking code. Dynamic arrays, on the other hand, are represented in C as a pointer to a chunk of memory, so the length that a run-time check needs is not available. However, by using slices, which store the length together with the pointer, we were able to enable bounds checking for dynamic arrays that are defined inside the ported driver. Accesses to arrays that are dynamically allocated outside the driver remain without bounds checks.
   Of all the array accesses inside the virtio_net driver, we were able to enable array bounds checking for 88.4%. The remaining 11.6% are accesses to dynamic arrays that have been allocated outside the ported driver. To test the effect of adding array bounds checking to the driver, we added artificial out-of-bounds accesses to the code. In 60% of the cases, the C version of the driver finished execution without any visible error, i.e., the out-of-bounds access went undetected, whereas the D version stopped with a kernel panic in 100% of the cases.

2) @SAFE FUNCTIONS
To enable the D compiler to check the safety of the code, we aimed to annotate all the functions in the driver with the @safe attribute. 19% of the functions compiled successfully without any modifications, whereas 81.2% failed to compile because they perform unsafe operations. Most of these functions rely on pointer operations and casts that are forbidden in @safe code. Additional modifications are required to bring the code into a @safe state; however, this can be done incrementally after the initial port of the driver.

3) TEMPLATES
D code may use templated functions that are instantiated at compile time with the right type. A type mismatch results in a compilation error, which rules out the run-time memory corruption bugs caused by casting a void pointer to the wrong type. By using templated functions, we replaced 56% of all void pointer usages. The remaining 44% could not be replaced because we could not detect a conversion pattern to leverage for the transformation.

FIGURE 2. VM to VM setup. One VM runs the iperf3 server, the other runs the client.

D. PERFORMANCE
For performance, we used the iperf3 tool, which sends packets between a client and a server. We used a virtual machine instance running the original C version of the virtio_net driver and a virtual machine running the D version. Each VM was allocated 1 GB of RAM and 1 CPU. iperf3 was deployed on both VMs.
   We devised 3 setups:
   • vm-to-vm (in Figure 2): One VM is running the server, one VM is running the client. Both machines are of the same type: either both run the C driver or both run the D driver.
   • vm-to-host (in Figure 3): The host is running the server, the VM is running the client.
   • vm-to-remote (in Figure 4): Another system in the host network is running the server, the VM is running the client.

FIGURE 3. VM to host setup. The host is running the iperf3 server, the VM is running the client.

FIGURE 4. VM to remote setup. Another system in the host network is running the iperf3 server, the VM is running the client.

   Each of these setups was used for 2 × 2 types of measurements: (1) the VM runs either the D or the C driver and (2) iperf3 uses either TCP or UDP.
   Results are summarized in Table 1 and in Figures 5 and 6.

TABLE 1. Comparative performance.

FIGURE 5. Comparative TCP Performance (C vs D).

FIGURE 6. Comparative UDP Performance (C vs D).

   Results show negligible overhead for the D module implementation compared to the C implementation. Given that some of the measurements show a negative slowdown (i.e., the D version is faster), we consider the performance similar and subject to network and measurement variation.
   One thing to note is the relatively small scope of the changes: the basic network driver functionalities are unmodified, as most of the code responsible for them is shared between the two implementations. Porting other drivers may affect a larger part of the implementation and could lead to a higher slowdown. This is subject to future analysis.

E. REPLICABILITY
In the interest of validating our work, we provide it to the community on GitHub as a fork of the Linux kernel that contains the implementation of the D virtio_net driver and the experiment scripts: https://github.com/edi33416/d-virtio.
   The implementation of the D virtio_net driver is on the test_dvirtio_gdc branch, in the drivers/net/dfiles folder. Alongside the .d source files present in the drivers/net/dfiles path, there are also a Makefile and two test scripts. The Makefile is used to compile the .d source files into the .o_shipped objects that, in turn, will be linked by the kernel build system to build the
virtio_net.ko module. The test.sh and test2.sh helper scripts are used to validate the experimental setup. They load the compiled kernel module, configure the IP address and routing table, and validate that the network is working properly; this is done by downloading a file and comparing its md5sum with the reference value.
   To compile the D driver, one needs to install gdc-10, the GCC-based D compiler. As we are using QEMU to run the VMs, one also needs to ensure that qemu-system-x86_64 is installed with KVM support. We connected to our VMs through a serial port using a serial communication program, such as Minicom.²
   Once all the prerequisites are met, one can build the kernel module, boot up the VM and start using the compiled driver. This process is automated in the tools/labs/ directory. The run target of tools/labs/Makefile will (1) compile the .o_shipped object, (2) trigger the kernel build system, which produces the virtio_net.ko module, (3) download the YOCTO_IMAGE specified in tools/labs/qemu/Makefile and boot up the VM, and (4) copy the module inside the VM. It will also set up IP forwarding and NAT masquerading for the eno1 network interface on the host machine, so one must update the Makefile if one’s system uses a different network interface name.
   Once the VM has booted, one can connect to it through the serial1.pts serial pipe with the minicom utility, as in minicom -D serial1.pts. The default login username is root and requires no password. All the files can be found in the skels/ directory inside the VM, the kernel object being named virtio_net_tmp.ko. Precompiled .ko objects can be found in the Releases³ section on GitHub:
   • Precompiled D .ko: https://github.com/edi33416/d-virtio/releases/download/dvirtio-ko/virtio_net_tmp.ko
   • Precompiled C .ko: https://github.com/edi33416/d-virtio/releases/download/cvirtio-ko/virtio_net_tmp.ko
   We hope that the availability of our work will make it easier for others to evaluate, replicate and critically examine it.

   ² https://wiki.emacinc.com/wiki/Getting_Started_With_Minicom
   ³ https://github.com/edi33416/d-virtio/releases

VI. RELATED WORK
Improving the safety of the Linux kernel and its drivers is a constant focus of the professional and research security community. Approaches range from static analysis of the Linux kernel code [4], [9], [14] to fuzzing [3], [8], [22], [25] and the use of runtime checks and/or instrumentation [13], [24].
   The idea of using programming languages that implement different memory safety features in order to make the Linux kernel code safer has also been explored.
   The recent availability of Rust as a programming language in the Linux kernel [19], [20] paves the way for adding code written in a memory-safe programming language. This is compatible with our own approach of using D to write code in the Linux kernel. Although the memory safety guarantees that Rust offers are superior to those of D, integrating it into the Linux kernel is a very complicated task. As evidence, the work required to add Rust support to the Linux kernel was done by 173 people (listed in the commit changelog [21]) over the course of 18 months. This effort covered only the infrastructure required to integrate Rust code into the kernel; it did not implement any device driver or any part of the Linux kernel in Rust. By comparison, our work was done by 3 people over the course of 4 months, including the initial exploratory phase of the Linux infrastructure as well as the porting of the kernel header files. Porting the device driver itself took only 2 to 3 weeks. The reader should also consider that, in the meantime, work has advanced on automating the porting of kernel header files to D [26], further reducing the time required to integrate D device drivers. In addition, integrating Rust into the kernel has required compiler changes to accommodate the esoteric code encountered, whereas our work does not require any compiler changes.
   A previous attempt to create a memory-safe version of the C language and to use it in the Linux kernel is CCured [2]. CCured is a program transformation system that extends the type system of the C language by classifying pointers into new types according to their usage. There are three pointer categories: (1) SAFE pointers may be dereferenced, but cannot be cast to other types or be used in pointer arithmetic, (2) SEQ pointers may be used in pointer arithmetic, but not in type casts, and (3) WILD pointers may be cast to other pointer types. Each category is treated differently at runtime. SAFE pointers simply require a null check. SEQ pointers are subjected to bounds checking, since they are typically used for array operations. WILD pointers are the most expensive in terms of runtime cost, because they require runtime type information to track the various types the pointer may be converted to. It has previously been shown that, in practice, a large percentage of the casts between different types in C codebases are either upcasts or downcasts [23]. This is also true for the Linux kernel, where void* is used as a generic base type to enable polymorphism. CCured treats the pointers involved in such casts as WILD, subjecting them to the cost of runtime checks. By using D, we were able to leverage its metaprogramming support to achieve compile-time polymorphism and type safety without adding any runtime cost.
   The pointers defined by CCured are fat pointers: structures that pack together the raw pointer and metadata describing its bounds and type. The authors acknowledge [15] that, because of this, multithreaded programs that rely on shared memory will not work with CCured. The issue with shared-memory programs stems from the fact that code not compiled with CCured assumes that pointers are one word long and can be written atomically, whereas they are, in fact, fat pointers that occupy multiple words in memory and require multiple instructions in order
to perform the write; thus, the pointer could end up in an inconsistent state. As D’s arrays are also fat pointers, they suffer from the same problem. Like the authors of CCured, we believe that this problem can be resolved by acquiring locks on the shared memory before accessing it. Although, in theory, this solution impacts performance, we have not encountered such an impact in practice while interfacing D with programs written in other languages.
   CCured was used on two Linux kernel device drivers, on Linux kernel version 2.4.5, with no significant performance penalties. However, it incurred performance penalties ranging from 11% to 87% on other programs, as detailed in its paper.
   Another approach to using a modern programming language for Linux kernel drivers, in order to increase the reliability of the system, is the Decaf drivers architecture [18]. The Decaf architecture partitions the code of a driver into two separate parts: one that must run in kernel space for high performance and must satisfy the OS requirements, and one that can be moved to user space and be rewritten in another language. The two parts communicate through extension procedure calls (XPC). Using this architecture and Linux kernel version 2.6.18.1, five drivers were converted to Java, gaining exception handling and automatic memory management through garbage collection. The achieved performance was close to that of the native kernel drivers. The drawbacks of Decaf can be traced to the Java programming language, which has no pointer support. As such, critical code paths that use pointers are left in the unsafe part, still running in kernel space.
   Conversely, our methodology covers the use of the D language for memory safety enhancements in any type of kernel module, including those that use multithreading and shared memory, unlike CCured [2]. The implementation of new components and the interfacing with other kernel components can be done easily thanks to the language’s high compatibility with C, compared to the more complicated syntax of Rust. The entire code of a kernel module can be rewritten in D to improve memory safety, with no need to leave any part of the code unported, unlike Decaf [18].

VII. CONCLUSION
In this paper we presented an approach to improve the security of Linux kernel modules using the D programming language. We selected virtio_net as our target driver, a medium-sized and actively maintained component of the Linux kernel. We ported the driver to the D programming language, highlighted its functional and performance parity with the original C driver and discussed the security benefits. We devised a methodology that can be applied to other types of drivers for the same purpose.
   The safety features added to the driver show that the D language is able to bring safety improvements to a kernel module, array bounds checking and compile-time polymorphism being the most important ones.
   It is important to note that unsafety inside the kernel is a fact of life. Although one can use a programming language with mechanisms that increase the safety of the code a developer writes, at some point the developer will be forced to perform unsafe actions. These can come from the need to interact with specific pins on the underlying hardware or with the kernel API. Most of the kernel API core works with raw pointers; as such, even though the safe code might implement a sound object lifetime algorithm, being forced to pass a raw pointer to the kernel voids all the safety bets and assumptions. In spite of this, we believe that there are two strong arguments that enable the use of safe languages in practice: 1) the kernel core is extremely stable and robust, as it benefits from 30 years of development and bug fixes, and 2) the kernel API clearly defines whose responsibility it is, the kernel’s or the driver’s, to free allocated resources.
   Another important observation is that a programming language must be able to adhere to the constraints and design patterns implemented inside the Linux kernel. As Linus Torvalds has stated [30], the kernel’s needs must trump those of any programming language. For this reason, we believe that the D programming language is a good fit, given its proven ease of interoperability with C and the kernel infrastructure.
   The extent to which kernel safety can be improved depends on the degree to which the module implementation is self-sufficient. The more external functionality the module uses, the fewer safety enhancements can be made.
   The performance evaluation we conducted on the virtio_net driver shows that the D version of the driver adds little to no overhead compared to the original C variant. The safety features we added are sustainable and do not introduce noticeable overhead; therefore, we consider the performance results encouraging.
   Given the methodology we created, we are confident that other drivers could be ported to D with reasonable effort. Given its similarity to the C programming language, getting accustomed to the D programming language will have minimal impact on the driver developer. This is in contrast to the Rust programming language, whose syntax and features are very different from those of C. We believe that the increasing interest in adding safe languages to the Linux kernel is a great step forward, as it provides kernel developers with alternatives and flexibility, so that they can strike the right balance for their needs and goals.
   With these solutions, further drivers could be ported using the methodology described in this paper. Later on, this could be extended to entire built-in components and subsystems of the Linux kernel. These would bring a much needed improvement to the overall security of the kernel with close-to-no overhead, using a welcoming, C-like programming language.

REFERENCES
[1] AspenCore. (Nov. 2019). Mobile Operating System Market Share Worldwide. Accessed: Apr. 17, 2022. [Online]. Available: https://www.embedded.com/wp-content/uploads/2019/11/EETimes_Embedded_2019_Embedded_Markets_Study.pdf
[2] J. Condit, M. Harren, S. McPeak, G. C. Necula, and W. Weimer, ‘‘CCured in the real world,’’ ACM SIGPLAN Notices, vol. 38, no. 5, pp. 232–244, 2003.
[3] J. Corina, A. Machiry, C. Salls, Y. Shoshitaishvili, S. Hao, C. Kruegel, and G. Vigna, ‘‘DIFUZE: Interface aware fuzzing for kernel drivers,’’ in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Oct. 2017, pp. 2123–2138.
[4] D. Dawson, N. Hawes, C. Hoermann, N. Keynes, and C. Cifuentes, ‘‘Finding bugs in open source kernels using parfait,’’ Sun Microsyst. Lab., Brisbane, QLD, Australia, Tech. Rep., 2009. [Online]. Available: https://www.researchgate.net/publication/242083507_Finding_Bugs_in_Open_Source_Kernels_using_Parfait
[5] D Programming Language. Accessed: Apr. 17, 2022. [Online]. Available: https://dlang.org/
[6] Programming in D for C Programmers. Accessed: Apr. 17, 2022. [Online]. Available: https://dlang.org/articles/ctod.html
[7] Live Functions: Ownership and Borrowing in D. Accessed: Oct. 29, 2022. [Online]. Available: https://dlang.org/spec/ob.html
[8] D. R. Jeong, K. Kim, B. Shivakumar, B. Lee, and I. Shin, ‘‘Razzer: Finding kernel race bugs through fuzzing,’’ in Proc. IEEE Symp. Secur. Privacy (SP), May 2019, pp. 754–768.
[9] R. Johnson and D. Wagner, ‘‘Finding user/kernel pointer bugs with type inference,’’ in Proc. 13th USENIX Secur. Symp. (USENIX Security). San Diego, CA, USA: USENIX Association, Aug. 2004, pp. 119–134.
[10] Kernel Self-Protection. Accessed: Apr. 17, 2022. [Online]. Available: https://www.kernel.org/doc/html/latest/security/self-protection.html
[11] (2022). Virtio. Accessed: Apr. 17, 2022. [Online]. Available: https://wiki.libvirt.org/page/Virtio
[12] Linux Kernel CVEs. Accessed: Apr. 17, 2022. [Online]. Available: https://www.linuxkernelcves.com/
[13] K. Lu, A. Pakki, and Q. Wu, ‘‘Automatically identifying security checks for detecting kernel semantic bugs,’’ in Proc. Eur. Symp. Res. Comput. Secur. Luxembourg: Springer, 2019, pp. 3–25.
[27] Mobile Operating System Market Share Worldwide. Accessed: Apr. 17, 2022. [Online]. Available: https://gs.statcounter.com/os-market-share/mobile/worldwide
[28] Project Highlight: DPP. Accessed: Apr. 17, 2022. [Online]. Available: https://dlang.org/blog/2019/04/08/project-highlight-dpp/
[29] Operating System Family/Linux | TOP500. Accessed: Apr. 17, 2022. [Online]. Available: https://www.top500.org/statistics/details/osfam/1/
[30] LKML: Linus Torvalds: Re: [Patch v9 12/27] Rust: Add Kernel Crate. Accessed: Oct. 29, 2022. [Online]. Available: https://lkml.org/lkml/2022/9/19/1105
[31] W3Techs. Linux vs. Windows Usage Statistics for Websites. Accessed: Apr. 17, 2022. [Online]. Available: https://w3techs.com/technologies/comparison/os-linux,os-windows

CONSTANTIN EDUARD STANILOIU received the B.Sc. and M.Sc. degrees in computer science and engineering from the University POLITEHNICA of Bucharest (UPB), Bucharest, Romania, in 2016 and 2018, respectively, where he is currently pursuing the Ph.D. degree in computer science and information security. Since 2018, he has been a Teaching Assistant with the Department of Computer, Faculty of Automatic Control and Computers, UPB. He is also a member of the Secure Systems Group, Department of Computer. His research interests include programming languages, security and vulnerability detection, code and binary analysis, distributed systems, the IoT, and computer vision.
[14] A. Machiry, C. Spensky, J. Corina, N. Stephens, C. Kruegel, and G. Vigna,
     ‘‘DR.CHECKER: A soundy analysis for Linux kernel drivers,’’ in Proc.
     26th USENIX Secur. Symp. (USENIX Security), 2017, pp. 1007–1024.                                          ALEXANDRU MILITARU received the B.Sc. and
[15] G. C. Necula, J. Condit, M. Harren, S. McPeak, and W. Weimer, ‘‘CCured:                                   M.Sc. degrees in computer science and engi-
     Type-safe retrofitting of legacy software,’’ ACM Trans. Program. Lang.                                    neering from the University POLITEHNICA of
     Syst., vol. 27, no. 3, pp. 477–526, May 2005.                                                             Bucharest (UPB). He is currently pursuing the
[16] R. Nitu, E. Staniloiu, R. Deaconescu, and R. Rughinis, ‘‘Adding support                                   M.A. degree in philosophy with the University of
     for reference counting in the d programming language,’’ in Proc. 17th Int.                                Bucharest, Bucharest, Romania.
     Conf. Softw. Technol., H.-G. Fill, M. van Sinderen, and L. A. Maciaszek,                                     His research interests include programming
     Eds. Lisbon, Portugal: SCITEPRESS, 2022, pp. 299–306.                                                     languages and compilers.
[17] A. Popov. Linux Kernel Defence Map. Accessed: Apr. 17, 2022. [Online].
     Available: https://github.com/a13xp0p0v/linux-kernel-defence-map
[18] M. J. Renzelmann and M. M. Swift, ‘‘Decaf: Moving device drivers
     to a modern language,’’ in Proc. USENIX Annu. Tech. Conf., 2009,
     p. 14. [Online]. Available: https://www.researchgate.net/publication/
                                                                                                               RAZVAN NITU received the B.Sc. and M.Sc.
     234787227_Decaf_moving_device_drivers_to_a_modern_language/
                                                                                                               degrees in computer science and engineering
     citation/download
[19] Rust in the Linux Kernel: Good Enough. Accessed: Apr. 17, 2022. [Online].                                 from the University POLITEHNICA of Bucharest
     Available: https://thenewstack.io/rust-in-the-linux-kernel-good-enough/                                   (UPB), Bucharest, Romania, where he is currently
[20] Rust for Linux. Accessed: Apr. 17, 2022. [Online]. Available:                                             pursuing the Ph.D. degree in programming lan-
     https://github.com/Rust-for-Linux                                                                         guages and security. His research interests include
[21] Linux Kernel Commit to Add Rust Support. Accessed:                                                        programming languages, security, computer archi-
     Oct. 22, 2022. [Online]. Available: https://git.kernel.org/pub/scm/linux/                                 tecture, and education techniques.
     kernel/git/torvalds/linux.git/commit/?id=8aebac82933ff1a7c8eede18cab
     11e1115e2062b
[22] S. Schumilo, C. Aschermann, R. Gawlik, S. Schinzel, and T. Holz,
     ‘‘kAFL: Hardware-assisted feedback fuzzing for OS kernels,’’ in Proc.
     26th USENIX Secur. Symp. (USENIX Security), 2017, pp. 167–182.                                              RAZVAN DEACONESCU is currently an Asso-
[23] M. Siff, S. Chandra, T. Ball, K. Kunchithapadam, and T. Reps, ‘‘Coping                                      ciate Professor at the Computer Science and Engi-
     with type casts in C,’’ ACM SIGSOFT Softw. Eng. Notes, vol. 24, no. 6,                                      neering Department, University POLITEHNICA
     pp. 180–198, Nov. 1999.                                                                                     of Bucharest, Romania. His research interests
[24] C. Song, B. Lee, K. Lu, W. Harris, T. Kim, and W. Lee, ‘‘Enforcing
                                                                                                                 include operating systems and security, with a
     kernel security invariants with data flow integrity,’’ in Proc. NDSS, 2016,
                                                                                                                 penchant for teaching and mentoring. If a class
     pp. 1–15.
[25] D. Song, F. Hetzelt, J. Kim, B. B. Kang, J.-P. Seifert, and M. Franz,                                       uses ‘‘operating systems’’ as part of its name, it’s
     ‘‘Agamotto: Accelerating kernel driver fuzzing with lightweight virtual                                     likely he is part of the team. Research-wise, he is
     machine checkpoints,’’ in Proc. 29th USENIX Secur. Symp. (USENIX                                            working on software security, particularly Apple
     Security), 2020, pp. 2541–2557.                                                                             iOS security and the Unikraft unikernel in recent
[26] E. Staniloiu, R. Nitu, C. Becerescu, and R. Rughinis, ‘‘Automatic              years. He is a part of the open source and security community in the university
     integration of d code with the Linux kernel,’’ in Proc. 20th RoEduNet          and in Romania.
     Conference: Netw. Educ. Res. (RoEduNet), Nov. 2021, pp. 1–6.



VOLUME 10, 2022                                                                                                                                             134511