Authors Alexandru Militaru Constantin Eduard Staniloiu Razvan Deaconescu Razvan Nitu
License CC-BY-4.0
Received 1 November 2022, accepted 9 December 2022, date of publication 15 December 2022, date of current version 29 December 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3229461

Safer Linux Kernel Modules Using the D Programming Language

CONSTANTIN EDUARD STANILOIU, ALEXANDRU MILITARU, RAZVAN NITU, AND RAZVAN DEACONESCU
Faculty of Automatic Control and Computers, University POLITEHNICA of Bucharest, RO-060042 Bucharest, Romania
Corresponding author: Razvan Nitu (razvan.nitu1305@upb.ro)

ABSTRACT Since its creation, the Linux kernel has gained international recognition and has been employed on a large range of devices: servers, supercomputers, smart devices and embedded systems. Given its popularity, the security of the kernel has become a critical research topic. As a consequence, a wide range of third party tools were created to detect bugs in its implementation. However, new vulnerabilities are discovered and exploited every year. The explanation for this phenomenon lies in the fact that the programming language that is used for the kernel implementation, C, is designed to allow unsafe memory operations. In this paper, we show that it is possible to incrementally transition the kernel code from C to a memory safe programming language, D, by porting and integrating a device driver. In addition, we propose a series of code transformations that allow the D compiler to reason about the safety of certain memory operations. Our implementation increases the security guarantees of the kernel without incurring any performance penalties.

INDEX TERMS Memory safety, Linux kernel, driver development, security, D programming language.

I. INTRODUCTION
One of the most popular operating system kernels, Linux, is used on a wide range of hardware, from supercomputers to IoT devices.
While Microsoft Windows dominates the desktop market, Linux is the most popular operating system on supercomputers [29], in the server market [31], on handheld devices, as part of the Android operating system [27], and in the embedded world [1].

Like all operating system kernels, Linux runs in a privileged processor mode (called kernel mode or supervisor mode) with complete access to system memory and devices. A successful attack on Linux will provide the attacker full control of the entire system, making it a sought-after target.

Such attacks represent a common occurrence. Figure 1 highlights the number of vulnerabilities discovered based on the Common Vulnerability and Exposure (CVE) reports [12]. The trend appears to be slightly decreasing; however, it still amounts to an average of roughly 250 reports per year. This number is extremely large, considering the years of manpower invested in securing the kernel. In addition, there is no way of knowing how many undiscovered vulnerabilities exist and are being actively exploited.

FIGURE 1. Number of Common Vulnerability and Exposure (CVE) reports.

To protect itself from potential security attacks, the Linux kernel employs a variety of self-protection mechanisms [10], [17] such as Kernel Address Space Layout Randomization (KASLR), Kernel Page Table Isolation (KPTI), stack protector etc.

The associate editor coordinating the review of this manuscript and approving it for publication was Alba Amato.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 10, 2022

Kernel self-protection mechanisms usually rely on enabling specific configuration parameters and adding runtime checks to prevent exploitation of code vulnerabilities. Vulnerabilities appear as a combination of programmer mistakes and lack of safety support from the programming language. The Linux kernel is mostly written in C, a fast programming language but with minimal safety features. C allows easy access to program memory through liberal use of pointers, weak typing, no bounds checking for arrays etc. While these give flexibility to the programmer, they are also the main source of vulnerabilities: buffer overflows, pointers to expired data, pointers to uninitialized memory etc.

In this paper we propose a complementary approach to securing the Linux kernel: the use of a safe programming language, i.e. a language with features that assist the developer in writing secure code.

Our choice is the D programming language [5], which has a syntax similar to C/C++ and provides modern programming and safety features. D aims to provide as many of the performance benefits of the C programming language, with as few of the security downsides as possible.

With the goal of porting a Linux kernel module to the D programming language, we answer the overarching research question: Can critical software components (operating system drivers) be rewritten in a safe programming language with reasonable effort while maintaining performance?

Rewriting a software component from an older language to a newer one offers the possibility to use more modern programming features. In our case there are safety benefits such as: array bounds checking, immutable variables, safe functions, guaranteed initialization. At the same time, the translation process poses multiple challenges. Firstly, each feature in the initial programming language has to be available in the new programming language; if not, it has to be adapted. Secondly, the newly rewritten software component has to be built and linked against the main program: symbol names, calling conventions, memory references have to be compatible. Thirdly, dependencies of the newly rewritten software component, such as its runtime library, have to be added to the new program or need to be disabled.

Additionally, the Linux kernel adds its own challenges. Certain features such as a standard C library or the use of floating point are missing. Memory allocations are typically resident in the Linux kernel. The stack size is limited.

While we also considered Rust and Go as programming languages for the Linux kernel port, we ultimately chose D. Our choice of D was based on three criteria: syntax similarity to the C programming language, interoperability with C programs and high performance generated code. D fitted these criteria, with its close syntax to C, its reasonably easy interoperability¹ with other languages and its proven track record of generating code that is on par with C's performance.

¹ The D primary compiler (DMD) has a feature called -betterC enabling it to build a C program with some D features.

We selected a Linux kernel driver (virtio_net) and ported it successfully to the D programming language. The ported driver benefited from the safety features of the D programming language, improving its security: bounds checking, safe functions, templates. The performance costs were negligible.

In summary, in this paper we make the following contributions:
• We demonstrate the feasibility of using a modern programming language in the Linux kernel by successfully porting a Linux kernel module to the D programming language. We ported virtio_net, the network driver of the virtio framework [11].
• We design and implement techniques that rely on specific D language features in order to improve the Linux kernel drivers. The performance costs are negligible, with the security benefits being provided by the D programming language.
• We provide a methodology for porting Linux kernel modules to the D programming language. Demonstrated by our successful port, the methodology can be used to port other Linux kernel modules.

The rest of the paper proceeds as follows. Section II details the D programming language and Linux kernel specifics. Section III presents the methodology employed for porting Linux kernel modules to D. Section IV presents the concrete steps and challenges in porting the virtio_net Linux kernel module. Section V evaluates the security benefits and performance costs for the ported module. Section VI presents related work. Section VII concludes.

II. BACKGROUND
A. LINUX KERNEL MODULES
Linux source code consists of the kernel proper and a plethora of device drivers and configurable components. Loading a Linux kernel image with all the device drivers included will result in unnecessary memory consumption and an increase of the attack surface. For this reason, similarly to other modern operating systems, Linux uses loadable kernel modules, i.e. object files that can be added to the kernel at runtime to extend its functionality. Kernel modules can be loaded or unloaded upon request, without the need to reboot the system or to recompile the kernel.

Device drivers are typically implemented as kernel modules. On a given system, only drivers for its particular set of hardware devices will be loaded in the kernel. The loading of these specific device drivers usually takes place at startup.

Past studies have shown that device drivers host security vulnerabilities. Johnson et al. have found that 9 of 11 vulnerabilities in the Linux kernel were located in device drivers [9]. An investigation using Parfait, a C/C++ static analyzer, has found that 81% of the bugs are located in device driver code [4]. A two year investigation has revealed that 85% of Android kernel bugs are found in vendor drivers. As such, this paper focuses on securing a device driver by porting it to the D programming language. We use the virtio_net Linux kernel module as a proof-of-concept of our approach.

B.
THE D PROGRAMMING LANGUAGE
D is a general-purpose, statically typed, systems programming language. It has a similar syntax to the C programming language and it compiles to native code, i.e. it is not interpreted nor does it use a virtual machine. D supports both automatic and manual memory management: one can rely on the garbage collector (GC) for memory management or directly use the malloc and free functions for manual allocation and deallocation of memory, similarly to C.

D is designed as a more feature-rich and safe alternative to the C programming language. It aims to create programs with comparable performance to those written in C but without its safety issues. D provides a set of features aimed at reducing the likelihood of memory issues and vulnerabilities typically found in C programs.

D implements bounds checking for both static and dynamic arrays. To address the C design flaw of conflating pointers with arrays and losing the length information, D implements two separate types for pointers and arrays. While the normal pointers have the same implementation as in C, the arrays are implemented as fat pointers: the pointer representation is extended to a structure that includes length information used in bounds checking.

In D, the type system is more stringent and void pointers are not implicitly converted to other pointer types. Moreover, local variables marked with the scope keyword are limited to the function scope, reducing the presence of dangling pointers.

Besides common pointers such as those found in C, D provides a memory-safe option called slices. A slice acts as a "view" of a precise segment of an array. It tracks both the pointer and the length of the segment. Instead of referring to an array through a pointer that may cause an out of bounds memory access, one can use a bounded slice.

D offers the @safe annotation for functions. This enables the compiler to statically check the body of annotated functions for instructions that could lead to memory corruption such as pointer arithmetic and casts. By default, D relies on the GC to safely manage the lifetime of objects. Although the GC has proven to aid productivity and memory safety, its use is incompatible with performance critical or real-time applications such as the Linux kernel.

As a consequence, an advanced user has the possibility of opting out of using the GC and using a different approach for lifetime management. Among the possible alternatives are reference counting or the Resource Acquisition Is Initialization (RAII) technique. As an alternative to reference counting [16], the language maintainers have added support for an ownership/borrowing system [7] that can be mechanically checked, similar to Rust's borrow checker. At the time of this writing, October 2022, D's ownership system is not on par with Rust's, but it is under active development.

We note that the garbage collector is not involved in any of the safety checks that the compiler employs, apart from lifetime management. Array bounds checking, compile time safety checks and scope analysis are performed even when the GC is turned off.

Metaprogramming features are another important part of the D language. Template metaprogramming is a technique that allows the user to make decisions based on the template type properties. This technique makes generic programming even more powerful, allowing generic types to be more flexible based on the types that they are instantiated with. We have used metaprogramming to employ compile-time polymorphism inside the Linux kernel in order to replace the use of, and casts to and from, void* with concrete types.

C. INTERFACING C WITH D
Regarding interoperability with C, the D programming language was designed to match most of the C data types, data structure memory layout and calling convention. Moreover, the compatibility extends to the format of the object files. D and C use the same application binary interface (ABI) and the same linkers. D permits access to the C standard library through bindings in the D runtime library and the D standard library; similarly, C programs can access D functions. Due to name mangling, C functions called in D need to be declared with the appropriate linkage attribute, extern(C); similarly, D functions called in C code are prepended with the same linkage attribute. This is analogous to the integration of C++ functions in C code and vice versa, where extern "C" plays the same role.

Linking D code to a C program relies on restricting D objects only to the C standard library. D-generated object files can be linked to C-generated object files by restricting D code to a subset that is not reliant on the D runtime library. This is achieved through the -betterC compiler switch that limits the language to a specific subset that meets the foregoing requirement. This subset, called BetterC, results from removing or altering certain features of the language that rely on the runtime library. While some important functionalities, such as garbage collection, are removed, most relevant memory-safety features are preserved. Array bounds checking and slicing, metaprogramming facilities, automatic initialization of local variables, function safety are part of the BetterC subset.

D. INTEGRATING D CODE IN THE LINUX KERNEL
We ported virtio_net, the network driver of the virtio framework.

While C and D integration of user space applications is a well documented process, integrating D code in the Linux kernel poses its own set of challenges. To the best of our knowledge, we are the first to have successfully integrated a D software component in the Linux kernel.

In the next sections we highlight the challenges, methodology and outcomes of integrating D code in the Linux kernel.

III. METHODOLOGY FOR PORTING AND ENHANCING KERNEL MODULES USING D
A. INTRODUCING D CODE IN THE LINUX KERNEL
There are two ways of adding new functionalities to the Linux kernel: (1) statically linking the new object file directly with the core kernel or (2) compiling the code as a loadable module and linking it into the kernel on demand.

A general rule of thumb is to add new functionalities as a loadable module. This practice has the advantage of keeping the kernel code as clean as possible and is easier to maintain. Also, it permits customization to a greater extent, as necessary functionalities can be loaded and unloaded on demand. Moreover, it keeps the Trusted Computing Base (TCB) small and reduces the overall susceptibility to compromise, thus increasing security.

Regardless of the type of module that has to be built, the kernel build system assumes the source files are written in C. As such, a source file written in another programming language won't successfully compile and the build will fail. This is also the case for the D language. At the same time, the module entry point and exit point functions must be in C, so that the kernel can reach them. Summarizing, porting a module to the D programming language requires:
• writing the corresponding source code in D
• providing module entry points as C interface functions
• updating the build system files to link the new module

For the 2nd requirement, a C interface must be implemented between the kernel and the D-written module. This C interface should contain only the entry point functions and bindings to macros and functions that cannot be ported to D. This interface implies that new features will require at least two source code files: one in C and one or more in D. Therefore the directives in the Linux kernel build file must be written accordingly, for the 3rd requirement.

The kernel build system assumes that it is dealing with C source files and it tries to build the object files accordingly. Fortunately, the build system also accepts pre-built object binaries, as dependencies, that it will link with the object files it built in order to create the kernel module. This is done by changing the name of the dependency from module-file.o to module-file.o_shipped. To link D object files into a kernel module, the D source files must be compiled beforehand and have their name end with the suffix .o_shipped. The source files will be compiled by a D compiler with the -betterC switch. One can choose between using the LLVM-based D Compiler (LDC) and the GCC-based D Compiler (GDC). After they are compiled independently, they will be shipped to the kernel build system to be linked together with the other C objects.

B. PORTING A KERNEL MODULE
Porting the kernel module, we followed 5 steps, including testing and benchmarking:
• Port the data structures used inside the module. Ensure the size and layout of each new ported structure is identical to the size and layout of the original one.
• Port the module implementation one function at a time. Check module functionality after each new ported function.
• Conduct the first set of benchmarks: assess the module behaviour. Compare the D and the C versions of the module.
• Introduce D idiomatic constructs and features into the code. Add bounds checking, replace macros and casts with metaprogramming, add @safe, @trusted and other useful features.
• Perform the second set of benchmarks: assess the effect of the idiomatic code added. Compare the idiomatic D and the rough D versions of the module. Compare the D and the C versions of the module.

The first step, the porting of data structures, is the most complex one. In a kernel module, some structures are defined inside the code of the module, while others come from different header files. To be able to generate an object that can pass and receive structures from a C program, a D compiler (like any other compiler) must know the layout in memory of those C structures. This means porting them to D.

This porting can be done using dpp [28], "a compiler wrapper that will parse a D source file with the .dpp extension and expand in place any #include directives it encounters, translating all of the C or C++ symbols to D, and then pass the result to a D compiler". However, a high level of branching in header files or recursive inclusions may make it impossible to use dpp. In this case, one has two alternatives: (1) port the data structures by hand or (2) make dpp work with the Linux kernel headers. We chose the former.

Regardless of the porting method, the size and layout of each new structure ported to D should be compared with the size and layout of the original one from C. In the case of a size or layout mismatch, the bug can be easily detected by comparing the offsets of the fields from D with the offsets of their C counterparts. In D, the offset of a field can be obtained using the .offsetof field property. In the Linux kernel, it can be obtained using the offsetof(TYPE, MEMBER) macro.

A difference to consider is the size of an empty structure: in the C kernel code the size of an empty structure is 0, while in D this kind of structure has a size of 1 byte. We used D's powerful compile-time introspection to solve this issue. Also, one should consider the fact that the D language does not implicitly support bitfields. However, the same functionality can be achieved using the std.bitmanip.bitfields library template.

While porting the implementation, the D functions called from C must be annotated with the extern(C) linkage attribute. The attribute instructs the compiler to use the C naming and calling convention instead of the D one. The same must be done when declaring, in the D header, a function that is implemented in C.

In D, the non-immutable global variables are placed in the thread-local storage (TLS), while in C they are placed in the global storage. To achieve functional parity, one must annotate D global variables with the __gshared attribute. Also, the const qualifier is transitive in D, meaning that it applies recursively to every subcomponent of the type that it is applied to.

Primitive data type equivalence can be problematic too. The equivalence between basic C and D types is described in [6].

Not all the functionalities that are used or implemented in a kernel module are worth porting. This is the case of certain macros, which in their turn call other macros and so on and are very deeply rooted in the kernel code. It is also the case of certain kernel functions that use GCC features that extend the standard C language and which may not be implemented in the D compiler. A way to avoid porting these macros or functions is to create C bindings (functions that only call other functions) that can be exposed to a D object and called from there. These bindings should be created in the C interface of the module.

After each new ported function, a functionality test suite should be run. If bugs were introduced, there is only one function to debug. The process of porting should be more syntax-oriented in the first two steps of the methodology. One straightforward way of solving syntax related issues is to follow and solve the errors that are issued by the compiler. On the other hand, step 4 should be more oriented towards functionality and one should use all the features that the BetterC subset retains, in order to improve the safety and the performance of the module. Several techniques for enhancing the safety of a module are presented in the next section.

The benchmarks (steps 3 and 5) should be done according to the module functionality. As a rule of thumb, a benchmark should be done after the module is ported (step 3) to assess if the D version of the module can "keep up" with the C version. Then, one should take into account that memory safety features can lead to further performance penalties. Safety checks are likely to introduce additional overhead. The second benchmark (step 5) should be done to assess if the addition of idiomatic code and safety features is worthwhile from a performance perspective.

C. SAFETY ENHANCEMENTS
These are some of the security enhancements provided by the D programming language. They are used to implement and build the new kernel module in D.

1) VARIABLES are initialized to a default value of their type, removing initialization bugs.

2) IMPLICIT CONVERSIONS of void pointers to any other pointer types are not permitted. D requires an explicit cast for converting pointers of different types.

The C implicit switch fall-through behaviour is not permitted in D. D also provides the final switch statement, where the default case is neither required nor permitted, useful when the default statement is useless. The final switch statement is especially useful when it is applied on an enum type, as it will enforce the use of all the enum members in the case statements.

3) STATIC ARRAYS are by default bounds-checked.

4) SLICES specify a part of an array, via a reference and length information. They are used to bounds-check dynamically-allocated arrays. Note that this requires knowledge of the initial size of the dynamically-allocated arrays.

5) TEMPLATES can be used as replacement for C void pointers and macro definitions for generic programming, thus enabling type system checks.

6) SAFE FUNCTIONS (annotated with @safe) are statically verified against cases of undefined behavior. Within safe functions, there are several language features that cannot be used, such as casts that break the type system or pointer arithmetic.

Scope, return ref and return scope function parameters are used to ensure that parameters do not escape their scope, do not outlive their matching parameter lifetime and are correctly tracked even through pointer indirections.

7) TRUSTED FUNCTIONS (annotated with @trusted) provide the same guarantees as a safe function, but checks must be done by the programmer.

8) SAFE FUNCTIONS can only call other safe functions and trusted functions.

IV. METHODOLOGY IN ACTION. THE virtio_net DRIVER
Given the steps described above, the goal was to select and port a Linux kernel driver from C to D. This was an iterative process with the methodology being updated with feedback from the porting process.

To select a target driver we considered the following criteria:
• The driver is in the Linux kernel mainline and it is maintained, so it is relevant for the kernel community.
• The driver is easy to test and benchmark: being a network driver, one can easily send and receive packets and measure what bandwidth is achieved.
• The driver should be medium-sized (thousands of lines of code). This is a nice trade-off between feature complexity and porting effort.

Based on these criteria, we selected the virtio_net driver, part of the virtio framework [11]. As its name suggests, it is a virtual network device driver, used as a communication channel between the guest and the hypervisor in a paravirtualized environment. It satisfies the three criteria: (1) it is actively maintained and used for virtualization use cases, (2) it can be easily tested with network tools: network functionality and network metrics such as bandwidth and latency can be part of a comparison evaluation process and (3) it has roughly 3.3k lines of code, fitting into the medium-size range we wanted.
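As a rough illustration of the layout checks from Section III-B and several of the enhancements listed in Section III-C, the following user-space D sketch may help. It is not taken from the ported driver: every name in it is hypothetical, the structure fields are invented, and it assumes a regular D build rather than the kernel configuration.

```d
// Illustrative user-space sketch. Every name here (PacketHeader, QueueState,
// stateCode, viewOf, sumFirst, maxOf) is hypothetical and not from virtio_net.

// A hand-ported, C-compatible structure with made-up fields.
struct PacketHeader
{
    ubyte  flags;
    ubyte  kind;
    ushort headerLength;
    uint   checksumStart;
}

// Layout checks evaluated at compile time. In a real port the expected values
// would be taken from sizeof/offsetof on the C side.
static assert(PacketHeader.sizeof == 8);
static assert(PacketHeader.headerLength.offsetof == 2);
static assert(PacketHeader.checksumStart.offsetof == 4);

enum QueueState { idle, running, stopped }

// final switch: compilation fails if a QueueState member has no case.
int stateCode(QueueState state) @safe
{
    final switch (state)
    {
        case QueueState.idle:    return 0;
        case QueueState.running: return 1;
        case QueueState.stopped: return 2;
    }
}

// The single spot where a raw pointer becomes a slice. It is @trusted because
// the compiler cannot verify that `len` matches the real allocation size.
const(uint)[] viewOf(const(uint)* ptr, size_t len) @trusted
{
    return ptr[0 .. len];
}

// A slice carries pointer and length, so every index below is bounds-checked.
uint sumFirst(const(uint)[] data, size_t count) @safe
{
    uint total = 0;
    foreach (i; 0 .. count)
        total += data[i];
    return total;
}

// Template in place of a void*-based helper: T is fixed at instantiation, so
// mixing element types is rejected at compile time.
T maxOf(T)(T[] items, T fallback) @safe
{
    T best = fallback;
    foreach (item; items)
        if (item > best)
            best = item;
    return best;
}

unittest
{
    uint[4] buffer = [1, 2, 3, 4];
    assert(stateCode(QueueState.running) == 1);
    assert(sumFirst(viewOf(buffer.ptr, buffer.length), 3) == 6);
    assert(maxOf(buffer[], 0u) == 4);
}
```

One possible way to exercise the sketch is `dmd -unittest -main -run sketch.d`, where sketch.d is an arbitrary file name. Keeping the pointer-to-slice conversion inside one small @trusted function leaves the remaining code checkable as @safe, which matches the incremental approach described above.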
The Linux kernel version used, along with the compatible driver, was 4.19.0. For development, testing and evaluation we used a virtual machine (VM) based on QEMU.

V. EVALUATION
To validate our approach we show that:
1) The D code has the exact same behavior as the C code that it replaces.
2) The safety mechanisms inserted successfully prevent the occurrences of memory corruption bugs.
3) The performance of the replacement software does not degrade with regards to its predecessor.

We created a setup where we provide both implementations of the virtio_net driver (C and D) and ran similar scenarios to compare functionality, safety and performance.

A. EXPERIMENTAL SETUP
We created a virtual machine image with the 4.19.0 version of the Linux kernel. The virtual machine is run as two instances: one running the C version of the virtio_net driver, and the other one running the D version. We refer to the virtual machines as guest and to the physical system as host.

We compiled the D source files of the module using the GDC compiler, version 10.3.0, with the following flags: -fno-druntime -mcmodel=kernel -O2 -c.

For evaluation we focused on functional correctness / parity, safety and performance.

B. FUNCTIONAL CORRECTNESS
We ran network tools in each virtual machine to check for parity of functionality: for example, ping to validate functionality and wget to download information from the Internet. Additionally, we check whether the transferred file is the correct one by comparing its MD5 hash with the expected one.

C. SAFETY
To enhance the safety of the ported driver code we modified the code to use several D language features: array bounds checking, @safe functions and templates.

1) ARRAY BOUNDS CHECKING
The virtio driver uses both statically and dynamically allocated arrays. In the case of static arrays defined inside the driver, the D language compiler has sufficient information at compile time to insert bounds checking code. Dynamic arrays, on the other hand, are represented in C as a pointer to a chunk of data, therefore there isn't sufficient information at compile time to offer the possibility of implementing runtime checks. However, using slices, we are able to enable bounds checking for dynamic arrays that are defined inside the ported driver. Accesses to arrays that are dynamically allocated outside the driver remain without bounds checks.

From the total number of array accesses inside the virtio_net driver, we were able to enable array bounds checking in 88.4% of the cases. The remaining 11.6% represent accesses to dynamic arrays that have been allocated outside of the ported driver. To test the effect of adding array bounds checking on the driver, we have added artificial out of bounds accesses to the code. In 60% of the cases, the C version of the driver has finished execution gracefully, whereas the D version has stopped with a kernel panic in 100% of the cases.

2) @SAFE FUNCTIONS
To enable the D compiler to check the safety of the code, we aimed to annotate all the functions present in the driver with the @safe keyword. 19% of the functions have successfully compiled without any modifications, whereas 81.2% have failed compilation due to performing unsafe operations. Most of these functions rely on pointer operations and casts that are forbidden in @safe code. Additional modifications are required to bring the code into a @safe state; however, this can be done incrementally after the initial port of the driver.

3) TEMPLATES
D code may use templated functions that are instantiated at compile time with the right type. A type mismatch results in a compilation error, thus ruling out this class of runtime memory corruption bugs. By using templated functions, we replaced 56% of the total number of void pointer usages. The remaining 44% could not be replaced because there was no conversion pattern that we could detect and leverage for our transformation.

D. PERFORMANCE
For performance, we used the iperf3 tool that sends packets between a client and a server. We used a virtual machine instance running the original C version of the virtio_net driver and a virtual machine running the D version. Each VM was allocated 1GB of RAM and 1 CPU. iperf3 was deployed on both VMs.

We devised 3 setups:
• vm-to-vm (in Figure 2): One VM is running the server, one VM is running the client. Both machines are of the same type: either both C or both D.
• vm-to-host (in Figure 3): The host is running the server, the VM is running the client.
• vm-to-remote (in Figure 4): Another system in the host network is running the server, the VM is running the client.

FIGURE 2. VM to VM setup. One VM runs the iperf3 server, the other is running the client.
FIGURE 3. VM to host setup. The host is running the iperf3 server, the VM is running the client.
FIGURE 4. VM to remote setup. Another system in the host network is running the iperf3 server, the VM is running the client.

Each of those setups was used for 2 × 2 types of measurements: (1) the VM is running D or the VM is running C and (2) iperf3 is using TCP or it is using UDP. Results are summarized in Table 1 and in Figure 5 and Figure 6.

TABLE 1. Comparative performance.
FIGURE 5. Comparative TCP Performance (C vs D).
FIGURE 6. Comparative UDP Performance (C vs D).

Results show negligible overhead for the D module implementation compared to the C implementation. Given that parts of the measurements show a negative slowdown, we consider performance similar and subject to network and measurement variation.

One thing to note is the relatively reduced impact of the changes: the basic network driver functionalities are unmodified, most of the code responsible for that being shared between the two implementations. Porting other drivers may affect a larger part of the implementation and could feature a higher slowdown. This is a subject for future analysis.

E. REPLICABILITY
In the interest of validating our work, we provide it to the community on GitHub as a fork of the Linux kernel, an implementation of the D virtio_net driver and experiment scripts: https://github.com/edi33416/d-virtio.

The implementation of the D virtio_net is on the test_dvirtio_gdc branch, in the drivers/net/dfiles folder. Alongside the .d source files present in the drivers/net/dfiles path, there are also a Makefile and two test scripts. The Makefile is used to compile the .d source files into the .o_shipped objects that, in turn, will be linked by the kernel build system to build the virtio_net.ko module. The test.sh and test2.sh helper scripts are used to validate the experimental setup. They load the compiled kernel module, configure the IP address and routing table, and validate that the network is working properly; this is done by downloading a file and comparing its md5sum with the reference value.

In order to be able to compile the D driver, one needs to install gdc-10, the GCC based D compiler. As we are using QEMU to run the VMs, one also needs to ensure that qemu-system-x86_64 is installed with KVM support. We have been connecting to our VMs using a serial port with a serial communication program, such as Minicom.²

Once all the prerequisites are met, one can build the kernel module, boot up the VM and start using the compiled driver. This process is automated in the tools/labs/ directory. The tools/labs/Makefile, through the run target, will (1) compile the .o_shipped object, (2) trigger the kernel build system that will result in the virtio_net.ko module, (3) download the YOCTO_IMAGE specified in the tools/labs/qemu/Makefile and boot up the VM, and (4) copy the module inside the VM. It will also set up IP forwarding and NAT Masquerading for the eno1 network interface on the host machine, so one must update the

kernel. Although the memory safety guarantees that Rust offers are superior when compared to D, integrating it in the Linux kernel is a very complicated task. As evidence, the work required to add support for Rust in the Linux kernel was done by 173 people (present in the commit changelog [21]) over the course of 18 months. This included solely the implementation of the infrastructure required to integrate Rust code in the kernel. It does not implement any device driver or any parts of the Linux kernel in Rust. By comparison, our work was done by 3 people over the course of 4 months, including the initial exploratory phase of the Linux infrastructure as well as the porting of the kernel header files. The actual porting time of the device driver required only 2 to 3 weeks. The reader should consider that, in the meantime, work has advanced on automating the porting of kernel header files to D [26], thus reducing the required time to integrate D device drivers to a minimum. In addition, the effort to integrate Rust in the kernel has required compiler changes to accommodate the esoteric code encountered, whereas our work does not necessitate any compiler changes.

A previous attempt to create a memory-safe version of the C language and to use it in the Linux kernel is CCured [2].
Makefile if one’s system is using a different network interface CCured is a program transformation system that extends the name. existing type system of the C language by classifying new Once the VM has booted, one can connect to it through the pointer types according to their usage. There are three pointer serial1.pts serial pipe with the help of the minicom categories: (1) SAFE qualified pointers may be dereferenced, utility tool, as such minicom -D serial1.pts. The but cannot be cast to other types or be used as part of default login username is root and requires no password. All pointer arithmetic operations, (2) SEQ qualified pointers may the files can be found in the skels/ directory inside the VM, be used as part of pointer arithmetic, but not in type casts the kernel object being named virtio_net_tmp.ko. and (3) WILD qualified pointers that can be cast to other Precompiled .ko objects can be found in the Releases3 on pointer types. Each category is treated separately at runtime. Github: SAFE pointers simply require a null check. SEQ pointers are • Precompiled D .ko: https://github.com/edi33416/d- subjected to bounds checking, since they are typically used virtio/releases/download/dvirtio-ko/virtio_net_tmp.ko for array operations. WILD pointers are the most expensive • Precompiled C .ko: https://github.com/edi33416/d- in terms of runtime cost, because they require runtime type virtio/releases/download/cvirtio-ko/virtio_net_tmp.ko information to track the various conversion types that the It is our hope that the availability of our work will make it pointer may be subjected to. It has been previously discovered easier to evaluate, to replicate and to provide a critical eye on. that, in practice, a large percentage of the casts in C codebases between different types are either upcasts or downcasts [23]. VI. 
RELATED WORK This is also true for the Linux kernel where void* is used as Improving the safety of the Linux kernel and its drivers is a generic base type in order to enable polymorphism. These the constant focus of the professional and research security types of casts will be treated as WILD pointers by CCured community. There are different approaches ranging from which will be subjected to the costs of runtime checks. static analysis of the Linux kernel code [4], [9], [14] to By using D, we were able to leverage it’s metaprogramming fuzzing [3], [8], [22], [25] to the use of runtime checks and/or support in order to achieve compile-time polymorphism and instrumentation [13], [24]. type safety without adding any runtime costs. The idea of using programming languages that implement The pointers defined by CCured are fat pointers: a structure different memory safety features in order to make the Linux that packs together the raw pointer and metadata related to the kernel code safer has also been tackled. boundaries and type information. The authors acknowledge The recent availability of Rust as a programming language [15] that, because of this, multithreaded programs that rely in the Linux kernel [19], [20] paves the way for adding code on shared memory will not work with CCured. The isssue written in a secure programming language. This is compatible with shared memory programs stems from the fact that the with our own approach of using D to write code in the Linux programs not written using CCured will assume that the pointers are one word long and can be written to atomically, 2 https://wiki.emacinc.com/wiki/Getting_Started_With_Minicom when they are, in fact, a fat pointer that occupies multiple 3 https://github.com/edi33416/d-virtio/releases words in memory and requires multiple instructions in order VOLUME 10, 2022 134509 C. E. 
Staniloiu et al.: Safer Linux Kernel Modules Using the D Programming Language to perform the write, and thus the pointer could get in It is important to note that unsafety inside the kernel is a an inconsistent state. As D’s arrays are also fat pointers, fact of life. Although one can use a programming language they suffer from the same problem. We, as do the authors that uses different mechanics that increase the safety of the of CCured, believe that this problem can be resolved by code that a developer writes, at one point the developer will acquiring locks on the shared memory before accessing it. be forced to perform unsafe actions. Those can come from the Although, in theory, this solution will impact performance we need to interact with specific pins on the underlying hardware have not encountered it in practice while interfacing D with or the need to interact with the kernel API. Most of the kernel programs written in other languages. API core works with raw pointers; as such, even though the CCured was used on two Linux kernel device drivers, safe code might implement a sound object lifetime algorithm, on Linux kernel version 2.4.5, with no significant per- being forced to pass the raw pointer to the kernel will void all formance penalties. However, it has incurred performance the safety bets and assumptions. In spite of this, we believe penalties ranging from 11% to 87% on other programs, as it that there are two strong arguments that enable the use of is detailed in its paper. safe languages in practice: 1) the kernel core is extremely Another approach to use a modern programming language stable and robust as it benefits from 30 years of development for the Linux kernel drivers, in order to increase the and bug fixes, and 2) the kernel API clearly defines whose reliability of the system, was done using the Decaf drivers responsibility, the kernel’s or the driver’s, is to free allocated architecture [18]. The Decaf architecture partitions the resources. 
code of a driver in two separate parts: one that must Another important observation is that a programming run in the kernel-space for high performance and must language must be able to adhere to the constraints and design satisfy the OS requirements and one that can be moved to patterns implemented inside the Linux kernel. As Linus Tor- the user-space and be rewritten in another language. The valds has stated [30], kernel needs to trump any programming communication between these two parts was done through language’s needs. For this reason, we believe that the D extension procedure call (XPC). Using this architecture and programming language is a good fit given its proven ease of the Linux kernel version 2.6.18.1, five drivers were converted interoperability with C and the kernel infrastructure. to Java, gaining exception handling and automatic memory The extent to which the kernel safety can be improved management through garbage collection. The performance depends on the degree to which the module implementation is achieved was close to the one achieved by the native kernel self-sufficient. The more external functionalities the module drivers. The drawbacks of using Decaf result are traced to uses, the fewer safety enhancements can be done. the Java programming language, that has no pointers support. The performance evaluation we conducted on the vir- As such, critical paths in the code that use pointers are left in tio_net driver shows that the D version of the driver adds the unsafe part, still running in kernel space. little to no overhead to the original C variant. The safety Conversely, our methodology covers the use of the D features added are sustainable and do not introduce overhead, language for memory safety enhancements in any type of therefore, we consider the performance results encouraging. 
kernel modules, including those that use multithreading Given the methodology we created, we are confident other and shared memory, as is the case with CCured [2]. The drivers could be ported to D with reasonable effort. Given implementation of new components and the interfacing with the similarity to the C programming language, getting accus- other kernel components can be easily done thanks to the tomed to the D programming language will have minimal language’s high compatibility with C, compared to the more impact on the driver developer. This is in contrast to the Rust complicated syntax of Rust. The entire code of a kernel programming language whose syntax and features are very module can be rewritten in D to improve memory safety, with different from the C programming language. We believe that no need of leaving any part of the code unchanged, as is the the increasing interest of adding safe languages into the Linux case with Decaf [18]. kernel is a great step forward, as it provides kernel developers with alternatives and flexibility such that they can strike the VII. CONCLUSION right balance for their needs and goals. In this paper we presented an approach to improve the With these solutions, further drivers could be ported using security of Linux kernel modules using the D programming the methodology described in this paper. Later on, this could language. We selected virtio_net as our target driver, be extended to entire built-in components and subsystems a medium-sized and actively maintained component in the in the Linux kernel. Those would bring a much needed Linux kernel. We ported the driver in the D programming improvement in the overall security of the kernel with close- language and highlighted the functional and performance to-no overhead, with a welcoming C-similar programming parity to the original C driver and discussed the security language. benefits. We elaborated a methodology that can be used on other types of drivers for the same purpose. 
REFERENCES The safety features added to the driver show that the D language is able to leverage safety improvements in a [1] AspenCore. (Nov. 2019). Mobile Operating System Market Share Worldwide. Accessed: Apr. 17, 2022. [Online]. Available: https://www. kernel module, array bounds checking and compile-time embedded.com/wp-content/uploads/2019/11/EETimes polymorphism being the most important ones. _Embedded_2019_Embedded_Markets_Study.pdf 134510 VOLUME 10, 2022 C. E. Staniloiu et al.: Safer Linux Kernel Modules Using the D Programming Language [2] J. Condit, M. Harren, S. McPeak, G. C. Necula, and W. Weimer, ‘‘Cured [27] Mobile Operating System Market Share Worldwide. Accessed: in the real world,’’ ACM SIGPLAN Notices, vol. 38, no. 5, pp. 232–244, Apr. 17, 2022. [Online]. Available: https://gs.statcounter.com/os-market- 2003. share/mobile/worldwide [3] J. Corina, A. Machiry, C. Salls, Y. Shoshitaishvili, S. Hao, C. Kruegel, and [28] Project Highlight: DPP. Accessed: Apr. 17, 2022. [Online]. Available: G. Vigna, ‘‘DIFUZE: Interface aware fuzzing for kernel drivers,’’ in Proc. https://dlang.org/blog/2019/04/08/project-highlight-dpp/ ACM SIGSAC Conf. Comput. Commun. Secur., Oct. 2017, pp. 2123–2138. [29] Operating System Family/Linux | TOP50. Accessed: Apr. 17, 2022. [4] D. Dawson, N. Hawes, C. Hoermann, N. Keynes, and C. Cifuentes, [Online]. Available: https://www.top500.org/statistics/details/osfam/1/ ‘‘Finding bugs in open source kernels using parfait,’’ Sun Microsyst. [30] LKML: Linus Torvalds: Re: [Patch v9 12/27] Rust: Add Kernel Lab., Brisbane, QLD, Australia, Tech. Rep., 2009. [Online]. Available: Crate. Accessed: Oct. 29, 2022. [Online]. Available: https://lkml.org/ https://www.researchgate.net/publication/242083507_Finding_Bugs_in_ lkml/2022/9/19/1105 Open_Source_Kernels_using_Parfait [31] W3Techs. Linux vs. Windows Usage Statistics for Websites. [5] D Programming Language. Accessed: Apr. 17, 2022. [Online]. Available: Accessed: Apr. 17, 2022. [Online]. 
Available: https://w3techs.com/ https://dlang.org/ technologies/comparison/os-linux,os-windows [6] Programming in D for C Programmers. Accessed: Apr. 17, 2022. [Online]. Available: https://dlang.org/articles/ctod.html [7] Live Functions: Ownership and Borrowing in D. Accessed: Oct. 29, 2022. [Online]. Available: https://dlang.org/spec/ob.html CONSTANTIN EDUARD STANILOIU received [8] D. R. Jeong, K. Kim, B. Shivakumar, B. Lee, and I. Shin, ‘‘Razzer: Finding the B.Sc. and M.Sc. degrees in computer kernel race bugs through fuzzing,’’ in Proc. IEEE Symp. Secur. Privacy science and engineering from the University (SP), May 2019, pp. 754–768. POLITEHNICA of Bucharest (UPB), Bucharest, [9] R. Johnson and D. Wagner, ‘‘Finding user/kernel pointer bugs with type Romania, in 2016 and 2018, respectively, where he inference,’’ in Proc. 13th USENIX Secur. Symp. (USENIX Security). is currently pursuing the Ph.D. degree in computer San Diego, CA, USA: USENIX Association, Aug. 2004, pp. 119–134. science and information security. [10] Kernel Self-Protection. Accessed: Apr. 17, 2022. [Online]. Available: Since 2018, he has been a Teaching Assistant https://www.kernel.org/doc/html/latest/security/self-protection.html with the Department of Computer, Faculty of [11] (2022). Virtio. Accessed: Apr. 17, 2022. [Online]. Available: https://wiki.libvirt.org/page/Virtio Automatic Control and Computers, UPB. He is [12] Linux Kernel CVEs. Accessed: Apr. 17, 2022. [Online]. Available: also a member of the Secure Systems Group, Department of Computer. His https://www.linuxkernelcves.com/ research interests include programming languages, security and vulnerability [13] K. Lu, A. Pakki, and Q. Wu, ‘‘Automatically identifying security checks for detection, code and binary analysis, distributed systems, the IoT, and detecting kernel semantic bugs,’’ in Proc. Eur. Symp. Res. Comput. Secur. computer vision. Luxembourg: Springer, 2019, pp. 3–25. [14] A. Machiry, C. Spensky, J. Corina, N. Stephens, C. 
Kruegel, and G. Vigna, ‘‘DR.CHECKER: A soundy analysis for Linux kernel drivers,’’ in Proc. 26th USENIX Secur. Symp. (USENIX Security), 2017, pp. 1007–1024. ALEXANDRU MILITARU received the B.Sc. and [15] G. C. Necula, J. Condit, M. Harren, S. McPeak, and W. Weimer, ‘‘CCured: M.Sc. degrees in computer science and engi- Type-safe retrofitting of legacy software,’’ ACM Trans. Program. Lang. neering from the University POLITEHNICA of Syst., vol. 27, no. 3, pp. 477–526, May 2005. Bucharest (UPB). He is currently pursuing the [16] R. Nitu, E. Staniloiu, R. Deaconescu, and R. Rughinis, ‘‘Adding support M.A. degree in philosophy with the University of for reference counting in the d programming language,’’ in Proc. 17th Int. Bucharest, Bucharest, Romania. Conf. Softw. Technol., H.-G. Fill, M. van Sinderen, and L. A. Maciaszek, His research interests include programming Eds. Lisbon, Portugal: SCITEPRESS, 2022, pp. 299–306. languages and compilers. [17] A. Popov. Linux Kernel Defence Map. Accessed: Apr. 17, 2022. [Online]. Available: https://github.com/a13xp0p0v/linux-kernel-defence-map [18] M. J. Renzelmann and M. M. Swift, ‘‘Decaf: Moving device drivers to a modern language,’’ in Proc. USENIX Annu. Tech. Conf., 2009, p. 14. [Online]. Available: https://www.researchgate.net/publication/ RAZVAN NITU received the B.Sc. and M.Sc. 234787227_Decaf_moving_device_drivers_to_a_modern_language/ degrees in computer science and engineering citation/download [19] Rust in the Linux Kernel: Good Enough. Accessed: Apr. 17, 2022. [Online]. from the University POLITEHNICA of Bucharest Available: https://thenewstack.io/rust-in-the-linux-kernel-good-enough/ (UPB), Bucharest, Romania, where he is currently [20] Rust for Linux. Accessed: Apr. 17, 2022. [Online]. Available: pursuing the Ph.D. degree in programming lan- https://github.com/Rust-for-Linux guages and security. His research interests include [21] Linux Kernel Commit to Add Rust Support. 
Accessed: programming languages, security, computer archi- Oct. 22, 2022. [Online]. Available: https://git.kernel.org/pub/scm/linux/ tecture, and education techniques. kernel/git/torvalds/linux.git/commit/?id=8aebac82933ff1a7c8eede18cab 11e1115e2062b [22] S. Schumilo, C. Aschermann, R. Gawlik, S. Schinzel, and T. Holz, ‘‘kAFL: Hardware-assisted feedback fuzzing for OS kernels,’’ in Proc. 26th USENIX Secur. Symp. (USENIX Security), 2017, pp. 167–182. RAZVAN DEACONESCU is currently an Asso- [23] M. Siff, S. Chandra, T. Ball, K. Kunchithapadam, and T. Reps, ‘‘Coping ciate Professor at the Computer Science and Engi- with type casts in C,’’ ACM SIGSOFT Softw. Eng. Notes, vol. 24, no. 6, neering Department, University POLITEHNICA pp. 180–198, Nov. 1999. of Bucharest, Romania. His research interests [24] C. Song, B. Lee, K. Lu, W. Harris, T. Kim, and W. Lee, ‘‘Enforcing include operating systems and security, with a kernel security invariants with data flow integrity,’’ in Proc. NDSS, 2016, penchant for teaching and mentoring. If a class pp. 1–15. [25] D. Song, F. Hetzelt, J. Kim, B. B. Kang, J.-P. Seifert, and M. Franz, uses ‘‘operating systems’’ as part of its name, it’s ‘‘Agamotto: Accelerating kernel driver fuzzing with lightweight virtual likely he is part of the team. Research-wise, he is machine checkpoints,’’ in Proc. 29th USENIX Secur. Symp. (USENIX working on software security, particularly Apple Security), 2020, pp. 2541–2557. iOS security and the Unikraft unikernel in recent [26] E. Staniloiu, R. Nitu, C. Becerescu, and R. Rughinis, ‘‘Automatic years. He is a part of the open source and security community in the university integration of d code with the Linux kernel,’’ in Proc. 20th RoEduNet and in Romania. Conference: Netw. Educ. Res. (RoEduNet), Nov. 2021, pp. 1–6. VOLUME 10, 2022 134511