Towards Generating Labeled Property Graphs for Comprehending C#-based Software Projects

Authors Andreas Schreiber David Heidrich Sebastian Oberdörfer

License CC-BY-4.0

Towards Generating Labeled Property Graphs for Comprehending C#-based Software Projects

David Heidrich, German Aerospace Center (DLR), Weßling, Germany, david.heidrich@dlr.de
Andreas Schreiber, German Aerospace Center (DLR), Cologne, Germany, andreas.schreiber@dlr.de
Sebastian Oberdörfer, University of Würzburg, Würzburg, Germany, sebastian.oberdoerfer@uni-wuerzburg.de

ABSTRACT
C# is the most widely used programming language among XR developers. However, only a limited number of graph-based data acquisition tools exist for C# software. XR development commonly relies on reusing existing software components to accelerate development. Graph-based visualization tools can facilitate the comprehension that such reuse requires, e.g., by providing an overview of relationships between components. This work describes a new tool called Src2Neo that generates labeled property graphs of C#-based software projects. The stored graph follows a simple C# naming scheme and, contrary to other solutions, maps each software entity to exactly one node. The resulting graph facilitates the comprehension process by providing an easy-to-read representation of software components. Additionally, the generated graphs can act as a data basis for more advanced software visualizations without the need for complex data requests.

CCS CONCEPTS
• Human-centered computing → Visualization; • Computer systems organization → Real-time system architecture; • Information systems → Information retrieval.

KEYWORDS
software visualization, software comprehension, labeled property graph, C#, game engines

ACM Reference Format:
David Heidrich, Andreas Schreiber, and Sebastian Oberdörfer. 2022. Towards Generating Labeled Property Graphs for Comprehending C#-based Software Projects. In 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22), October 10–14, 2022, Rochester, MI, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3551349.3560513

This work is licensed under a Creative Commons Attribution International 4.0 License.
ASE ’22, October 10–14, 2022, Rochester, MI, USA
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9475-8/22/10.
https://doi.org/10.1145/3551349.3560513

1 INTRODUCTION
In the area of XR development and research, the large number of technical software requirements, like low latencies [6], high frame rates [23], and hardware compatibilities, has led to a widespread adoption of 3D game engines. While these game engines, such as Godot, CryEngine, or Unity, provide a wide range of pre-existing features and systems, they also commonly provide native support for the programming language C#. As this makes C# the most widely used programming language among XR developers [19], even game engines without C# capabilities, like Unreal Engine, allow C# scripting through various community plugins [5, 11].

By not using a proprietary scripting language, C# game engines enable developers to reuse source code of existing C#-based software projects. This can reduce development overhead and allows for rapid prototyping. In some cases, like Godot, the entire source code of the game engine, including its C# components, is open source [7]. This gives developers, researchers, and students the ability to modify all features and systems provided by the game engine. However, before developers can modify or reuse existing source code, they must first build a basic understanding of the overall software system and its components. Due to the abstract and complex nature of source code, this comprehension can quickly evolve into a mentally demanding and time-consuming activity. This is especially the case for larger software projects, where even professional developers spend more than 50% of their working time on software comprehension instead of writing or modifying source code [22].

To facilitate this comprehension process, we generate labeled property graphs of C#-based software projects. More specifically, we present our work-in-progress tool Src2Neo that converts a srcML file to a graph stored in a Neo4j database. This work describes the structure of Src2Neo in detail and presents a simple graph database model for C# software structure. Finally, we discuss how our approach can facilitate software comprehension and can result in more advanced software visualizations.

2 RELATED WORK
As graphs are a fundamental data representation of software structures, graph-based visualization techniques are the most popular type of software visualization [18]. Typically, graphs consist of nodes and edges. In the context of software structures, nodes generally represent software elements, like namespaces or classes. Edges typically represent relationships between software elements, like a namespace CONTAINS a class or a method CALLS another method. Labeled property graphs are a type of graph used to model real-world entities and their relationships as nodes and edges [1, 16]. Here, all nodes and edges of a specific type have a shared label. For example, all nodes that represent a namespace receive a Namespace label. Additionally, nodes and edges can store additional data in the form of properties. That way, a Class node could, e.g., contain information about its lines of code or code complexity.

As labeled property graphs can model many aspects of a software system, they are often proposed as a unified data source for
software analysis and visualization [14, 17, 20]. However, how difficult data acquisition is depends on the programming language. Due to differences between languages, data acquisition tools are generally designed for one programming language only. Hence, different acquisition tools exist for, e.g., C++ [3, 13] or Java [14]. Besides the "C# Plugin" [2] for the open-source software analysis tool jQAssistant [14], there are, to the best of our knowledge, currently no tools capable of mapping C#-based software projects to a labeled property graph.

Figure 1: Section from a srcML file built from a C#-based mesh generator library [10].
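
To make the labeled property graph notion from this section concrete, the following minimal Python sketch models nodes and edges that each carry one label and a dictionary of properties. The class names `Node` and `Edge`, the property keys, and the example values (`Geometry`, `Mesh`, 120 lines of code) are illustrative assumptions and are not taken from Src2Neo.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node with one label and free-form properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled edge between two nodes."""
    label: str
    source: Node
    target: Node
    properties: dict = field(default_factory=dict)


# Made-up example: a namespace that contains one class.
namespace = Node("Namespace", {"name": "Geometry"})
mesh = Node("Class", {"name": "Mesh", "lines_of_code": 120})
contains = Edge("CONTAINS", namespace, mesh)

# All nodes of one type share a label, so selecting them is a simple filter.
nodes = [namespace, mesh]
class_names = [n.properties["name"] for n in nodes if n.label == "Class"]
```

Because every node of a given type shares its label, selecting all classes needs no knowledge of the rest of the graph.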
Generating a labeled property graph requires a predefined meta model of the graph database. The C# Plugin for jQAssistant already introduced a meta model for C#-based software systems [2]. However, we decided against using this model, as it is designed for software analysis tasks. Hence, the resulting graphs are very complex and include many meta nodes, like Member or File. Additionally, there are no direct edges between, e.g., class and namespace nodes or method and class nodes. Hence, visualization tools require complex data queries to gain basic insights.

Figure 2: Meta model of the Src2Neo graph database.
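
The effect of missing direct edges on query complexity can be illustrated with a small in-memory sketch. Layout A stores direct namespace-to-class edges. Layout B is a hypothetical layout in which File meta nodes sit between namespaces and classes; it is an assumption made for illustration and does not reproduce the actual edge structure of the jQAssistant C# Plugin. All node names are made up.

```python
# Node name -> label. All names are made up for illustration.
LABELS = {
    "Geometry": "Namespace",
    "Mesh": "Class",
    "Vertex": "Class",
    "Mesh.cs": "File",
    "Vertex.cs": "File",
}

# Layout A: direct (namespace, class) edges.
DIRECT = [("Geometry", "Mesh"), ("Geometry", "Vertex")]

# Layout B (hypothetical): each File meta node points to the namespace
# and to the class declared in that file.
VIA_FILE = [
    ("Mesh.cs", "Geometry"), ("Mesh.cs", "Mesh"),
    ("Vertex.cs", "Geometry"), ("Vertex.cs", "Vertex"),
]


def classes_direct(namespace):
    """One hop: follow the direct edges."""
    return sorted(target for source, target in DIRECT if source == namespace)


def classes_via_file(namespace):
    """Two hops (namespace <- file -> class) plus a label filter."""
    files = {source for source, target in VIA_FILE if target == namespace}
    return sorted(
        target
        for source, target in VIA_FILE
        if source in files and LABELS[target] == "Class"
    )
```

Both functions return the same classes, but the second one needs an intermediate lookup and a label filter to answer the same question.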
3 LABELED PROPERTY GRAPHS
For our graph database meta model, we chose a simple design solely based on the C# software structure (see Figure 2). It only includes the node labels Namespace, Class, Interface, Enum, Method, and Field. The model also includes the edge labels CONTAINS, IMPORTS, IMPLEMENTS, INHERITS_FROM, and CALLS. Additionally, nodes contain properties, like "Name", "File Path", and "Lines of Code". However, additional properties can easily be added to nodes.

3.1 Src2Neo
We use XML files generated with the open source tool srcML [4]. Among other languages, srcML parses C# source code to an XML file. This file contains all original information, including file structure, whitespace, and comments. Inside the XML file, all syntax elements receive individual XML tags (see Figure 1). For example, a <unit> tag represents a file and <namespace>, <class>, and <function> tags represent their C# counterparts. As all software components can easily be addressed, srcML is commonly used in software metrics extraction [15].

Our tool Src2Neo then converts a given srcML-generated XML file to a labeled property graph. First, it identifies all software components (i.e., namespaces, classes, interfaces, enums, methods, and fields). To navigate inside the XML file and to find specific XML nodes, we use XPath expressions. For example, we use the expression "./def:class" to find all class nodes inside the XML file or "./def:namespace" to get all namespaces. For each identified software component, we extract its software metrics.

After identifying all individual software components, we look for relationships between them. For example, to identify Namespace CONTAINS Class relationships, we use the XPath expression "./../../../def:namespace" on each class node in the XML. If a parent namespace is found, we store the relationship. Other examples are the Class IMPORTS Namespace relationships, where we look for the <using> tag with the XPath expression "./../def:using" on each class, or the Method CALLS Method relationship, where we look for the <call> tag inside a method and resolve the called name.

Finally, we write the data to a Neo4j graph database (see Figure 3). During the writing process, we go through all identified software components and store each component as an individual graph node. Then we add all software component relationships to the graph. All data is added via Cypher queries, which are sent to the Neo4j server using the .NET Neo4j driver [8].

Figure 3: Different views of a generated labeled property graph visualized with the Neo4j Browser. Namespace nodes are blue, class nodes are orange, interface nodes are light-brown, enum nodes are green, method nodes are red, and field nodes are pink.

4 DISCUSSION
The generated nodes are a 1:1 representation of a C#-based software project. They do not include any meta nodes, which would require complicated data queries that can slow down the comprehension process. Hence, developers, researchers, and students can explore
the software components without the need for specialized graph visualization tools. However, choosing where to place relationships can be more challenging. To reduce complexity, our database meta model does not include all possible relationships. For example, we only use the IMPORTS relationship between classes and namespaces (as is the case in the source code). But in some use cases, a representation with IMPORTS relationships between namespaces might be the more suitable solution. Hence, our graphs will not replace advanced software visualization tools with custom data queries.

Advanced software visualization tools could, however, benefit from using our generated graphs as a data source. On the one hand, the easy graph layout could facilitate rapid software visualization prototyping (especially for less experienced software developers). On the other hand, metaphor-based software visualization tools, like IslandViz [12] or Code City [21], could map their metaphors to specific nodes and edges inside the graph database. This could reduce the complexity of such big open-source software visualization tools and make them more accessible.

5 FUTURE WORK
We plan to extend our meta model with more relationships, e.g., a TYPE_OF relationship between a field and a class. At the same time, we want users to be able to choose the level of detail of the generated graph. For example, users might want to combine methods and fields into a single Class Member node or omit certain relationships, like Method CALLS Method, to keep the graph database simpler.

We are currently identifying possible user requirements following a user-centered design process. After this design process is finished, we plan to evaluate and quantify the benefits of Src2Neo and its generated labeled property graphs.

We also plan to support more C# components, like structs, records, generic classes, partial classes, or anonymous types. Additionally, we are thinking about adding specific support for C#-based game engine projects, like Godot or Unity projects, including meta files and resource files. Finally, as srcML is already capable of tagging C++ and Java projects, we want to add support for these programming languages in the future, too. Src2Neo is currently developed in C# and available on GitHub [9].

REFERENCES
[1] Renzo Angles, Marcelo Arenas, Pablo Barceló, Peter Boncz, George Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan Sequeda, et al. 2018. G-CORE: A core for future graph query languages. In Proceedings of the 2018 International Conference on Management of Data. 1421–1432.
[2] Stefan Bechert and Richard Müller. 2020. jQAssistant C# Plugin. https://github.com/softvis-research/jqa-csharp-plugin
[3] Andrea Biaggi, Francesca Arcelli Fontana, and Riccardo Roveda. 2018. An Architectural Smells Detection Tool for C and C++ Projects. In 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). 417–420. https://doi.org/10.1109/SEAA.2018.00074
[4] Michael L Collard, Michael J Decker, and Jonathan I Maletic. 2011. Lightweight transformation and fact extraction with the srcML toolkit. In 2011 IEEE 11th international working conference on source code analysis and manipulation. IEEE, 173–184.
[5] Stanislav Denisov, Josh Olson, Stan Prokop, Sean Devonport, and Victor Müller. 2022. UnrealCLR. https://github.com/nxrighthere/UnrealCLR
[6] Mohammed S Elbamby, Cristina Perfecto, Mehdi Bennis, and Klaus Doppler. 2018. Toward low-latency and ultra-reliable virtual reality. IEEE Network 32, 2 (2018), 78–84.
[7] GitHub. 2022. Godot Engine. https://github.com/godotengine/godot
[8] GitHub. 2022. Neo4j .NET Driver. https://github.com/neo4j/neo4j-dotnet-driver
[9] GitHub. 2022. Src2Neo. https://github.com/DLR-SC/src2neo
[10] GitHub. 2022. Triangle.NET. https://github.com/wo80/Triangle.NET
[11] Mikayla Hutchinson. 2019. MonoUE. https://mono-ue.github.io/
[12] Martin Misiak, Andreas Schreiber, Arnulph Fuhrmann, Sascha Zur, Doreen Seider, and Lisa Nafeie. 2018. IslandViz: A tool for visualizing modular software systems in virtual reality. In 2018 IEEE Working Conference on Software Visualization (VISSOFT). IEEE, 112–116.
[13] Johann Mortara, Philippe Collet, and Xhevahire Tërnava. 2020. Identifying and Mapping Implemented Variabilities in Java and C++ Systems using symfinder. In Proceedings of the 24th ACM International Systems and Software Product Line Conference-Volume B. 9–12.
[14] Richard Müller, Dirk Mahler, Michael Hunger, Jens Nerche, and Markus Harrer. 2018. Towards an open source stack to create a unified data source for software analysis and visualization. In 2018 IEEE Working Conference on Software Visualization (VISSOFT). IEEE, 107–111.
[15] Roy Oberhauser. 2020. A Machine Learning Approach Towards Automatic Software Design Pattern Recognition Across Multiple Programming Languages (Proceedings of the Fifteenth International Conference on Software Engineering Advances). IARIA, 27–32. https://nbn-resolving.org/urn:nbn:de:bsz:944-opus4-10255
[16] Ian Robinson, Jim Webber, and Emil Eifrem. 2015. Graph databases: new opportunities for connected data. O’Reilly Media, Inc.
[17] Aashik Sadar and Vinitha Panicker. 2015. DocTool-a tool for visualizing software projects using graph database. In 2015 Eighth International Conference on Contemporary Computing (IC3). IEEE, 439–442.
[18] Mojtaba Shahin, Peng Liang, and Muhammad Ali Babar. 2014. A systematic review of software architecture visualization techniques. Journal of Systems and Software 94 (2014), 161–185.
[19] SlashData. 2021. State of the Developer Nation 20th Edition.
[20] Lynn von Kurnatowski, David Heidrich, Nalin Güden, Andreas Schreiber, Hendrik Polzin, and Christian Stangl. 2021. Analysing and Visualizing large Aerospace Software Systems. In ASCEND 2021. 4082.
[21] Richard Wettel and Michele Lanza. 2008. Codecity: 3d visualization of large-scale software. In Companion of the 30th international conference on Software engineering. 921–922.
[22] Xin Xia, Lingfeng Bao, David Lo, Zhenchang Xing, Ahmed E Hassan, and Shanping Li. 2017. Measuring program comprehension: A large-scale field study with professionals. IEEE Transactions on Software Engineering 44, 10 (2017), 951–976.
[23] Jianghao Xiong, En-Lin Hsiang, Ziqian He, Tao Zhan, and Shin-Tson Wu. 2021. Augmented reality and virtual reality displays: emerging technologies and future perspectives. Light: Science & Applications 10, 1 (2021), 1–30.
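
As a supplementary illustration of the Src2Neo pipeline described in Section 3.1, the following Python sketch walks through the three steps on a tiny input: finding components with XPath, deriving Namespace CONTAINS Class relationships, and building Cypher statements. It is not the Src2Neo implementation, which is written in C#. The XML fragment is hand-written and simplified to imitate srcML output, and all function names are hypothetical. Python's standard `xml.etree.ElementTree` cannot navigate upwards from an element, so a parent map stands in for the paper's "./../../../def:namespace" expression. The Cypher statements are only built, not sent to a server.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment imitating srcML output for C# (assumption).
SRCML = """<unit xmlns="http://www.srcML.org/srcML/src" language="C#" filename="Mesh.cs">
<namespace>namespace <name>TriangleNet</name> <block>{
<class>class <name>Mesh</name> <block>{ }</block></class>
<class>class <name>Vertex</name> <block>{ }</block></class>
}</block></namespace>
</unit>"""

# Prefix "def" is bound to the srcML namespace, as in the paper's expressions.
NS = {"def": "http://www.srcML.org/srcML/src"}


def simple_name(element):
    """Text of the direct <name> child (only handles simple, unqualified names)."""
    return element.find("def:name", NS).text


def enclosing(element, tag, parents):
    """Walk upwards through the parent map until an element with `tag` is found."""
    wanted = "{%s}%s" % (NS["def"], tag)
    current = parents.get(element)
    while current is not None and current.tag != wanted:
        current = parents.get(current)
    return current


def extract(srcml_text):
    """Return (nodes, edges) found in the srcML text."""
    root = ET.fromstring(srcml_text)
    parents = {child: parent for parent in root.iter() for child in parent}
    nodes, edges = [], []
    for ns in root.findall(".//def:namespace", NS):
        nodes.append(("Namespace", simple_name(ns)))
    for cls in root.findall(".//def:class", NS):
        nodes.append(("Class", simple_name(cls)))
        ns = enclosing(cls, "namespace", parents)
        if ns is not None:
            edges.append(("Namespace", simple_name(ns), "CONTAINS",
                          "Class", simple_name(cls)))
    return nodes, edges


def to_cypher(nodes, edges):
    """Build (query, parameters) pairs; labels come from the fixed meta model."""
    statements = []
    for label, name in nodes:
        statements.append((f"MERGE (n:{label} {{Name: $name}})", {"name": name}))
    for src_label, src, rel, dst_label, dst in edges:
        query = (f"MATCH (a:{src_label} {{Name: $source}}), "
                 f"(b:{dst_label} {{Name: $target}}) MERGE (a)-[:{rel}]->(b)")
        statements.append((query, {"source": src, "target": dst}))
    return statements


nodes, edges = extract(SRCML)
statements = to_cypher(nodes, edges)
```

Nodes are written before edges, mirroring the order described in Section 3.1. Matching nodes by label and "Name" alone is a simplification of this sketch; distinguishing equally named components would require an additional key such as the "File Path" property.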