Authors Sebastian Oberdörfer, Andreas Schreiber, David Heidrich
License CC-BY-4.0
Towards Generating Labeled Property Graphs for
Comprehending C#-based Software Projects
David Heidrich, German Aerospace Center (DLR), Weßling, Germany, david.heidrich@dlr.de
Andreas Schreiber, German Aerospace Center (DLR), Cologne, Germany, andreas.schreiber@dlr.de
Sebastian Oberdörfer, University of Würzburg, Würzburg, Germany, sebastian.oberdoerfer@uni-wuerzburg.de
ABSTRACT
C# is the most widely used programming language among XR developers. However, only a limited number of graph-based data acquisition tools exist for C# software. XR development commonly relies on reusing existing software components to accelerate development. Graph-based visualization tools can facilitate the comprehension that such reuse requires, e.g., by providing an overview of relationships between components. This work describes a new tool called Src2Neo that generates labeled property graphs of C#-based software projects. The stored graph follows a simple C# naming scheme and, contrary to other solutions, maps each software entity to exactly one node. The resulting graph facilitates the comprehension process by providing an easy-to-read representation of software components. Additionally, the generated graphs can act as a data basis for more advanced software visualizations without the need for complex data requests.

CCS CONCEPTS
• Human-centered computing → Visualization; • Computer systems organization → Real-time system architecture; • Information systems → Information retrieval.

KEYWORDS
software visualization, software comprehension, labeled property graph, c#, game engines

ACM Reference Format:
David Heidrich, Andreas Schreiber, and Sebastian Oberdörfer. 2022. Towards Generating Labeled Property Graphs for Comprehending C#-based Software Projects. In 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22), October 10–14, 2022, Rochester, MI, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3551349.3560513

This work is licensed under a Creative Commons Attribution International 4.0 License.
ASE ’22, October 10–14, 2022, Rochester, MI, USA
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9475-8/22/10.
https://doi.org/10.1145/3551349.3560513

1 INTRODUCTION
In the area of XR development and research, the large number of technical software requirements, like low latencies [6], high frame rates [23], and hardware compatibilities, has led to a widespread adoption of 3D game engines. While these game engines, such as Godot, CryEngine, or Unity, provide a wide range of pre-existing features and systems, they also commonly provide native support for the programming language C#. As this makes C# the most widely used programming language among XR developers [19], even game engines without C# capabilities, like Unreal Engine, allow C# scripting through various community plugins [5, 11].

By not using a proprietary scripting language, C# game engines enable developers to reuse source code of existing C#-based software projects. This can reduce development overhead and allows for rapid prototyping. In some cases, like Godot, the whole C# source code of the game engine is even open source [7]. This gives developers, researchers, and students the ability to modify all features and systems provided by the game engine. However, before developers can modify or reuse existing source code, they must first build a basic understanding of the overall software system and its components. Due to the abstract and complex nature of source code, this comprehension can quickly evolve into a mentally demanding and time-consuming activity. This is especially the case for larger software projects, where even professional developers invest more than 50% of their working time on software comprehension instead of writing or modifying source code [22].

To facilitate this comprehension process, we generate labeled property graphs of C#-based software projects. More specifically, we present our work-in-progress tool Src2Neo, which converts a srcML file to a graph stored in a Neo4j database. This work describes the structure of Src2Neo in detail and presents a simple graph database model for C# software structure. Finally, we discuss how our approach can facilitate software comprehension and can result in more advanced software visualizations.

2 RELATED WORK
As graphs are a fundamental data representation of software structures, graph-based visualization techniques are the most popular type of software visualization [18]. Typically, graphs consist of nodes and edges. In the context of software structures, nodes generally represent software elements, like namespaces or classes. Edges typically represent relationships between software elements, like a namespace CONTAINS a class or a method CALLS another method.

Labeled property graphs are a type of graph used to map real-world entities and their relationships to nodes and edges [1, 16]. Here, all nodes and edges of a specific type share a label. For example, all nodes that represent a namespace receive a Namespace label. Additionally, nodes and edges can store additional data in the form of properties. That way, a Class node could, e.g., contain information about its lines of code or code complexity.

Figure 1: Section from a srcML file built from a C#-based mesh generator library [10].

As labeled property graphs can model many aspects of a software system, they are often proposed as a unified data source for software analysis and visualization [14, 17, 20]. However, how difficult data acquisition is depends on the programming language. Due to differences between languages, data acquisition tools are generally designed for one programming language only. Hence, different acquisition tools exist for, e.g., C++ [3, 13] or Java [14]. Besides the "C# Plugin" [2] for the open-source software analysis tool jQAssistant [14], there are, to the best of our knowledge, currently no tools capable of mapping C#-based software projects to a labeled property graph.

Figure 2: Meta model of the Src2Neo graph database.

Generating a labeled property graph requires a predefined meta model of the graph database. The C# Plugin for jQAssistant already introduced a meta model for C#-based software systems [2]. However, we chose not to use this model as it is designed for software analysis tasks. Hence, resulting graphs are very complex and include many meta nodes, like Member or File. Additionally, there are no direct edges between, e.g., class and namespace nodes or method and class nodes. Hence, visualization tools require complex data queries to gain basic insights.
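
To make the notion of a labeled property graph from this section more concrete, the following minimal Python sketch models labeled nodes and edges with properties and shows a single-hop lookup over direct edges. It is an illustration only and not part of any tool discussed here: the `Node`, `Edge`, and `PropertyGraph` classes, the example names (`Demo`, `Mesh`, `Build`, `Refine`), the property values, and the use of a CONTAINS edge between a class and its methods are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # shared type label, e.g. "Namespace" or "Class"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str  # relationship type, e.g. "CONTAINS" or "CALLS"
    source: Node
    target: Node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Tiny in-memory labeled property graph (illustrative only)."""

    def __init__(self):
        self.nodes = []
        self.edges = []

    def add_node(self, label, properties=None):
        node = Node(label, dict(properties or {}))
        self.nodes.append(node)
        return node

    def add_edge(self, source, label, target, properties=None):
        edge = Edge(label, source, target, dict(properties or {}))
        self.edges.append(edge)
        return edge

    def neighbors(self, source, edge_label):
        """Follow one direct edge type from a node (a single-hop lookup)."""
        return [e.target for e in self.edges
                if e.source is source and e.label == edge_label]


# Hypothetical example data: one namespace, one class, two methods.
graph = PropertyGraph()
ns = graph.add_node("Namespace", {"Name": "Demo"})
mesh = graph.add_node("Class", {"Name": "Mesh", "Lines of Code": 120})
build = graph.add_node("Method", {"Name": "Build"})
refine = graph.add_node("Method", {"Name": "Refine"})

graph.add_edge(ns, "CONTAINS", mesh)
graph.add_edge(mesh, "CONTAINS", build)
graph.add_edge(mesh, "CONTAINS", refine)
graph.add_edge(build, "CALLS", refine)

# With direct edges, "which classes does this namespace contain?" is one hop.
contained = [n.properties["Name"] for n in graph.neighbors(ns, "CONTAINS")]
called = [n.properties["Name"] for n in graph.neighbors(build, "CALLS")]
```

With direct edges between, e.g., namespace and class nodes, basic questions reduce to one-hop lookups like `neighbors(ns, "CONTAINS")`. A model that routes such relationships through intermediate meta nodes would need a multi-hop traversal for the same answer.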
3 LABELED PROPERTY GRAPHS
For our graph database meta model, we chose a simple design solely based on the C# software structure (see Figure 2). It only includes the node labels Namespace, Class, Interface, Enum, Method, and Field. The model also includes the edge labels CONTAINS, IMPORTS, IMPLEMENTS, INHERITS_FROM, and CALLS. Additionally, nodes contain properties, like "Name", "File Path", and "Lines of Code". Further properties can easily be added to nodes.

3.1 Src2Neo
We use XML files generated with the open source tool srcML [4]. Among other languages, srcML parses C# source code to an XML file. This file contains all original information, including file structure, white spaces, and comments. Inside the XML file, all syntax elements receive individual XML tags (see Figure 1). For example, a <unit> tag represents a file and <namespace>, <class>, and <function> tags represent their C# counterparts. As all software components can easily be addressed, srcML is commonly used in software metrics extraction [15].

Our tool Src2Neo then converts a given srcML-generated XML file to a labeled property graph. First, it identifies all software components (i.e., namespaces, classes, interfaces, enums, methods, and fields). To navigate inside the XML file and to find specific XML nodes, we use XPath expressions. For example, we use the expression "./def:class" to find all class nodes inside the XML file or "./def:namespace" to get all namespaces. For each identified software component, we extract its software metrics.

After identifying all individual software components, we look for relationships between them. For example, to identify Namespace CONTAINS Class relationships, we use the XPath expression "./../../../def:namespace" on each class node in the XML. If a parent namespace is found, we store the relationship. Other examples are the Class IMPORTS Namespace relationships, where we look for the <using> tag with the XPath expression "./../def:using" on each class, or the Method CALLS Method relationship, where we look for the <call> tag inside a method and resolve the called name.

Finally, we write the data to a Neo4j graph database (see Figure 3). During the writing process, we go through all identified software components and store each component as an individual graph node. Then we add all software component relationships to the graph. All data is added via Cypher queries, which are sent to the Neo4j server using the .NET Neo4j driver [8].

Figure 3: Different views of a generated labeled property graph visualized with the Neo4j Browser. Namespace nodes are blue, class nodes are orange, interface nodes are light-brown, enum nodes are green, method nodes are red, and field nodes are pink.

4 DISCUSSION
The generated nodes are a 1:1 representation of a C#-based software project. They do not include any meta nodes that require complicated data queries that can slow down the comprehension process. Hence, developers, researchers, and students can explore
the software components without the need for specialized graph visualization tools. However, the relationship placement can be more challenging. To reduce complexity, our database meta model does not include all possible relationships. For example, we only use the IMPORTS relationship between classes and namespaces (as is the case in the source code). But in some use cases, a representation with IMPORTS relationships between namespaces might be the more suitable solution. Hence, our graphs will not replace advanced software visualization tools with custom data queries.

Advanced software visualization tools could, however, benefit from using our generated graphs as a data source. On the one hand, the simple graph layout could facilitate rapid software visualization prototyping (especially for less experienced software developers). On the other hand, metaphor-based software visualization tools, like IslandViz [12] or Code City [21], could map their metaphors to specific nodes and edges inside the graph database. This could reduce the complexity of such big open-source software visualization tools and make them more accessible.

5 FUTURE WORK
We plan to extend our meta model with more relationships, e.g., a TYPE_OF relationship between a field and a class. At the same time, we want users to be able to choose the level of detail of the generated graph. For example, users might want to combine methods and fields into a single Class Member node or might not want to include certain relationships, like Method CALLS Method, to keep the graph database simpler.

We are currently identifying possible user requirements following a user-centered design process. After this design process is finished, we plan to evaluate and quantify the benefits of Src2Neo and its generated labeled property graphs.

We also plan to support more C# components, like structs, records, generic classes, partial classes, or anonymous types. Additionally, we are thinking about adding specific support for C#-based game engine projects, like Godot or Unity projects, including meta files and resource files. Finally, as srcML is already capable of tagging C++ and Java projects, we want to add support for these programming languages in the future, too. Src2Neo is currently developed in C# and available on GitHub [9].

REFERENCES
[1] Renzo Angles, Marcelo Arenas, Pablo Barceló, Peter Boncz, George Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan Sequeda, et al. 2018. G-CORE: A core for future graph query languages. In Proceedings of the 2018 International Conference on Management of Data. 1421–1432.
[2] Stefan Bechert and Richard Müller. 2020. jQAssistant C# Plugin. https://github.com/softvis-research/jqa-csharp-plugin
[3] Andrea Biaggi, Francesca Arcelli Fontana, and Riccardo Roveda. 2018. An Architectural Smells Detection Tool for C and C++ Projects. In 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). 417–420. https://doi.org/10.1109/SEAA.2018.00074
[4] Michael L Collard, Michael J Decker, and Jonathan I Maletic. 2011. Lightweight transformation and fact extraction with the srcML toolkit. In 2011 IEEE 11th international working conference on source code analysis and manipulation. IEEE, 173–184.
[5] Stanislav Denisov, Josh Olson, Stan Prokop, Sean Devonport, and Victor Müller. 2022. UnrealCLR. https://github.com/nxrighthere/UnrealCLR
[6] Mohammed S Elbamby, Cristina Perfecto, Mehdi Bennis, and Klaus Doppler. 2018. Toward low-latency and ultra-reliable virtual reality. IEEE Network 32, 2 (2018), 78–84.
[7] Github. 2022. Godot Engine. https://github.com/godotengine/godot
[8] Github. 2022. Neo4j .NET Driver. https://github.com/neo4j/neo4j-dotnet-driver
[9] Github. 2022. Src2Neo. https://github.com/DLR-SC/src2neo
[10] Github. 2022. Triangle.NET. https://github.com/wo80/Triangle.NET
[11] Mikayla Hutchinson. 2019. MonoUE. https://mono-ue.github.io/
[12] Martin Misiak, Andreas Schreiber, Arnulph Fuhrmann, Sascha Zur, Doreen Seider, and Lisa Nafeie. 2018. IslandViz: A tool for visualizing modular software systems in virtual reality. In 2018 IEEE Working Conference on Software Visualization (VISSOFT). IEEE, 112–116.
[13] Johann Mortara, Philippe Collet, and Xhevahire Tërnava. 2020. Identifying and Mapping Implemented Variabilities in Java and C++ Systems using symfinder. In Proceedings of the 24th ACM International Systems and Software Product Line Conference-Volume B. 9–12.
[14] Richard Müller, Dirk Mahler, Michael Hunger, Jens Nerche, and Markus Harrer. 2018. Towards an open source stack to create a unified data source for software analysis and visualization. In 2018 IEEE Working Conference on Software Visualization (VISSOFT). IEEE, 107–111.
[15] Roy Oberhauser. 2020. A Machine Learning Approach Towards Automatic Software Design Pattern Recognition Across Multiple Programming Languages (Proceedings of the Fifteenth International Conference on Software Engineering Advances). IARIA, 27–32. https://nbn-resolving.org/urn:nbn:de:bsz:944-opus4-10255
[16] Ian Robinson, Jim Webber, and Emil Eifrem. 2015. Graph databases: new opportunities for connected data. O’Reilly Media, Inc.
[17] Aashik Sadar and Vinitha Panicker. 2015. DocTool-a tool for visualizing software projects using graph database. In 2015 Eighth International Conference on Contemporary Computing (IC3). IEEE, 439–442.
[18] Mojtaba Shahin, Peng Liang, and Muhammad Ali Babar. 2014. A systematic review of software architecture visualization techniques. Journal of Systems and Software 94 (2014), 161–185.
[19] SlashData. 2021. State of the Developer Nation 20th Edition.
[20] Lynn von Kurnatowski, David Heidrich, Nalin Güden, Andreas Schreiber, Hendrik Polzin, and Christian Stangl. 2021. Analysing and Visualizing large Aerospace Software Systems. In ASCEND 2021. 4082.
[21] Richard Wettel and Michele Lanza. 2008. Codecity: 3d visualization of large-scale software. In Companion of the 30th international conference on Software engineering. 921–922.
[22] Xin Xia, Lingfeng Bao, David Lo, Zhenchang Xing, Ahmed E Hassan, and Shanping Li. 2017. Measuring program comprehension: A large-scale field study with professionals. IEEE Transactions on Software Engineering 44, 10 (2017), 951–976.
[23] Jianghao Xiong, En-Lin Hsiang, Ziqian He, Tao Zhan, and Shin-Tson Wu. 2021. Augmented reality and virtual reality displays: emerging technologies and future perspectives. Light: Science & Applications 10, 1 (2021), 1–30.
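
As a closing illustration of the extraction and writing steps described in Section 3.1, the following Python sketch finds namespaces and classes in a srcML-like XML document and turns them into parameterized Cypher statements. It is a sketch under stated assumptions and not the Src2Neo implementation, which is written in C#. The XML snippet is hand-written and heavily simplified, the namespace URI bound to the `def` prefix is assumed, and the functions `extract` and `to_cypher` are hypothetical. Because the XPath subset in Python's `xml.etree.ElementTree` cannot navigate to ancestors of the start element, the sketch walks downward from each namespace to its classes instead of using the parent-axis expressions quoted in Section 3.1.

```python
import xml.etree.ElementTree as ET

# Assumption: srcML namespace URI, bound to the "def" prefix used in Section 3.1.
NS = {"def": "http://www.srcML.org/srcML/src"}

# Hand-written, simplified stand-in for srcML output; real files carry far
# more markup (specifiers, nested names, comments, white space).
SRCML = """
<unit xmlns="http://www.srcML.org/srcML/src" language="C#" filename="Mesh.cs">
  <namespace>namespace <name>Demo</name> <block>{
    <class>class <name>Mesh</name> <block>{ }</block></class>
    <class>class <name>Vertex</name> <block>{ }</block></class>
  }</block></namespace>
</unit>
"""


def extract(xml_text):
    """Collect (label, name) nodes and (source, relationship, target) edges."""
    root = ET.fromstring(xml_text.strip())
    nodes, edges = [], []
    for ns in root.iterfind(".//def:namespace", NS):
        ns_name = ns.findtext("def:name", namespaces=NS)
        nodes.append(("Namespace", ns_name))
        # Walk down from the namespace to the classes in its block.
        for cls in ns.iterfind("./def:block/def:class", NS):
            cls_name = cls.findtext("def:name", namespaces=NS)
            nodes.append(("Class", cls_name))
            edges.append((ns_name, "CONTAINS", cls_name))
    return nodes, edges


def to_cypher(nodes, edges):
    """Build (statement, parameters) pairs; labels come from a fixed set."""
    statements = []
    for label, name in nodes:
        statements.append((f"MERGE (n:{label} {{Name: $name}})", {"name": name}))
    for source, relationship, target in edges:
        statements.append((
            "MATCH (a {Name: $source}), (b {Name: $target}) "
            f"MERGE (a)-[:{relationship}]->(b)",
            {"source": source, "target": target},
        ))
    return statements


nodes, edges = extract(SRCML)
statements = to_cypher(nodes, edges)
for statement, parameters in statements:
    print(statement, parameters)
```

The statements are only printed here. Sending them to a Neo4j server would require a driver session, which is omitted so that the sketch stays self-contained. Matching nodes by name alone is a simplification that would break for equally named components in different namespaces.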