Authors: Andreas Schreiber, David Heidrich, Sebastian Oberdörfer
License: CC-BY-4.0
Towards Generating Labeled Property Graphs for Comprehending C#-based Software Projects

David Heidrich, German Aerospace Center (DLR), Weßling, Germany, david.heidrich@dlr.de
Andreas Schreiber, German Aerospace Center (DLR), Cologne, Germany, andreas.schreiber@dlr.de
Sebastian Oberdörfer, University of Würzburg, Würzburg, Germany, sebastian.oberdoerfer@uni-wuerzburg.de

ABSTRACT

C# is the most widely used programming language among XR developers. However, only a limited number of graph-based data acquisition tools exist for C# software. XR development commonly relies on reusing existing software components to accelerate development, which first requires developers to comprehend these components. Graph-based visualization tools can facilitate this comprehension process, e.g., by providing an overview of relationships between components. This work describes a new tool called Src2Neo that generates labeled property graphs of C#-based software projects. The stored graph follows a simple C# naming scheme and, contrary to other solutions, maps each software entity to exactly one node. The resulting graph facilitates the comprehension process by providing an easy-to-read representation of software components. Additionally, the generated graphs can act as a data basis for more advanced software visualizations without the need for complex data queries.

CCS CONCEPTS

• Human-centered computing → Visualization; • Computer systems organization → Real-time system architecture; • Information systems → Information retrieval.

KEYWORDS

software visualization, software comprehension, labeled property graph, C#, game engines

ACM Reference Format:
David Heidrich, Andreas Schreiber, and Sebastian Oberdörfer. 2022. Towards Generating Labeled Property Graphs for Comprehending C#-based Software Projects. In 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22), October 10–14, 2022, Rochester, MI, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3551349.3560513

This work is licensed under a Creative Commons Attribution International 4.0 License.
ASE ’22, October 10–14, 2022, Rochester, MI, USA
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9475-8/22/10.
https://doi.org/10.1145/3551349.3560513

1 INTRODUCTION

In the area of XR development and research, the many technical software requirements, like low latencies [6], high frame rates [23], and hardware compatibility, have led to a widespread adoption of 3D game engines. While these game engines, such as Godot, CryEngine, or Unity, provide a wide range of pre-existing features and systems, they also commonly provide native support for the programming language C#. This native support makes C# the most widely used programming language among XR developers [19]; even game engines without built-in C# capabilities, like Unreal Engine, can be scripted in C# through various community plugins [5, 11]. By not using a proprietary scripting language, C# game engines enable developers to reuse source code of existing C#-based software projects. This can reduce development overhead and allow for rapid prototyping. In some cases, like Godot, the entire source code of the game engine is open source [7]. This gives developers, researchers, and students the ability to modify all features and systems provided by the game engine. However, before developers can modify or reuse existing source code, they must first build a basic understanding of the overall software system and its components. Due to the abstract and complex nature of source code, this comprehension can quickly become a mentally demanding and time-consuming activity. This is especially the case for larger software projects, where even professional developers spend more than 50% of their working time on software comprehension instead of writing or modifying source code [22].

To facilitate this comprehension process, we generate labeled property graphs of C#-based software projects. More specifically, we present our work-in-progress tool Src2Neo, which converts a srcML file to a graph stored in a Neo4j database. This work describes the structure of Src2Neo in detail and presents a simple graph database model for C# software structure. Finally, we discuss how our approach can facilitate software comprehension and serve as a basis for more advanced software visualizations.

2 RELATED WORK

As graphs are a fundamental data representation of software structures, graph-based visualization techniques are the most popular type of software visualization [18]. Typically, graphs consist of nodes and edges. In the context of software structures, nodes generally represent software elements, like namespaces or classes. Edges typically represent relationships between software elements, like a namespace CONTAINS a class or a method CALLS another method. Labeled property graphs are a type of graph used to model real-world entities and their relationships as nodes and edges [1, 16]. Here, all nodes and edges of a specific type share a label. For example, all nodes that represent a namespace receive a Namespace label. Additionally, nodes and edges can store data in the form of properties. That way, a Class node could, e.g., contain information about its lines of code or code complexity.

As labeled property graphs can model many aspects of a software system, they are often proposed as a unified data source for software analysis and visualization [14, 17, 20]. However, data acquisition can be difficult depending on the programming language. Due to differences between languages, data acquisition tools are generally designed for one programming language only. Hence, different acquisition tools exist for, e.g., C++ [3, 13] or Java [14].

Figure 1: Section from a srcML file built from a C#-based mesh generator library [10].
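To make the labeled property graph model concrete, the following minimal Python sketch stores nodes and edges that each carry one label plus free-form properties. The class and method names (Node, Edge, PropertyGraph, add_node, add_edge, with_label, targets) and the example values (namespace "Demo", class "Mesh" with 120 lines of code) are hypothetical illustrations, not part of Src2Neo or Neo4j. As a simplification, each node here has exactly one label, whereas Neo4j also allows several labels per node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                      # shared by all nodes of one type
    properties: dict = field(default_factory=dict)  # arbitrary key/value data


@dataclass
class Edge:
    label: str                                      # relationship type, e.g. "CONTAINS"
    source: Node
    target: Node
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, label, **properties):
        node = Node(label, properties)
        self.nodes.append(node)
        return node

    def add_edge(self, label, source, target, **properties):
        edge = Edge(label, source, target, properties)
        self.edges.append(edge)
        return edge

    def with_label(self, label):
        """All nodes carrying the given label."""
        return [n for n in self.nodes if n.label == label]

    def targets(self, node, label):
        """Nodes reached from `node` via outgoing edges of the given type."""
        # Compare by identity: two distinct nodes may hold equal data.
        return [e.target for e in self.edges if e.source is node and e.label == label]


# Hypothetical example: a namespace containing one class with a LOC property.
graph = PropertyGraph()
demo = graph.add_node("Namespace", name="Demo")
mesh = graph.add_node("Class", name="Mesh", linesOfCode=120)
graph.add_edge("CONTAINS", demo, mesh)
print([n.properties["name"] for n in graph.targets(demo, "CONTAINS")])  # ['Mesh']
```

Because labels are plain strings shared by all elements of a type, selecting every Class node or following every CONTAINS edge is a simple filter, which is the property that makes such graphs convenient as a unified data source.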
Besides the "C# Plugin" [2] for the open-source software analysis tool jQAssistant [14], there are, to the best of our knowledge, currently no tools capable of mapping C#-based software projects to a labeled property graph.

Generating a labeled property graph requires a predefined meta model of the graph database. The C# Plugin for jQAssistant already introduced a meta model for C#-based software systems [2]. However, we decided against using this model, as it is designed for software analysis tasks. Consequently, its graphs are very complex and include many meta nodes, like Member or File. Additionally, there are no direct edges between, e.g., class and namespace nodes or method and class nodes. Visualization tools therefore require complex data queries to gain even basic insights.

3 LABELED PROPERTY GRAPHS

For our graph database meta model, we chose a simple design based solely on the C# software structure (see Figure 2). It only includes the node labels Namespace, Class, Interface, Enum, Method, and Field. The model also includes the edge labels CONTAINS, IMPORTS, IMPLEMENTS, INHERITS_FROM, and CALLS. Additionally, nodes contain properties, like "Name", "File Path", and "Lines of Code". However, additional properties can easily be added to nodes.

Figure 2: Meta model of the Src2Neo graph database.

3.1 Src2Neo

We use XML files generated with the open-source tool srcML [4]. Among other languages, srcML parses C# source code into an XML file. This file contains all original information, including file structure, whitespace, and comments. Inside the XML file, all syntax elements receive individual XML tags (see Figure 1). For example, a <unit> tag represents a file, and <namespace>, <class>, and <function> tags represent their C# counterparts. As all software components can easily be addressed, srcML is commonly used for software metrics extraction [15].

Our tool Src2Neo then converts a given srcML-generated XML file to a labeled property graph. First, it identifies all software components (i.e., namespaces, classes, interfaces, enums, methods, and fields). To navigate inside the XML file and to find specific XML nodes, we use XPath expressions. For example, we use the expression "./def:class" to find all class nodes inside the XML file or "./def:namespace" to get all namespaces. For each identified software component, we extract its software metrics.

After identifying all individual software components, we look for relationships between them. For example, to identify Namespace CONTAINS Class relationships, we use the XPath expression "./../../../def:namespace" on each class node in the XML. If a parent namespace is found, we store the relationship. Other examples are the Class IMPORTS Namespace relationships, where we look for the <using> tag with the XPath expression "./../def:using" on each class, and the Method CALLS Method relationship, where we look for the <call> tag inside a method and resolve the called name.

Finally, we write the data to a Neo4j graph database (see Figure 3). During the writing process, we go through all identified software components and store each component as an individual graph node. Then we add all software component relationships to the graph. All data is added via Cypher queries, which are sent to the Neo4j server using the .NET Neo4j driver [8].
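The extraction and writing steps described above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not Src2Neo's C# implementation. The input is a hypothetical, hand-simplified fragment shaped like srcML output. Python's ElementTree supports only a subset of XPath and cannot step above the element a query starts from, so instead of the expression "./../../../def:namespace" the sketch walks a child-to-parent map to find the enclosing namespace. It further assumes that every file-level using directive is imported by each class in the file, that calls are resolved by simple method name only, that classes CONTAIN their methods, and that lines of code can be approximated by counting non-blank lines. All function names (own_name, lines_of_code, extract, to_cypher) are hypothetical.

```python
import xml.etree.ElementTree as ET

# srcML's XML namespace, bound to the "def" prefix used in the XPath expressions.
NS = {"def": "http://www.srcML.org/srcML/src"}
SRC = "{" + NS["def"] + "}"

# Node labels and relationship types of the Src2Neo meta model.
LABELS = {"Namespace", "Class", "Interface", "Enum", "Method", "Field"}
RELS = {"CONTAINS", "IMPORTS", "IMPLEMENTS", "INHERITS_FROM", "CALLS"}

# Hypothetical, hand-simplified fragment shaped like srcML output for C#.
SAMPLE = """<unit xmlns="http://www.srcML.org/srcML/src" language="C#" filename="Mesh.cs">
<using>using <name>System</name>;</using>
<namespace>namespace <name>Demo</name> <block>{
<class>class <name>Mesh</name> <block>{
<function><type><name>void</name></type> <name>Build</name><parameter_list>()</parameter_list> <block>{
<expr_stmt><expr><call><name>Triangulate</name><argument_list>()</argument_list></call></expr>;</expr_stmt>
}</block></function>
<function><type><name>void</name></type> <name>Triangulate</name><parameter_list>()</parameter_list> <block>{
}</block></function>
}</block></class>
}</block></namespace>
</unit>"""


def own_name(el):
    """Text of the element's direct <name> child, e.g. "Mesh" for a <class>."""
    name = el.find("def:name", NS)
    return "".join(name.itertext()).strip() if name is not None else None


def lines_of_code(el):
    """Rough LOC metric (assumption): non-blank lines of the element's text."""
    return sum(1 for line in "".join(el.itertext()).splitlines() if line.strip())


def extract(root):
    """Return (nodes, edges) found in one srcML <unit> element."""
    parent = {child: p for p in root.iter() for child in p}

    def enclosing(el, tag):
        # Walk up to the nearest ancestor with the given srcML tag, or None.
        p = parent.get(el)
        while p is not None and p.tag != SRC + tag:
            p = parent.get(p)
        return p

    path = root.get("filename")
    nodes, edges = [], []
    for ns in root.iterfind(".//def:namespace", NS):
        nodes.append(("Namespace", own_name(ns), {"filePath": path}))
    usings = [own_name(u) for u in root.iterfind(".//def:using", NS)]

    for cls in root.iterfind(".//def:class", NS):
        name = own_name(cls)
        nodes.append(("Class", name, {"filePath": path, "linesOfCode": lines_of_code(cls)}))
        ns = enclosing(cls, "namespace")
        if ns is not None:  # Namespace CONTAINS Class
            edges.append(("Namespace", own_name(ns), "CONTAINS", "Class", name))
        for imported in usings:  # Class IMPORTS Namespace (file-level usings)
            edges.append(("Class", name, "IMPORTS", "Namespace", imported))

    methods, by_simple_name = {}, {}
    for fn in root.iterfind(".//def:function", NS):
        cls = enclosing(fn, "class")
        if cls is None:
            continue
        qualified = f"{own_name(cls)}.{own_name(fn)}"
        methods[fn] = qualified
        by_simple_name.setdefault(own_name(fn), qualified)
        nodes.append(("Method", qualified, {"filePath": path, "linesOfCode": lines_of_code(fn)}))
        edges.append(("Class", own_name(cls), "CONTAINS", "Method", qualified))

    for fn, qualified in methods.items():  # Method CALLS Method, by simple name
        for call in fn.iterfind(".//def:call", NS):
            callee = (own_name(call) or "").split(".")[-1]
            if callee in by_simple_name:
                edges.append(("Method", qualified, "CALLS", "Method", by_simple_name[callee]))
    return nodes, edges


def to_cypher(nodes, edges):
    """Yield (query, parameters) pairs. Labels and relationship types cannot be
    query parameters, so they are checked against the meta model first."""
    for label, name, props in nodes:
        if label not in LABELS:
            raise ValueError(f"unknown label {label}")
        yield f"MERGE (n:{label} {{name: $name}}) SET n += $props", {"name": name, "props": props}
    for src_label, src, rel, dst_label, dst in edges:
        if rel not in RELS or not {src_label, dst_label} <= LABELS:
            raise ValueError(f"edge outside the meta model: {rel}")
        # MATCH yields no rows for unknown endpoints (e.g. System), so no edge is created.
        yield (f"MATCH (a:{src_label} {{name: $a}}), (b:{dst_label} {{name: $b}}) "
               f"MERGE (a)-[:{rel}]->(b)", {"a": src, "b": dst})


if __name__ == "__main__":
    nodes, edges = extract(ET.fromstring(SAMPLE))
    for query, params in to_cypher(nodes, edges):
        print(query, params)
```

Running the module prints one Cypher statement per node and edge; each (query, parameters) pair could then be handed to a Neo4j driver session. Writing all nodes before all edges mirrors the two-phase writing process described above, since the MATCH in each edge statement only succeeds once both endpoint nodes exist. Nodes are keyed by name alone here, so a real tool would need fully qualified names to avoid collisions across namespaces.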
4 DISCUSSION

The generated nodes are a 1:1 representation of the software entities of a C#-based software project. They do not include any meta nodes, which would require complicated data queries and could slow down the comprehension process. Hence, developers, researchers, and students can explore the software components without the need for specialized graph visualization tools. However, the relationship placement can be more challenging. To reduce complexity, our database meta model does not include all possible relationships. For example, we only use the IMPORTS relationship between classes and namespaces (as is the case in the source code). But in some use cases, a representation with IMPORTS relationships between namespaces might be the more suitable solution. Hence, our graphs will not replace advanced software visualization tools with custom data queries.

Figure 3: Different views of a generated labeled property graph visualized with the Neo4j Browser. Namespace nodes are blue, class nodes are orange, interface nodes are light-brown, enum nodes are green, method nodes are red, and field nodes are pink.

Advanced software visualization tools could, however, benefit from using our generated graphs as a data source. On the one hand, the simple graph structure could facilitate rapid software visualization prototyping (especially for inexperienced software developers). On the other hand, metaphor-based software visualization tools, like IslandViz [12] or CodeCity [21], could map their metaphors to specific nodes and edges inside the graph database. This could reduce the complexity of such large open-source software visualization tools and make them more accessible.

5 FUTURE WORK

We plan to extend our meta model with more relationships, e.g., a TYPE_OF relationship between a field and a class. At the same time, we want users to be able to choose the level of detail of the generated graph. For example, users might want to combine methods and fields into a single Class Member node, or to omit certain relationships, like Method CALLS Method, to keep the graph database simpler.

We are currently identifying possible user requirements following a user-centered design process. After this design process is finished, we plan to evaluate and quantify the benefits of Src2Neo and its generated labeled property graphs.

We also plan to support more C# components, like structs, records, generic classes, partial classes, or anonymous types. Additionally, we are considering adding specific support for C#-based game engine projects, like Godot or Unity projects, including meta files and resource files. Finally, as srcML is already capable of tagging C++ and Java projects, we want to add support for these programming languages in the future, too. Src2Neo is currently developed in C# and available on GitHub [9].

REFERENCES

[1] Renzo Angles, Marcelo Arenas, Pablo Barceló, Peter Boncz, George Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan Sequeda, et al. 2018. G-CORE: A core for future graph query languages. In Proceedings of the 2018 International Conference on Management of Data. 1421–1432.
[2] Stefan Bechert and Richard Müller. 2020. jQAssistant C# Plugin. https://github.com/softvis-research/jqa-csharp-plugin
[3] Andrea Biaggi, Francesca Arcelli Fontana, and Riccardo Roveda. 2018. An Architectural Smells Detection Tool for C and C++ Projects. In 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). 417–420. https://doi.org/10.1109/SEAA.2018.00074
[4] Michael L. Collard, Michael J. Decker, and Jonathan I. Maletic. 2011. Lightweight transformation and fact extraction with the srcML toolkit. In 2011 IEEE 11th International Working Conference on Source Code Analysis and Manipulation. IEEE, 173–184.
[5] Stanislav Denisov, Josh Olson, Stan Prokop, Sean Devonport, and Victor Müller. 2022. UnrealCLR. https://github.com/nxrighthere/UnrealCLR
[6] Mohammed S. Elbamby, Cristina Perfecto, Mehdi Bennis, and Klaus Doppler. 2018. Toward low-latency and ultra-reliable virtual reality. IEEE Network 32, 2 (2018), 78–84.
[7] Github. 2022. Godot Engine. https://github.com/godotengine/godot
[8] Github. 2022. Neo4j .NET Driver. https://github.com/neo4j/neo4j-dotnet-driver
[9] Github. 2022. Src2Neo. https://github.com/DLR-SC/src2neo
[10] Github. 2022. Triangle.NET. https://github.com/wo80/Triangle.NET
[11] Mikayla Hutchinson. 2019. MonoUE. https://mono-ue.github.io/
[12] Martin Misiak, Andreas Schreiber, Arnulph Fuhrmann, Sascha Zur, Doreen Seider, and Lisa Nafeie. 2018. IslandViz: A tool for visualizing modular software systems in virtual reality. In 2018 IEEE Working Conference on Software Visualization (VISSOFT). IEEE, 112–116.
[13] Johann Mortara, Philippe Collet, and Xhevahire Tërnava. 2020. Identifying and Mapping Implemented Variabilities in Java and C++ Systems using symfinder. In Proceedings of the 24th ACM International Systems and Software Product Line Conference, Volume B. 9–12.
[14] Richard Müller, Dirk Mahler, Michael Hunger, Jens Nerche, and Markus Harrer. 2018. Towards an open source stack to create a unified data source for software analysis and visualization. In 2018 IEEE Working Conference on Software Visualization (VISSOFT). IEEE, 107–111.
[15] Roy Oberhauser. 2020. A Machine Learning Approach Towards Automatic Software Design Pattern Recognition Across Multiple Programming Languages. In Proceedings of the Fifteenth International Conference on Software Engineering Advances. IARIA, 27–32. https://nbn-resolving.org/urn:nbn:de:bsz:944-opus4-10255
[16] Ian Robinson, Jim Webber, and Emil Eifrem. 2015. Graph databases: new opportunities for connected data. O’Reilly Media, Inc.
[17] Aashik Sadar and Vinitha Panicker. 2015. DocTool: a tool for visualizing software projects using graph database. In 2015 Eighth International Conference on Contemporary Computing (IC3). IEEE, 439–442.
[18] Mojtaba Shahin, Peng Liang, and Muhammad Ali Babar. 2014. A systematic review of software architecture visualization techniques. Journal of Systems and Software 94 (2014), 161–185.
[19] SlashData. 2021. State of the Developer Nation 20th Edition.
[20] Lynn von Kurnatowski, David Heidrich, Nalin Güden, Andreas Schreiber, Hendrik Polzin, and Christian Stangl. 2021. Analysing and Visualizing large Aerospace Software Systems. In ASCEND 2021. 4082.
[21] Richard Wettel and Michele Lanza. 2008. CodeCity: 3D visualization of large-scale software. In Companion of the 30th International Conference on Software Engineering. 921–922.
[22] Xin Xia, Lingfeng Bao, David Lo, Zhenchang Xing, Ahmed E. Hassan, and Shanping Li. 2017. Measuring program comprehension: A large-scale field study with professionals. IEEE Transactions on Software Engineering 44, 10 (2017), 951–976.
[23] Jianghao Xiong, En-Lin Hsiang, Ziqian He, Tao Zhan, and Shin-Tson Wu. 2021. Augmented reality and virtual reality displays: emerging technologies and future perspectives. Light: Science & Applications 10, 1 (2021), 1–30.