Authors: Aladdin Shikhizada, Fabrizio Maria Maggi, Francesco Riva, Ivan Donadello
License: CC-BY-4.0
Declare4Py: A Python Library for Declarative Process Mining

Ivan Donadello¹,∗, Ph.D., Francesco Riva¹, M.S., Fabrizio Maria Maggi¹, Ph.D. and Aladdin Shikhizada², M.S.
¹ Free University of Bozen-Bolzano, Bolzano, Italy
² Visioncraft OÜ, Tallinn, Estonia

Abstract
In process mining, procedural process models can be difficult to manage when the process is unpredictable and characterized by many possible exceptions, since such models can easily become unreadable. Declarative languages, instead, model the process by imposing logical constraints on the process behavior and are suitable to represent variable processes in a compact way. Declare is the reference declarative language in the BPM community. Although several Java tools are available for process analysis based on Declare, a library implementing process mining tasks with Declare in Python is still missing. Therefore, in this paper, we present Declare4Py, the first Python package that offers support for declarative process mining. Declare4Py includes methods for conformance checking, process discovery and query checking.

Keywords
Declare, Conformance Checking, Process Discovery, Query Checking, Python API, Declarative Process Mining

1. Introduction
Process mining [1] focuses on the analysis of business processes based on event logs that contain information about the process executions. A key component in process mining is the process model, i.e., a formal representation of the process in a standard format. Procedural models require defining the whole control flow of the process step by step, which makes procedural process mining unsuitable for processes with a high number of different branches and exceptions. Declarative process models are, instead, easier to manage, as they just encode a set of constraints that the process should follow.
While procedural models can be designed and analyzed using several available commercial and academic process mining tools,¹,²,³,⁴,⁵ this variety is lacking for declarative models, where only a few (Java-based) tools and libraries are available⁶ [2, 3].

In this paper, we present Declare4Py, a novel and easy-to-use Python library that implements a set of APIs covering the main declarative process mining tasks based on Declare [4]. In particular, the language employed is MP-Declare [5], the multi-perspective extension of Declare that supports data- and time-aware constraints along with control-flow constraints. Our contribution to the BPM community is the first Python library with APIs for conformance checking, process discovery and query checking based on Declare models. Being a Python library, Declare4Py can be easily integrated with the main machine learning frameworks, such as scikit-learn, TensorFlow and PyTorch, and can be invoked programmatically to run large-scale experiments. We also stress that the query checking functionality provided by Declare4Py is novel and not available in the existing tools for declarative process mining.

The BPM 2022 Demos & Resources Forum. 20th Business Process Management Conference, Münster, Germany, September 11-16, 2022
∗ Corresponding author.
Email: ivan.donadello@unibz.it (I. Donadello); Francesco.Riva@unibz.it (F. Riva); maggi@inf.unibz.it (F. M. Maggi); aladdin.shikhizada@gmail.com (A. Shikhizada)
ORCID: 0000-0002-0701-5729 (I. Donadello); 0000-0002-9089-6896 (F. M. Maggi)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
¹ apromore.com/
² celonis.com/
³ fluxicon.com/disco/
⁴ processmining.org
⁵ pm4py.fit.fraunhofer.de/
We compared the performance of Declare4Py with that of RuM [2], a Java-based tool for declarative process mining, on the core task of conformance checking, achieving better computational times. This first release of Declare4Py is available online in a GitHub repository at https://github.com/francxx96/declare4py. The repository contains the code and some tutorials in Jupyter notebooks (https://github.com/francxx96/declare4py/tree/main/tutorials) showing how to use Declare4Py on the well-known Sepsis cases log.⁷ A video that gives an overview of the package is available at https://youtu.be/hJhgoqFLM7s.

2. Overview of the Declare4Py Features
Declare4Py has been designed to analyze event logs using declarative, constraint-based process models. It relies on well-known standards for input and output files, such as XES [6] for event logs and decl [7] for the Declare models. This ensures its interoperability with other libraries and tools.

We briefly recall here some preliminary definitions. A trace 𝜎 is an execution of a business process. A trace contains a sequence of events, where each event is related to the execution of an activity 𝑎 ∈ 𝐴 (with 𝐴 the set of all possible activities), performed at time 𝑡, with a (possibly empty) set of other attributes, a.k.a. the payload of the event. A Declare model ℳ = {𝜑₁, 𝜑₂, …} is a set of Declare constraints, each of which is an instantiation of a parameterized template [4]. We indicate the set of Declare templates with 𝒜. A trace satisfies a Declare model (𝜎 ⊧ ℳ) when it satisfies each constraint 𝜑 ∈ ℳ, i.e., ∀𝜑 ∈ ℳ, 𝜎 ⊧ 𝜑. A log 𝐿 is a multi-set of traces.

Conformance Checking. Given a log 𝐿 of traces 𝜎ᵢ and an MP-Declare model ℳ, the conformance checking task checks, for all the traces 𝜎ᵢ ∈ 𝐿, whether 𝜎ᵢ ⊧ 𝜑 holds for all constraints 𝜑 ∈ ℳ.
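To make the definition above concrete, the following minimal sketch checks 𝜎 ⊧ 𝜑 for every trace and every constraint of a small model. It is an illustration only and does not reflect the Declare4Py API or its implementation: it assumes control-flow constraints only (no data or time conditions), represents a trace as a plain list of activity labels, and the helper names (`response`, `existence`, `conformance`) as well as the toy traces are hypothetical.

```python
from typing import Callable, Dict, List

Trace = List[str]                     # a trace, reduced to its sequence of activity labels
Constraint = Callable[[Trace], bool]  # a constraint decides whether a trace satisfies it


def response(a: str, b: str) -> Constraint:
    """Response(a, b): every occurrence of a is eventually followed by b."""
    def check(trace: Trace) -> bool:
        pending = False               # True while some a is still waiting for a later b
        for activity in trace:
            if activity == a:
                pending = True
            elif activity == b:
                pending = False
        return not pending
    return check


def existence(a: str) -> Constraint:
    """Existence(a): a occurs at least once in the trace."""
    return lambda trace: a in trace


def conformance(log: Dict[str, Trace],
                model: Dict[str, Constraint]) -> Dict[str, Dict[str, bool]]:
    """For every trace in the log, evaluate every constraint of the model."""
    return {
        trace_id: {name: phi(trace) for name, phi in model.items()}
        for trace_id, trace in log.items()
    }


# Toy log (made-up traces, keyed by a hypothetical trace identifier).
log = {
    "case_1": ["ER Registration", "ER Triage", "Release A"],
    "case_2": ["ER Registration", "Release A"],
}
model = {
    "Response(ER Registration, ER Triage)": response("ER Registration", "ER Triage"),
    "Existence(Release A)": existence("Release A"),
}

results = conformance(log, model)
# A trace conforms to the model only if it satisfies all of its constraints.
conforming = {trace_id: all(checks.values()) for trace_id, checks in results.items()}
print(conforming)  # {'case_1': True, 'case_2': False}
```

Keying the result by trace identifier keeps duplicate traces of the multi-set 𝐿 distinct. This sketch only returns a Boolean verdict per constraint; it does not count activations, fulfillments and violations.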
Declare4Py implements the conformance checking task using the approach presented in [5], which takes an MP-Declare model and a log as inputs and returns the number of activations, fulfillments, and violations for each constraint in the input model and for each trace in the input log. These results are stored in a Python data structure indexed by trace identifier, so the user can easily query it to retrieve or aggregate information.

⁶ rulemining.org
⁷ https://data.4tu.nl/articles/dataset/Sepsis_Cases_-_Event_Log/12707639

Process Discovery. Given a log 𝐿 of traces 𝜎ᵢ and a support threshold 𝑡ℎₛ, the process discovery task returns a Declare⁸ model ℳ of constraints satisfied by a fraction of traces in 𝐿 higher than or equal to 𝑡ℎₛ. More formally:

\[
\mathcal{M} = \{\varphi : |\{\sigma \in L : \sigma \models \varphi\}| / |L| \geq \mathit{th}_s\}. \tag{1}
\]

Declare4Py implements the approach presented in [8], which consists of two steps: (1) the discovery of frequent (pairs of) activities from 𝐿; (2) the construction of ℳ from this set. The set of frequent (pairs of) activities from 𝐿 is built with the Apriori algorithm [9] by computing the frequent itemsets of activities of size 1 and 2. These itemsets are used to build a set of candidate Declare constraints 𝒞, obtained by instantiating the templates in 𝒜 with the activities belonging to each itemset. ℳ ⊆ 𝒞 is then computed by selecting the constraints 𝜑 ∈ 𝒞 such that |{𝜎 ∈ 𝐿 : 𝜎 ⊧ 𝜑}|/|𝐿| ≥ 𝑡ℎₛ. The results are returned in a Python data structure containing, for each constraint in ℳ, the traces that satisfy it. A Declare4Py function allows the user to filter this data structure to retrieve the most relevant (i.e., the most frequently satisfied) constraints. The discovered model can be exported as a decl file.

Query Checking.
This task takes as input a log 𝐿 of traces 𝜎ᵢ, a support threshold 𝑡ℎₛ, and an MP-Declare query 𝑞, i.e., an MP-Declare constraint in which the activation and/or the target activity are unspecified. For example, the constraint Response(?𝐴, ER Triage)⁹ contains a placeholder for the activation activity, whereas Response(?𝐴, ?𝑇) contains placeholders for both activation and target. Let 𝑉𝑎𝑟𝑠 be the set of placeholders of a Declare query and 𝜆 : 𝑉𝑎𝑟𝑠 → 𝐴 be an assignment function that maps placeholders to activities. The query checking task returns the set of assignments Λ = {𝜆₁, 𝜆₂, …} such that the input query 𝑞, instantiated with each assignment in Λ, is satisfied by a fraction of traces in 𝐿 higher than or equal to 𝑡ℎₛ. More formally:

\[
\Lambda = \{\lambda_i : |\{\sigma \in L : \sigma \models q[\lambda_i]\}| / |L| \geq \mathit{th}_s\}. \tag{2}
\]

Declare4Py returns a data structure containing the assignments.

⁸ The process discovery functionality, differently from the conformance checking and the query checking tasks, is data-agnostic.
⁹ For simplicity, we do not define data and time conditions in this example. However, fully defined data and time conditions can be specified in the query.

3. Performance
We tested the computational time of Declare4Py on the above tasks under different conditions using the Sepsis cases log⁷ and the log provided for the annual Business Process Intelligence Challenge (BPIC) in 2020.¹⁰ Conformance checking is the core task, as process discovery and query checking are built on top of it. Therefore, we compared the conformance checking task (based on [5]) as implemented in Declare4Py and in RuM [2], increasing the number of Declare constraints in the input model. The performance of the discovery and the query checking tasks is, instead, measured for different support values in the set {0.2, 0.4, 0.6, 0.8}.¹¹ For query checking, we used the Chain Response template and performed two tests. In the first one, we fixed the activation activity, leaving the target unspecified; in the second one, we left both the activation and the target activities unspecified.

¹⁰ http://icpmconference.org/2020/wp-content/uploads/sites/4/2020/03/InternationalDeclarations.xes.gz

[Figure 1: six panels, with Time [s] on the vertical axis. Top row: Sepsis log; bottom row: BPIC 2020 log. Panels in each row: Conformance checking (RuM vs. Declare4Py; horizontal axis: Number of model constraints), Model discovery (horizontal axis: Itemset support), Query checking (1 variable vs. 2 variables; horizontal axis: Declare constraint support).]
Figure 1: Declare4Py shows better performance than RuM for conformance checking for both the Sepsis log (above) and the BPIC 2020 log (below).

The results of our experiments are reported in Figure 1. Declare4Py presents slightly lower computational times than RuM for conformance checking on small models. However, as the number of constraints in the model grows, the computational times of the two tools diverge. This is particularly evident in the BPIC 2020 case. The computational time for the discovery and the query checking tasks decreases when the support increases, since a higher support implies fewer candidates to check. As expected, the computational time for the query checking task is higher when two placeholders have to be assigned.

4. Maturity and Future Remarks
Declare4Py has been used for deviance mining in [11] and as a tool for a new feature encoding for business process analysis using machine learning methods [12].
This first release can be improved both in terms of performance and number of functionalities. As future work, we plan to improve the performance of Declare4Py by implementing optimization techniques, such as multi-threading, and by using the Numba library,¹² which translates, at runtime, Python code into optimized machine code using the industry-standard LLVM [13]. The Declare4Py functionalities will be extended by including state-of-the-art process mining algorithms. For conformance checking, we plan to include the techniques presented in [14] and [15], while, for process discovery, we will implement the techniques introduced in [16] and [17].

¹¹ In this case, a comparison with RuM would not be fair, as this tool implements the optimization technique based on multi-threading presented in [10], which is currently not implemented in Declare4Py.
¹² https://numba.pydata.org/

Acknowledgments
The work of Francesco Riva is supported by the UNIBZ project PRISMA.

References
[1] W. M. P. van der Aalst, Process Mining - Data Science in Action, Springer, 2016.
[2] A. Alman, C. Di Ciccio, D. Haas, F. M. Maggi, A. Nolte, Rule mining with RuM, in: 2nd International Conference on Process Mining, ICPM 2020, 2020, pp. 121–128.
[3] C. Di Ciccio, M. Mecella, On the discovery of declarative control flows for artful processes, ACM Trans. Manag. Inf. Syst. 5 (2015) 24:1–24:37.
[4] M. Pesic, H. Schonenberg, W. M. P. van der Aalst, DECLARE: full support for loosely-structured processes, in: EDOC, IEEE Computer Society, 2007, pp. 287–300.
[5] A. Burattin, F. M. Maggi, A. Sperduti, Conformance checking based on multi-perspective declarative process models, Expert Syst. Appl. 65 (2016) 194–211.
[6] C. W. Günther, H. Verbeek, XES-standard definition (2014).
[7] V. Skydanienko, C. Di Francescomarino, C. Ghidini, F. M. Maggi, A tool for generating event logs from multi-perspective declare models, in: BPM (Dissertation/Demos/Industry), volume 2196 of CEUR Workshop Proceedings, CEUR-WS.org, 2018, pp. 111–115.
[8] F. M. Maggi, R. P. J. C. Bose, W. M. P. van der Aalst, Efficient discovery of understandable declarative process models from event logs, in: CAiSE, 2012, pp. 270–285.
[9] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in: VLDB, 1994, pp. 487–499.
[10] F. M. Maggi, C. Di Ciccio, C. Di Francescomarino, T. Kala, Parallel algorithms for the automated discovery of declarative process models, Inf. Syst. 74 (2018) 136–152.
[11] G. Bergami, C. Di Francescomarino, C. Ghidini, F. M. Maggi, J. Puura, Exploring business process deviance with sequential and declarative patterns, CoRR abs/2111.12454 (2021).
[12] C. Di Francescomarino, I. Donadello, C. Ghidini, F. M. Maggi, W. Rizzi, Making sense of temporal data: the DECLARE encoding, in: PMAI@IJCAI, CEUR-WS.org, 2022.
[13] J. Lee, C. Hur, R. Jung, Z. Liu, J. Regehr, N. P. Lopes, Reconciling high-level optimizations and low-level code in LLVM, Proc. ACM Program. Lang. 2 (2018) 125:1–125:28.
[14] M. de Leoni, F. M. Maggi, W. M. P. van der Aalst, Aligning event logs and declarative process models for conformance checking, in: BPM, 2012, pp. 82–97.
[15] G. Bergami, F. M. Maggi, A. Marrella, M. Montali, Aligning data-aware declarative process models and event logs, in: BPM, Springer, 2021, pp. 235–251.
[16] C. Di Ciccio, F. M. Maggi, J. Mendling, Efficient discovery of target-branched Declare constraints, Inf. Syst. 56 (2016) 258–283.
[17] V. Leno, M. Dumas, F. M. Maggi, M. La Rosa, A. Polyvyanyy, Automated discovery of declarative process models with correlated data conditions, Inf. Syst. 89 (2020) 101482.