Authors Jeanne-Marie Musca Karl Cronburg Matthew P. Ahrens
License CC-BY-4.0
Markedly A cartographic approach for mapping eDSL implementation costs Matthew P. Ahrens Karl Cronburg Jeanne-Marie Musca Tufts University Tufts University Tufts University Medford, Massachusetts, USA Medford, Massachusetts, USA Medford, Massachusetts, USA mahrens@cs.tufts.edu karl@cs.tufts.edu jmusca@cs.tufts.edu Abstract 1.2 The Problem The cost of implementing an embedded domain specific lan- When a tool requires extraneous effort to implement an eDSL guage (eDSL) depends on the eDSL development tools used feature, the implementation incurs an unexpected cost. Costs to build it, but these costs are not readily apparent. Markedly are difficult to track across stages of language development enables developers to map elements of the eDSL implementa- and make comparing the impact of implementation decisions tion and attach costs to these elements. An implementation difficult with respect to the parts of the design those decisions diverging from its design incurs four distinct types of costs: affect. Preserve, Extend, Adhere, and Recognize. To illustrate the Markedly approach, we calculate the costs 1.3 The Solution of example eDSLs using a map for the Haskell language EDSL developers will better construct and modify their im- metaprogramming and host language toolsets. To evaluate plementation by: this approach, we will assess how developers reason about • Decomposing the design into eDSL features costs while implementing an eDSL. • Associating features with a tool implementation CCS Concepts • Software and its engineering → Gen- • Identifying connections between implementations eral programming languages; • Social and professional • Track costs accumulated across connections topics → History of programming languages; 2 Methodology Keywords programming languages, functional languages, Given a chosen deconstruction of an eDSL design into fea- domain specific languages, semantics tures, Markedly allows developers to calculate implementa- ACM Reference Format: tion costs. To illustrate, we apply Markedly techniques to a Matthew P. Ahrens, Karl Cronburg, and Jeanne-Marie Musca. 2017. toy eDSL for Identifiers with the features: Markedly: A cartographic approach for mapping eDSL implementa- • identifiers must match the regex [A-Z][a-z0-9]* tion costs. Presented at Workshop on Meta-Programming Techniques and Reflection (Meta’17). 5 pages. • identifiers must be unique in scope For tracking costs during any stage of development – plan- 1 Introduction ning, constructing, or iterating – Markedly describes an in- formal algorithm for visualizing an eDSL implementation as 1.1 The Practice a map. EDSL developers using Haskell’s language development toolset choose between host language constructs and metaprogram- 2.1 Constructing the Map ming tools when implementing their eDSLs. Host language Using Markedly, an eDSL developer constructs a map with constructs provide the eDSL developer direct reuse of the regions representing language development tools. The devel- core Haskell semantics and GHC’s extension constructs such oper marks points on a region when implementing part of as generics, type families, and data kinds. Haskell’s metapro- the language in that tool. When planning, developers put one gramming tools include the Template Haskell Q Monad, alge- point per feature on the map and connect points with edges braic data type AST representation, and libraries for parsing to represent program dataflow. In practice, developers may and compile time splicing[8, 10]. For simplicity, developers encounter instances where multiple tools may be needed for prefer to implement their eDSL using only one tool, but some one feature, resulting in multiple points and edges. In itera- eDSLs are better implemented by a composition of tools. tion, developers may add to or modify the implementation, resulting in adding or moving points and edges. This work is licensed under Creative Commons Attribution 4.0 International License (CC BY 4.0). For example, in Figures 1 and 2, the map partitions regions Meta’17, October 22, 2017, Vancouver, Canada for the tools: Haskell semantics and Parsec parsing. The de- © 2017 Copyright held by the owner/author(s). velopers for the Figure 1 implementation mark both features with one point which represents implementing identifiers 1 Meta’17, October 22, 2017, Vancouver, Canada Ahrens, Cronburg, and Musca as host language identifiers. In the Figure 2 implementation, Haskell developers mark the regex feature in the parser tool region as a parser implementation and the uniqueness feature in Haskell Ids the Haskell semantics region as a Haskell set data structure. To share program data between the features, developers add an edge from the parser to the set data structure. Figure 1. Haskell map for a host language embedding of 2.2 Metrics of cost Identifiers To help developers track the impact of their implementation decisions, Markedly accumulates costs along paths. To help Parsec Parser Haskell developers focus on specific concerns of their implemen- tation decisions, Markedly splits costs along four vectors, Parser Set labeled PEAR. 2.2.1 Preserve Figure 2. Haskell map for a metaprogram embedding of The heuristic for the preservation cost of a map element Identifiers (points and edges) is How many features in the design are compromised by this element? An implementation element compromises a design feature by not enforcing a property Parsec Parser Haskell of the design feature. In Figure 1, Haskell identifiers do not enforce the leading capital property of the regex feature, Parser Data.Set incurring a preservation cost of one. Print Stx Err 2.2.2 Extend Figure 3. Haskell map for a metaprogram embedding of The heuristic for the extension cost of a map element is How Identifiers with error reporting many intermediate representations are not accessible from this element? When the implementation of a map element does not expose transformed program data, then that element or an edge. When multiple edges converge on a point, the incurs an extension cost for each hidden representation. Sup- path takes the maximum cost for each metric at that point. pose the representation of identifiers to check syntax was In the implementation of Identifiers in Figure 3 which adds introduced and lost during error reporting in Figure 3. The error reporting, the accumulated cost along the parser edge error reporting point would incur an extension cost of one. and syntax edge are (0,0,0,1) and (0,1,0,2), and set edge cost is (0,0,0,0). Thus, (arдmax (0, 0, 1, 1)(0, 1, 0, 3)) + (0, 0, 0, 0)) 2.2.3 Adhere calculates the path cost as (0,1,1,3). The heuristic for the adherence cost of a map element is How Developers can now compare the costs of different imple- many expressions or syntactic annotations must be written in mentations with proper separation of concerns. The Iden- addition to simply implementing the feature? Complexity of tifiers eDSL implementations have costs of (1,0,0,0) for the the function required to convert between tool representa- host language implementation, (0,0,0,1) for the metaprogram tions informs this metric. For example, in Figure 3, the edge implementation, and (0,1,1,3) for the error handling imple- from the parser to the error correcting point must convert mentation. Using these metrics, developers decide which from the parser’s representation to a string thus incurring implementation to maintain depending on what will be dif- an adherence cost of one against the regex feature. ficult to maintain. Connecting the terminal points of maps allows developers to compose maps together and track costs 2.2.4 Recognize while iterating over their eDSLs. The heuristic for the recognition cost of a map element is Is this element present in the design? If the element is directly 3 Example eDSLs tied to at least one feature, it has a cost of zero; otherwise it 3.1 Data Parsing has a cost of one. In the example in Figure 3, error reporting Consider an eDSL, µPADS [5], whose features are: does not appear in the design, so the extra points and edges added to support that feature incur a cost of one, each. • best-effort (longest parse) left-to-right parsing • first-fit (packrat) parsing 2.3 Calculation and Comparison The implementation of (<|>) in Figure 4 takes two parser A four-tuple of natural numbers represents the cost for a results and decides which one progresses. The map in Figure map element : (P, E, A, R). A map element is either a point 5 defines the type signature, Nothing case, Both-Just case, 2 Markedly Meta’17, October 22, 2017, Vancouver, Canada 1 (<|>) :: Maybe [t] -> Maybe [t] -> Maybe [t] Parsec Parser Q Monad Haskell ADTs 2 (<|>) Nothing Nothing = Nothing Mun MkTy ADT 3 (<|>) (Just as) (Just bs) 4 | length as <= length bs = Just as 5 | otherwise = Just bs Host MkVal ValEnv 6 (<|>) (Just as) _ = Just as 7 (<|>) _ (Just bs) = Just bs Figure 7. The Haskell map for MunLang Figure 4. Representations of the two µPADS features. Fea- Element P E A R ture (1) is implemented by lines 4 and 5, and feature (2) is Host 0 0 1 0 implemented by lines 4, 6, and 7. ADT 0 0 0 1 ValEnv 0 0 0 1 Haskell One-Just MkTyMkV al 0 1 0 1 pm 3 MkTyADT 0 0 1 1 Both-Just TyConPath 0 0 2 2 pm 2 Nothing V alConPath 0 1 1 2 pm 1 Type Sig Figure 8. The non-zero PEAR costs of MunLang • propagate failure if no result Figure 5. Haskell map for µPADS • expect computation failure 3.2 Containers TySig P E A R Consider an eDSL MunLang that implements the multi-union Nothing 0 0 0 1 data structure [4] The design of MunLang contains three Both-Just 0 0 1 1 features: One-Just 0 0 0 0 • Define a munion type with {: :} operators • Construct a munion value with {: :} operators pm1 0 0 0 1 • Splice types and values from host language pm2 0 0 0 1 To define a munion type, specify a set of label-type pairs: pm3 0 0 0 1 typeM 1 = {: τ1 l 1 , τ2 l 2 , τ3 l 3 :}. Similarly, to construct munion values of type M 1 , declare a subset of label-value pairs,: V1 = TotalPath 0 0 1 5 {: l 1 = v 1 , l 3 = v 2 :} :: M 1 In the map in figure 7, the points mun and host enforce Figure 6. The non-zero PEAR costs of µPADS syntactic constraints for the munion and host language syn- tax, respectively. MkTy and MkVal points construct munion types and values, respectively. The ADT point represents and One-Just case(s) as points and connects them via the Haskell type splicing and ValEnv represents Haskell value standard Haskell tool of pattern matching (pm). splicing. Line 4 in Figure 4 simply exists to adhere to the implied se- The developers use the heuristics for each PEAR dimen- mantics of first-fit while implementing best-effort, incurring sion to calculate the PEAR costs in Figure . From these costs, an adherence cost. The type representation, Nothing failure the developers document that the implementation “fills in” case, and evaluation order imposed by pattern matching are the underspecified host language representation apparent in not reflected by features thus incurring a recognition cost. the recognition cost, contains superfluous code to parse the The design could reduce the PEAR cost to zero by being host language apparent in the adherence cost, and hides the more specific: type environment apparent in the extension cost. • best-effort (longest parse) left-to-right parsing pre- ferred 3.3 Type-Level Naturals • first-fit (packrat) parsing if results equivalent or only Consider an eDSL, Numeric Type Annotations whose features one result are: 3 Meta’17, October 22, 2017, Vancouver, Canada Ahrens, Cronburg, and Musca data Nat = Z | S Nat developers may add implementation details which make type Weighted (w :: Nat) a = a use of algebraic numerals, risking adherence costs if adding type family (w1 :: Nat) :+: (w2 :: Nat) :: Nat complicated parsing and printing expressions. type instance Z :+: m = m type instance (S n) :+: m = S (n :+: m) 4 Evaluation ($$) :: Weighted w1 (a -> b) -> Weighted w2 a To evaluate our cartographic approach, developers will im- -> Weighted (w1 :+: w2) b plement an eDSL in Haskell as part of a user study. ($$) f a = f a 4.1 Given Figure 9. The deeply embedded implementation for Type Support for an overspecified eDSL feature evaluates preser- Level Naturals vation. The amount of code modified when adding a fea- ture after initial implementation evaluates extension. The Haskell number of intermediate representations evaluates adherence. Natural ADT $$ Abstracting the design from the implementation evaluates recognition. Weighted Type 4.2 Experiment Half of participating developers will use the Haskell map that contains the maps from the eDSL examples and the costs TyVar :+: of known paths between regions. Each developer will place Data Kinds Type Family eDSL features on the map and calculate the accumulated PEAR costs while iterating over their implementation. Figure 10. The Haskell map for Type Level Naturals 5 Current Work and Conclusion Element P E A R To bridge the gap between developer knowledge and auto- matic tool support, an artifact of Markedly would provide Natural ADT 1 0 0 0 an interactive view where: Weighted Type Alias 0 0 1 0 • A specification language encodes the design TyVar 0 0 0 1 • PEAR costs are automatically calculated :+: 0 0 1 0 • A library provides common eDSL map elements N atTyV ar 0 0 0 1 • Code templates are generated from a map instance TyV arW eiдht 0 0 0 1 Language design workbenches associated with edges on a Markedly map by proving a unified representation for TyV ar : + : 0 0 0 1 constructing an eDSL. Developers can apply Markedly to ProдPath 1 0 2 3 language design workbench eDSL implementations by rep- resenting the workbench as a large region on the map and focusing on the features that require tools outside the work- Figure 11. The non-zero PEAR costs of Type Level Naturals bench as points and edges that leave the region. The realities of implementing an eDSL incur various kinds • readable natural number weight types of costs dependent on the stage of development. In the Markedly • attach weights to arbitrary expressions approach, we categorize these costs into the four PEAR • summing weights across function application metrics. Markedly describes costs for eDSL implementation A deep embedding represents the eDSL using the Haskell tools and our Haskell eDSL examples provide insight into type system in Figure 9. The map in Figure 10 for this eDSL instances of the approach. A user study would allow us to contains the regions of data kinds, type families and Haskell determine if the approach and Haskell specific artifacts facil- semantics. itate reasoning about implementation choices with respect In Figure 11, the implementation manifests a connection to PEAR costs. between numeric representation and weights incurring a recognition cost. Because the implementation must thread Acknowledgments the type variable, w :: N at, elements adding extra type vari- Markedly is sponsored by the Defense Advanced Research able annotations in addition to their features incur an ad- Projects Agency (contract: FA8750-15-2-0033). Markedly does herence cost. The preservation cost occurs from the natural not necessarily reflect the position or the policy of the US numbers Peano representation being difficult to read. The Government, and no official endorsement should be inferred. 4 Markedly Meta’17, October 22, 2017, Vancouver, Canada References [1] Walter Cazzola and Albert Shaqiri. 2016. Modularity and Optimiza- tion in Synergy. In Proceedings of the 15th International Conference on Modularity (MODULARITY 2016). ACM, New York, NY, USA, 70–81. https://doi.org/10.1145/2889443.2889445 [2] Walter Cazzola and Ivan Speziale. 2009. Sectional Domain Specific Languages. In Proceedings of the 4th Workshop on Domain-specific Aspect Languages (DSAL ’09). ACM, New York, NY, USA, 11–14. https: //doi.org/10.1145/1509307.1509311 [3] Martin Churchill, Peter D. Mosses, and Paolo Torrini. 2014. Reusable Components of Semantic Specifications. In Proceedings of the 13th International Conference on Modularity (MODULARITY ’14). ACM, New York, NY, USA, 145–156. https://doi.org/10.1145/2577080.2577099 [4] Corinna Cortes, Kathleen Fisher, Daryl Pregibon, Anne Rogers, and Frederick Smith. 2004. Hancock: A Language for Analyzing Transac- tional Data Streams. ACM Trans. Program. Lang. Syst. 26, 2 (March 2004), 301–338. https://doi.org/10.1145/973097.973100 [5] Kathleen Fisher and Robert Gruber. 2005. PADS: A Domain-specific Language for Processing Ad Hoc Data. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Im- plementation (PLDI ’05). ACM, New York, NY, USA, 295–304. https: //doi.org/10.1145/1065010.1065046 [6] Vojin Jovanovic, Amir Shaikhha, Sandro Stucki, Vladimir Nikolaev, Christoph Koch, and Martin Odersky. 2014. Yin-yang: Conceal- ing the Deep Embedding of DSLs. In Proceedings of the 2014 Inter- national Conference on Generative Programming: Concepts and Ex- periences (GPCE 2014). ACM, New York, NY, USA, 73–82. https: //doi.org/10.1145/2658761.2658771 [7] Thomas Kühn, Walter Cazzola, and Diego Mathias Olivares. 2015. Choosy and Picky: Configuration of Language Product Lines. In Pro- ceedings of the 19th International Conference on Software Product Line (SPLC ’15). ACM, New York, NY, USA, 71–80. https://doi.org/10.1145/ 2791060.2791092 [8] Geoffrey Mainland. 2007. Why It’s Nice to Be Quoted: Quasiquoting for Haskell. In Proceedings of the ACM SIGPLAN Workshop on Haskell Workshop (Haskell ’07). ACM, New York, NY, USA, 73–82. https://doi. org/10.1145/1291201.1291211 [9] Jan Midtgaard, Norman Ramsey, and Bradford Larsen. 2013. Engineer- ing Definitional Interpreters. In Proceedings of the 15th Symposium on Principles and Practice of Declarative Programming (PPDP ’13). ACM, New York, NY, USA, 121–132. https://doi.org/10.1145/2505879.2505894 [10] Tim Sheard and Simon Peyton Jones. 2002. Template Meta- programming for Haskell. In Proceedings of the 2002 ACM SIGPLAN Workshop on Haskell (Haskell ’02). ACM, New York, NY, USA, 1–16. https://doi.org/10.1145/581690.581691 5