DOKK Library
                  A cartographic approach for mapping eDSL implementation costs
            Matthew P. Ahrens                                     Karl Cronburg                             Jeanne-Marie Musca
             Tufts University                                    Tufts University                              Tufts University
        Medford, Massachusetts, USA                        Medford, Massachusetts, USA                    Medford, Massachusetts, USA

Abstract                                                                         1.2       The Problem
The cost of implementing an embedded domain specific lan-                        When a tool requires extraneous effort to implement an eDSL
guage (eDSL) depends on the eDSL development tools used                          feature, the implementation incurs an unexpected cost. Costs
to build it, but these costs are not readily apparent. Markedly                  are difficult to track across stages of language development
enables developers to map elements of the eDSL implementa-                       and make comparing the impact of implementation decisions
tion and attach costs to these elements. An implementation                       difficult with respect to the parts of the design those decisions
diverging from its design incurs four distinct types of costs:                   affect.
Preserve, Extend, Adhere, and Recognize.
   To illustrate the Markedly approach, we calculate the costs                   1.3       The Solution
of example eDSLs using a map for the Haskell language                            EDSL developers will better construct and modify their im-
metaprogramming and host language toolsets. To evaluate                          plementation by:
this approach, we will assess how developers reason about                              •   Decomposing the design into eDSL features
costs while implementing an eDSL.                                                      •   Associating features with a tool implementation
CCS Concepts • Software and its engineering → Gen-                                     •   Identifying connections between implementations
eral programming languages; • Social and professional                                  •   Track costs accumulated across connections
topics → History of programming languages;
                                                                                 2     Methodology
Keywords programming languages, functional languages,
                                                                                 Given a chosen deconstruction of an eDSL design into fea-
domain specific languages, semantics
                                                                                 tures, Markedly allows developers to calculate implementa-
ACM Reference Format:                                                            tion costs. To illustrate, we apply Markedly techniques to a
Matthew P. Ahrens, Karl Cronburg, and Jeanne-Marie Musca. 2017.                  toy eDSL for Identifiers with the features:
Markedly: A cartographic approach for mapping eDSL implementa-
                                                                                       • identifiers must match the regex [A-Z][a-z0-9]*
tion costs. Presented at Workshop on Meta-Programming Techniques
and Reflection (Meta’17). 5 pages.                                                     • identifiers must be unique in scope
                                                                                 For tracking costs during any stage of development – plan-
1    Introduction                                                                ning, constructing, or iterating – Markedly describes an in-
                                                                                 formal algorithm for visualizing an eDSL implementation as
1.1 The Practice
                                                                                 a map.
EDSL developers using Haskell’s language development toolset
choose between host language constructs and metaprogram-                         2.1       Constructing the Map
ming tools when implementing their eDSLs. Host language                          Using Markedly, an eDSL developer constructs a map with
constructs provide the eDSL developer direct reuse of the                        regions representing language development tools. The devel-
core Haskell semantics and GHC’s extension constructs such                       oper marks points on a region when implementing part of
as generics, type families, and data kinds. Haskell’s metapro-                   the language in that tool. When planning, developers put one
gramming tools include the Template Haskell Q Monad, alge-                       point per feature on the map and connect points with edges
braic data type AST representation, and libraries for parsing                    to represent program dataflow. In practice, developers may
and compile time splicing[8, 10]. For simplicity, developers                     encounter instances where multiple tools may be needed for
prefer to implement their eDSL using only one tool, but some                     one feature, resulting in multiple points and edges. In itera-
eDSLs are better implemented by a composition of tools.                          tion, developers may add to or modify the implementation,
                                                                                 resulting in adding or moving points and edges.
This work is licensed under Creative Commons Attribution 4.0 International
License (CC BY 4.0).
                                                                                    For example, in Figures 1 and 2, the map partitions regions
Meta’17, October 22, 2017, Vancouver, Canada                                     for the tools: Haskell semantics and Parsec parsing. The de-
© 2017 Copyright held by the owner/author(s).                                    velopers for the Figure 1 implementation mark both features
                                                                                 with one point which represents implementing identifiers
Meta’17, October 22, 2017, Vancouver, Canada                                                             Ahrens, Cronburg, and Musca

as host language identifiers. In the Figure 2 implementation,                                        Haskell
developers mark the regex feature in the parser tool region
as a parser implementation and the uniqueness feature in                                           Haskell Ids
the Haskell semantics region as a Haskell set data structure.
To share program data between the features, developers add
an edge from the parser to the set data structure.
                                                                        Figure 1. Haskell map for a host language embedding of
2.2     Metrics of cost                                                 Identifiers
To help developers track the impact of their implementation
decisions, Markedly accumulates costs along paths. To help                                Parsec Parser       Haskell
developers focus on specific concerns of their implemen-
tation decisions, Markedly splits costs along four vectors,                                   Parser             Set
labeled PEAR.

2.2.1    Preserve
                                                                        Figure 2. Haskell map for a metaprogram embedding of
The heuristic for the preservation cost of a map element
(points and edges) is How many features in the design are
compromised by this element? An implementation element
compromises a design feature by not enforcing a property                                  Parsec Parser       Haskell
of the design feature. In Figure 1, Haskell identifiers do not
enforce the leading capital property of the regex feature,                                    Parser         Data.Set
incurring a preservation cost of one.
                                                                                           Print Stx Err
2.2.2    Extend
                                                                        Figure 3. Haskell map for a metaprogram embedding of
The heuristic for the extension cost of a map element is How
                                                                        Identifiers with error reporting
many intermediate representations are not accessible from this
element? When the implementation of a map element does
not expose transformed program data, then that element                  or an edge. When multiple edges converge on a point, the
incurs an extension cost for each hidden representation. Sup-           path takes the maximum cost for each metric at that point.
pose the representation of identifiers to check syntax was              In the implementation of Identifiers in Figure 3 which adds
introduced and lost during error reporting in Figure 3. The             error reporting, the accumulated cost along the parser edge
error reporting point would incur an extension cost of one.             and syntax edge are (0,0,0,1) and (0,1,0,2), and set edge cost
                                                                        is (0,0,0,0). Thus, (arдmax (0, 0, 1, 1)(0, 1, 0, 3)) + (0, 0, 0, 0))
2.2.3    Adhere                                                         calculates the path cost as (0,1,1,3).
The heuristic for the adherence cost of a map element is How               Developers can now compare the costs of different imple-
many expressions or syntactic annotations must be written in            mentations with proper separation of concerns. The Iden-
addition to simply implementing the feature? Complexity of              tifiers eDSL implementations have costs of (1,0,0,0) for the
the function required to convert between tool representa-               host language implementation, (0,0,0,1) for the metaprogram
tions informs this metric. For example, in Figure 3, the edge           implementation, and (0,1,1,3) for the error handling imple-
from the parser to the error correcting point must convert              mentation. Using these metrics, developers decide which
from the parser’s representation to a string thus incurring             implementation to maintain depending on what will be dif-
an adherence cost of one against the regex feature.                     ficult to maintain. Connecting the terminal points of maps
                                                                        allows developers to compose maps together and track costs
2.2.4    Recognize                                                      while iterating over their eDSLs.
The heuristic for the recognition cost of a map element is Is
this element present in the design? If the element is directly          3     Example eDSLs
tied to at least one feature, it has a cost of zero; otherwise it       3.1   Data Parsing
has a cost of one. In the example in Figure 3, error reporting
                                                                        Consider an eDSL, µPADS [5], whose features are:
does not appear in the design, so the extra points and edges
added to support that feature incur a cost of one, each.                    • best-effort (longest parse) left-to-right parsing
                                                                            • first-fit (packrat) parsing
2.3     Calculation and Comparison                                         The implementation of (<|>) in Figure 4 takes two parser
A four-tuple of natural numbers represents the cost for a               results and decides which one progresses. The map in Figure
map element : (P, E, A, R). A map element is either a point             5 defines the type signature, Nothing case, Both-Just case,
    Markedly                                                                              Meta’17, October 22, 2017, Vancouver, Canada

1   (<|>) :: Maybe [t] -> Maybe [t] -> Maybe [t]                                      Parsec Parser      Q Monad Haskell ADTs
2   (<|>) Nothing Nothing = Nothing                                                       Mun             MkTy      ADT
3   (<|>) (Just as) (Just bs)
4     | length as <= length bs = Just as
5     | otherwise               = Just bs                                                  Host            MkVal             ValEnv
6   (<|>) (Just as) _ = Just as
7   (<|>) _ (Just bs) = Just bs                                                       Figure 7. The Haskell map for MunLang

    Figure 4. Representations of the two µPADS features. Fea-                                   Element         P   E    A     R
    ture (1) is implemented by lines 4 and 5, and feature (2) is                                  Host          0    0   1     0
    implemented by lines 4, 6, and 7.                                                             ADT           0    0   0     1
                                                                                                  ValEnv        0    0   0     1
                           Haskell             One-Just                                      MkTyMkV al         0    1   0     1
                                                      pm 3                                    MkTyADT           0    0   1     1
                                                                                              TyConPath         0    0   2     2
                                           pm 2
                           Nothing                                                           V alConPath        0    1   1     2
                                pm 1
                    Type Sig                                                      Figure 8. The non-zero PEAR costs of MunLang

                                                                                 • propagate failure if no result
                 Figure 5. Haskell map for µPADS                                 • expect computation failure

                                                                           3.2    Containers
                        TySig          P   E      A    R
                                                                           Consider an eDSL MunLang that implements the multi-union
                      Nothing          0   0      0     1                  data structure [4] The design of MunLang contains three
                      Both-Just        0   0      1     1                  features:
                      One-Just         0   0      0     0                         • Define a munion type with {: :} operators
                                                                                  • Construct a munion value with {: :} operators
                        pm1            0   0      0     1
                                                                                  • Splice types and values from host language
                        pm2            0   0      0     1                  To define a munion type, specify a set of label-type pairs:
                        pm3            0   0      0     1                  typeM 1 = {: τ1 l 1 , τ2 l 2 , τ3 l 3 :}. Similarly, to construct munion
                                                                           values of type M 1 , declare a subset of label-value pairs,: V1 =
                     TotalPath         0   0      1     5                  {: l 1 = v 1 , l 3 = v 2 :} :: M 1
                                                                                In the map in figure 7, the points mun and host enforce
           Figure 6. The non-zero PEAR costs of µPADS                      syntactic constraints for the munion and host language syn-
                                                                           tax, respectively. MkTy and MkVal points construct munion
                                                                           types and values, respectively. The ADT point represents
    and One-Just case(s) as points and connects them via the               Haskell type splicing and ValEnv represents Haskell value
    standard Haskell tool of pattern matching (pm).                        splicing.
       Line 4 in Figure 4 simply exists to adhere to the implied se-            The developers use the heuristics for each PEAR dimen-
    mantics of first-fit while implementing best-effort, incurring         sion to calculate the PEAR costs in Figure . From these costs,
    an adherence cost. The type representation, Nothing failure            the developers document that the implementation “fills in”
    case, and evaluation order imposed by pattern matching are             the underspecified host language representation apparent in
    not reflected by features thus incurring a recognition cost.           the recognition cost, contains superfluous code to parse the
       The design could reduce the PEAR cost to zero by being              host language apparent in the adherence cost, and hides the
    more specific:                                                         type environment apparent in the extension cost.
        • best-effort (longest parse) left-to-right parsing pre-
           ferred                                                          3.3    Type-Level Naturals
        • first-fit (packrat) parsing if results equivalent or only        Consider an eDSL, Numeric Type Annotations whose features
           one result                                                      are:
Meta’17, October 22, 2017, Vancouver, Canada                                                      Ahrens, Cronburg, and Musca

  data Nat = Z | S Nat                                               developers may add implementation details which make
  type Weighted (w :: Nat) a = a                                     use of algebraic numerals, risking adherence costs if adding
  type family (w1 :: Nat) :+: (w2 :: Nat) :: Nat                     complicated parsing and printing expressions.
  type instance Z :+: m = m
  type instance (S n) :+: m = S (n :+: m)                            4     Evaluation
  ($$) :: Weighted w1 (a -> b) -> Weighted w2 a                      To evaluate our cartographic approach, developers will im-
       -> Weighted (w1 :+: w2) b                                     plement an eDSL in Haskell as part of a user study.
  ($$) f a = f a
                                                                     4.1   Given
Figure 9. The deeply embedded implementation for Type                Support for an overspecified eDSL feature evaluates preser-
Level Naturals                                                       vation. The amount of code modified when adding a fea-
                                                                     ture after initial implementation evaluates extension. The
                         Haskell                                     number of intermediate representations evaluates adherence.
                Natural ADT      $$                                  Abstracting the design from the implementation evaluates
                       Weighted Type
                                                                     4.2   Experiment
                                                                     Half of participating developers will use the Haskell map
                                                                     that contains the maps from the eDSL examples and the costs
                    TyVar           :+:
                                                                     of known paths between regions. Each developer will place
                 Data Kinds Type Family                              eDSL features on the map and calculate the accumulated
                                                                     PEAR costs while iterating over their implementation.
   Figure 10. The Haskell map for Type Level Naturals
                                                                     5     Current Work and Conclusion
                  Element           P     E   A   R                  To bridge the gap between developer knowledge and auto-
                                                                     matic tool support, an artifact of Markedly would provide
              Natural ADT           1     0   0   0                  an interactive view where:
           Weighted Type Alias      0     0   1   0
                                                                         • A specification language encodes the design
                 TyVar              0     0   0   1                      • PEAR costs are automatically calculated
                   :+:              0     0   1   0                      • A library provides common eDSL map elements
               N atTyV ar           0     0   0   1                      • Code templates are generated from a map instance
               TyV arW eiдht        0     0   0   1                     Language design workbenches associated with edges on
                                                                     a Markedly map by proving a unified representation for
                TyV ar : + :        0     0   0   1
                                                                     constructing an eDSL. Developers can apply Markedly to
                 ProдPath           1     0   2   3                  language design workbench eDSL implementations by rep-
                                                                     resenting the workbench as a large region on the map and
                                                                     focusing on the features that require tools outside the work-
Figure 11. The non-zero PEAR costs of Type Level Naturals            bench as points and edges that leave the region.
                                                                        The realities of implementing an eDSL incur various kinds
    • readable natural number weight types                           of costs dependent on the stage of development. In the Markedly
    • attach weights to arbitrary expressions                        approach, we categorize these costs into the four PEAR
    • summing weights across function application                    metrics. Markedly describes costs for eDSL implementation
A deep embedding represents the eDSL using the Haskell               tools and our Haskell eDSL examples provide insight into
type system in Figure 9. The map in Figure 10 for this eDSL          instances of the approach. A user study would allow us to
contains the regions of data kinds, type families and Haskell        determine if the approach and Haskell specific artifacts facil-
semantics.                                                           itate reasoning about implementation choices with respect
  In Figure 11, the implementation manifests a connection            to PEAR costs.
between numeric representation and weights incurring a
recognition cost. Because the implementation must thread             Acknowledgments
the type variable, w :: N at, elements adding extra type vari-       Markedly is sponsored by the Defense Advanced Research
able annotations in addition to their features incur an ad-          Projects Agency (contract: FA8750-15-2-0033). Markedly does
herence cost. The preservation cost occurs from the natural          not necessarily reflect the position or the policy of the US
numbers Peano representation being difficult to read. The            Government, and no official endorsement should be inferred.
Markedly                                                                           Meta’17, October 22, 2017, Vancouver, Canada

 [1] Walter Cazzola and Albert Shaqiri. 2016. Modularity and Optimiza-
     tion in Synergy. In Proceedings of the 15th International Conference on
     Modularity (MODULARITY 2016). ACM, New York, NY, USA, 70–81.
 [2] Walter Cazzola and Ivan Speziale. 2009. Sectional Domain Specific
     Languages. In Proceedings of the 4th Workshop on Domain-specific
     Aspect Languages (DSAL ’09). ACM, New York, NY, USA, 11–14. https:
 [3] Martin Churchill, Peter D. Mosses, and Paolo Torrini. 2014. Reusable
     Components of Semantic Specifications. In Proceedings of the 13th
     International Conference on Modularity (MODULARITY ’14). ACM, New
     York, NY, USA, 145–156.
 [4] Corinna Cortes, Kathleen Fisher, Daryl Pregibon, Anne Rogers, and
     Frederick Smith. 2004. Hancock: A Language for Analyzing Transac-
     tional Data Streams. ACM Trans. Program. Lang. Syst. 26, 2 (March
     2004), 301–338.
 [5] Kathleen Fisher and Robert Gruber. 2005. PADS: A Domain-specific
     Language for Processing Ad Hoc Data. In Proceedings of the 2005
     ACM SIGPLAN Conference on Programming Language Design and Im-
     plementation (PLDI ’05). ACM, New York, NY, USA, 295–304. https:
 [6] Vojin Jovanovic, Amir Shaikhha, Sandro Stucki, Vladimir Nikolaev,
     Christoph Koch, and Martin Odersky. 2014. Yin-yang: Conceal-
     ing the Deep Embedding of DSLs. In Proceedings of the 2014 Inter-
     national Conference on Generative Programming: Concepts and Ex-
     periences (GPCE 2014). ACM, New York, NY, USA, 73–82. https:
 [7] Thomas Kühn, Walter Cazzola, and Diego Mathias Olivares. 2015.
     Choosy and Picky: Configuration of Language Product Lines. In Pro-
     ceedings of the 19th International Conference on Software Product Line
     (SPLC ’15). ACM, New York, NY, USA, 71–80.
 [8] Geoffrey Mainland. 2007. Why It’s Nice to Be Quoted: Quasiquoting
     for Haskell. In Proceedings of the ACM SIGPLAN Workshop on Haskell
     Workshop (Haskell ’07). ACM, New York, NY, USA, 73–82. https://doi.
 [9] Jan Midtgaard, Norman Ramsey, and Bradford Larsen. 2013. Engineer-
     ing Definitional Interpreters. In Proceedings of the 15th Symposium on
     Principles and Practice of Declarative Programming (PPDP ’13). ACM,
     New York, NY, USA, 121–132.
[10] Tim Sheard and Simon Peyton Jones. 2002.               Template Meta-
     programming for Haskell. In Proceedings of the 2002 ACM SIGPLAN
     Workshop on Haskell (Haskell ’02). ACM, New York, NY, USA, 1–16.


Markedly - A cartographic approach for mapping eDSL implementation costs

Authors Jeanne-Marie Musca Karl Cronburg Matthew P. Ahrens

License CC-BY-4.0