Markedly - A cartographic approach for mapping eDSL implementation costs

Authors Jeanne-Marie Musca, Karl Cronburg, Matthew P. Ahrens,

Plaintext

Markedly
A cartographic approach for mapping eDSL implementation costs
Matthew P. Ahrens Karl Cronburg Jeanne-Marie Musca
Tufts University Tufts University Tufts University
Medford, Massachusetts, USA Medford, Massachusetts, USA Medford, Massachusetts, USA
mahrens@cs.tufts.edu karl@cs.tufts.edu jmusca@cs.tufts.edu

Abstract 1.2 The Problem
The cost of implementing an embedded domain specific lan- When a tool requires extraneous effort to implement an eDSL
guage (eDSL) depends on the eDSL development tools used feature, the implementation incurs an unexpected cost. Costs
to build it, but these costs are not readily apparent. Markedly are difficult to track across stages of language development
enables developers to map elements of the eDSL implementa- and make comparing the impact of implementation decisions
tion and attach costs to these elements. An implementation difficult with respect to the parts of the design those decisions
diverging from its design incurs four distinct types of costs: affect.
Preserve, Extend, Adhere, and Recognize.
To illustrate the Markedly approach, we calculate the costs 1.3 The Solution
of example eDSLs using a map for the Haskell language EDSL developers will better construct and modify their im-
metaprogramming and host language toolsets. To evaluate plementation by:
this approach, we will assess how developers reason about • Decomposing the design into eDSL features
costs while implementing an eDSL. • Associating features with a tool implementation
CCS Concepts • Software and its engineering → Gen- • Identifying connections between implementations
eral programming languages; • Social and professional • Track costs accumulated across connections
topics → History of programming languages;
2 Methodology
Keywords programming languages, functional languages,
Given a chosen deconstruction of an eDSL design into fea-
domain specific languages, semantics
tures, Markedly allows developers to calculate implementa-
ACM Reference Format: tion costs. To illustrate, we apply Markedly techniques to a
Matthew P. Ahrens, Karl Cronburg, and Jeanne-Marie Musca. 2017. toy eDSL for Identifiers with the features:
Markedly: A cartographic approach for mapping eDSL implementa-
• identifiers must match the regex [A-Z][a-z0-9]*
tion costs. Presented at Workshop on Meta-Programming Techniques
and Reflection (Meta’17). 5 pages. • identifiers must be unique in scope
For tracking costs during any stage of development – plan-
1 Introduction ning, constructing, or iterating – Markedly describes an in-
formal algorithm for visualizing an eDSL implementation as
1.1 The Practice
a map.
EDSL developers using Haskell’s language development toolset
choose between host language constructs and metaprogram- 2.1 Constructing the Map
ming tools when implementing their eDSLs. Host language Using Markedly, an eDSL developer constructs a map with
constructs provide the eDSL developer direct reuse of the regions representing language development tools. The devel-
core Haskell semantics and GHC’s extension constructs such oper marks points on a region when implementing part of
as generics, type families, and data kinds. Haskell’s metapro- the language in that tool. When planning, developers put one
gramming tools include the Template Haskell Q Monad, alge- point per feature on the map and connect points with edges
braic data type AST representation, and libraries for parsing to represent program dataflow. In practice, developers may
and compile time splicing[8, 10]. For simplicity, developers encounter instances where multiple tools may be needed for
prefer to implement their eDSL using only one tool, but some one feature, resulting in multiple points and edges. In itera-
eDSLs are better implemented by a composition of tools. tion, developers may add to or modify the implementation,
resulting in adding or moving points and edges.
This work is licensed under Creative Commons Attribution 4.0 International
License (CC BY 4.0).
For example, in Figures 1 and 2, the map partitions regions
Meta’17, October 22, 2017, Vancouver, Canada for the tools: Haskell semantics and Parsec parsing. The de-
© 2017 Copyright held by the owner/author(s). velopers for the Figure 1 implementation mark both features
with one point which represents implementing identifiers
1
Meta’17, October 22, 2017, Vancouver, Canada Ahrens, Cronburg, and Musca

as host language identifiers. In the Figure 2 implementation, Haskell
developers mark the regex feature in the parser tool region
as a parser implementation and the uniqueness feature in Haskell Ids
the Haskell semantics region as a Haskell set data structure.
To share program data between the features, developers add
an edge from the parser to the set data structure.
Figure 1. Haskell map for a host language embedding of
2.2 Metrics of cost Identifiers
To help developers track the impact of their implementation
decisions, Markedly accumulates costs along paths. To help Parsec Parser Haskell
developers focus on specific concerns of their implemen-
tation decisions, Markedly splits costs along four vectors, Parser Set
labeled PEAR.

2.2.1 Preserve
Figure 2. Haskell map for a metaprogram embedding of
The heuristic for the preservation cost of a map element
Identifiers
(points and edges) is How many features in the design are
compromised by this element? An implementation element
compromises a design feature by not enforcing a property Parsec Parser Haskell
of the design feature. In Figure 1, Haskell identifiers do not
enforce the leading capital property of the regex feature, Parser Data.Set
incurring a preservation cost of one.
Print Stx Err
2.2.2 Extend
Figure 3. Haskell map for a metaprogram embedding of
The heuristic for the extension cost of a map element is How
Identifiers with error reporting
many intermediate representations are not accessible from this
element? When the implementation of a map element does
not expose transformed program data, then that element or an edge. When multiple edges converge on a point, the
incurs an extension cost for each hidden representation. Sup- path takes the maximum cost for each metric at that point.
pose the representation of identifiers to check syntax was In the implementation of Identifiers in Figure 3 which adds
introduced and lost during error reporting in Figure 3. The error reporting, the accumulated cost along the parser edge
error reporting point would incur an extension cost of one. and syntax edge are (0,0,0,1) and (0,1,0,2), and set edge cost
is (0,0,0,0). Thus, (arдmax (0, 0, 1, 1)(0, 1, 0, 3)) + (0, 0, 0, 0))
2.2.3 Adhere calculates the path cost as (0,1,1,3).
The heuristic for the adherence cost of a map element is How Developers can now compare the costs of different imple-
many expressions or syntactic annotations must be written in mentations with proper separation of concerns. The Iden-
addition to simply implementing the feature? Complexity of tifiers eDSL implementations have costs of (1,0,0,0) for the
the function required to convert between tool representa- host language implementation, (0,0,0,1) for the metaprogram
tions informs this metric. For example, in Figure 3, the edge implementation, and (0,1,1,3) for the error handling imple-
from the parser to the error correcting point must convert mentation. Using these metrics, developers decide which
from the parser’s representation to a string thus incurring implementation to maintain depending on what will be dif-
an adherence cost of one against the regex feature. ficult to maintain. Connecting the terminal points of maps
allows developers to compose maps together and track costs
2.2.4 Recognize while iterating over their eDSLs.
The heuristic for the recognition cost of a map element is Is
this element present in the design? If the element is directly 3 Example eDSLs
tied to at least one feature, it has a cost of zero; otherwise it 3.1 Data Parsing
has a cost of one. In the example in Figure 3, error reporting
Consider an eDSL, µPADS [5], whose features are:
does not appear in the design, so the extra points and edges
added to support that feature incur a cost of one, each. • best-effort (longest parse) left-to-right parsing
• first-fit (packrat) parsing
2.3 Calculation and Comparison The implementation of (<|>) in Figure 4 takes two parser
A four-tuple of natural numbers represents the cost for a results and decides which one progresses. The map in Figure
map element : (P, E, A, R). A map element is either a point 5 defines the type signature, Nothing case, Both-Just case,
2
Markedly Meta’17, October 22, 2017, Vancouver, Canada

Figure 4. Representations of the two µPADS features. Fea- Element P E A R
ture (1) is implemented by lines 4 and 5, and feature (2) is Host 0 0 1 0
implemented by lines 4, 6, and 7. ADT 0 0 0 1
ValEnv 0 0 0 1
Haskell One-Just MkTyMkV al 0 1 0 1
pm 3 MkTyADT 0 0 1 1
Both-Just
TyConPath 0 0 2 2
pm 2
Nothing V alConPath 0 1 1 2
pm 1
Type Sig Figure 8. The non-zero PEAR costs of MunLang

• propagate failure if no result
Figure 5. Haskell map for µPADS • expect computation failure

3.2 Containers
TySig P E A R
Consider an eDSL MunLang that implements the multi-union
Nothing 0 0 0 1 data structure [4] The design of MunLang contains three
Both-Just 0 0 1 1 features:
One-Just 0 0 0 0 • Define a munion type with {: :} operators
• Construct a munion value with {: :} operators
pm1 0 0 0 1
• Splice types and values from host language
pm2 0 0 0 1 To define a munion type, specify a set of label-type pairs:
pm3 0 0 0 1 typeM 1 = {: τ1 l 1 , τ2 l 2 , τ3 l 3 :}. Similarly, to construct munion
values of type M 1 , declare a subset of label-value pairs,: V1 =
TotalPath 0 0 1 5 {: l 1 = v 1 , l 3 = v 2 :} :: M 1
In the map in figure 7, the points mun and host enforce
Figure 6. The non-zero PEAR costs of µPADS syntactic constraints for the munion and host language syn-
tax, respectively. MkTy and MkVal points construct munion
types and values, respectively. The ADT point represents
and One-Just case(s) as points and connects them via the Haskell type splicing and ValEnv represents Haskell value
standard Haskell tool of pattern matching (pm). splicing.
Line 4 in Figure 4 simply exists to adhere to the implied se- The developers use the heuristics for each PEAR dimen-
mantics of first-fit while implementing best-effort, incurring sion to calculate the PEAR costs in Figure . From these costs,
an adherence cost. The type representation, Nothing failure the developers document that the implementation “fills in”
case, and evaluation order imposed by pattern matching are the underspecified host language representation apparent in
not reflected by features thus incurring a recognition cost. the recognition cost, contains superfluous code to parse the
The design could reduce the PEAR cost to zero by being host language apparent in the adherence cost, and hides the
more specific: type environment apparent in the extension cost.
• best-effort (longest parse) left-to-right parsing pre-
ferred 3.3 Type-Level Naturals
• first-fit (packrat) parsing if results equivalent or only Consider an eDSL, Numeric Type Annotations whose features
one result are:
3
Meta’17, October 22, 2017, Vancouver, Canada Ahrens, Cronburg, and Musca

data Nat = Z | S Nat developers may add implementation details which make
type Weighted (w :: Nat) a = a use of algebraic numerals, risking adherence costs if adding
type family (w1 :: Nat) :+: (w2 :: Nat) :: Nat complicated parsing and printing expressions.
type instance Z :+: m = m
type instance (S n) :+: m = S (n :+: m) 4 Evaluation
($$) :: Weighted w1 (a -> b) -> Weighted w2 a To evaluate our cartographic approach, developers will im-
-> Weighted (w1 :+: w2) b plement an eDSL in Haskell as part of a user study.
($$) f a = f a
4.1 Given
Figure 9. The deeply embedded implementation for Type Support for an overspecified eDSL feature evaluates preser-
Level Naturals vation. The amount of code modified when adding a fea-
ture after initial implementation evaluates extension. The
Haskell number of intermediate representations evaluates adherence.
Natural ADT $$ Abstracting the design from the implementation evaluates
recognition.
Weighted Type
4.2 Experiment
Half of participating developers will use the Haskell map
that contains the maps from the eDSL examples and the costs
TyVar :+:
of known paths between regions. Each developer will place
Data Kinds Type Family eDSL features on the map and calculate the accumulated
PEAR costs while iterating over their implementation.
Figure 10. The Haskell map for Type Level Naturals
5 Current Work and Conclusion
Element P E A R To bridge the gap between developer knowledge and auto-
matic tool support, an artifact of Markedly would provide
Natural ADT 1 0 0 0 an interactive view where:
Weighted Type Alias 0 0 1 0
• A specification language encodes the design
TyVar 0 0 0 1 • PEAR costs are automatically calculated
:+: 0 0 1 0 • A library provides common eDSL map elements
N atTyV ar 0 0 0 1 • Code templates are generated from a map instance
TyV arW eiдht 0 0 0 1 Language design workbenches associated with edges on
a Markedly map by proving a unified representation for
TyV ar : + : 0 0 0 1
constructing an eDSL. Developers can apply Markedly to
ProдPath 1 0 2 3 language design workbench eDSL implementations by rep-
resenting the workbench as a large region on the map and
focusing on the features that require tools outside the work-
Figure 11. The non-zero PEAR costs of Type Level Naturals bench as points and edges that leave the region.
The realities of implementing an eDSL incur various kinds
• readable natural number weight types of costs dependent on the stage of development. In the Markedly
• attach weights to arbitrary expressions approach, we categorize these costs into the four PEAR
• summing weights across function application metrics. Markedly describes costs for eDSL implementation
A deep embedding represents the eDSL using the Haskell tools and our Haskell eDSL examples provide insight into
type system in Figure 9. The map in Figure 10 for this eDSL instances of the approach. A user study would allow us to
contains the regions of data kinds, type families and Haskell determine if the approach and Haskell specific artifacts facil-
semantics. itate reasoning about implementation choices with respect
In Figure 11, the implementation manifests a connection to PEAR costs.
between numeric representation and weights incurring a
recognition cost. Because the implementation must thread Acknowledgments
the type variable, w :: N at, elements adding extra type vari- Markedly is sponsored by the Defense Advanced Research
able annotations in addition to their features incur an ad- Projects Agency (contract: FA8750-15-2-0033). Markedly does
herence cost. The preservation cost occurs from the natural not necessarily reflect the position or the policy of the US
numbers Peano representation being difficult to read. The Government, and no official endorsement should be inferred.
4
Markedly Meta’17, October 22, 2017, Vancouver, Canada

References
[1] Walter Cazzola and Albert Shaqiri. 2016. Modularity and Optimiza-
tion in Synergy. In Proceedings of the 15th International Conference on
Modularity (MODULARITY 2016). ACM, New York, NY, USA, 70–81.
https://doi.org/10.1145/2889443.2889445
[2] Walter Cazzola and Ivan Speziale. 2009. Sectional Domain Specific
Languages. In Proceedings of the 4th Workshop on Domain-specific
Aspect Languages (DSAL ’09). ACM, New York, NY, USA, 11–14. https:
//doi.org/10.1145/1509307.1509311
[3] Martin Churchill, Peter D. Mosses, and Paolo Torrini. 2014. Reusable
Components of Semantic Specifications. In Proceedings of the 13th
International Conference on Modularity (MODULARITY ’14). ACM, New
York, NY, USA, 145–156. https://doi.org/10.1145/2577080.2577099
[4] Corinna Cortes, Kathleen Fisher, Daryl Pregibon, Anne Rogers, and
Frederick Smith. 2004. Hancock: A Language for Analyzing Transac-
tional Data Streams. ACM Trans. Program. Lang. Syst. 26, 2 (March
2004), 301–338. https://doi.org/10.1145/973097.973100
[5] Kathleen Fisher and Robert Gruber. 2005. PADS: A Domain-specific
Language for Processing Ad Hoc Data. In Proceedings of the 2005
ACM SIGPLAN Conference on Programming Language Design and Im-
plementation (PLDI ’05). ACM, New York, NY, USA, 295–304. https:
//doi.org/10.1145/1065010.1065046
[6] Vojin Jovanovic, Amir Shaikhha, Sandro Stucki, Vladimir Nikolaev,
Christoph Koch, and Martin Odersky. 2014. Yin-yang: Conceal-
ing the Deep Embedding of DSLs. In Proceedings of the 2014 Inter-
national Conference on Generative Programming: Concepts and Ex-
periences (GPCE 2014). ACM, New York, NY, USA, 73–82. https:
//doi.org/10.1145/2658761.2658771
[7] Thomas Kühn, Walter Cazzola, and Diego Mathias Olivares. 2015.
Choosy and Picky: Configuration of Language Product Lines. In Pro-
ceedings of the 19th International Conference on Software Product Line
(SPLC ’15). ACM, New York, NY, USA, 71–80. https://doi.org/10.1145/
2791060.2791092
[8] Geoffrey Mainland. 2007. Why It’s Nice to Be Quoted: Quasiquoting
for Haskell. In Proceedings of the ACM SIGPLAN Workshop on Haskell
Workshop (Haskell ’07). ACM, New York, NY, USA, 73–82. https://doi.
org/10.1145/1291201.1291211
[9] Jan Midtgaard, Norman Ramsey, and Bradford Larsen. 2013. Engineer-
ing Definitional Interpreters. In Proceedings of the 15th Symposium on
Principles and Practice of Declarative Programming (PPDP ’13). ACM,
New York, NY, USA, 121–132. https://doi.org/10.1145/2505879.2505894
[10] Tim Sheard and Simon Peyton Jones. 2002. Template Meta-
programming for Haskell. In Proceedings of the 2002 ACM SIGPLAN
Workshop on Haskell (Haskell ’02). ACM, New York, NY, USA, 1–16.
https://doi.org/10.1145/581690.581691