Authors: Blair Archibald, Jeremy Singer
License: CC-BY-3.0
Functional Baby Talk: Analysis of Code Fragments from Novice Haskell Programmers

Jeremy Singer and Blair Archibald
School of Computing Science, University of Glasgow, UK
jeremy.singer@glasgow.ac.uk
b.archibald.1@research.gla.ac.uk

What kinds of mistakes are made by novice Haskell programmers, as they learn about functional programming? Is it possible to analyse these errors in order to improve the pedagogy of Haskell? In 2016, we delivered a massive open online course which had an interactive code evaluation environment. We captured and analysed over 160K interactions from learners. We report typical novice developer behaviour; for instance, the mean time spent on an interactive tutorial is around eight minutes. Although our environment was restricted, we gain some understanding of novice learner errors. Parenthesis mismatches, lexical scoping errors and do block misunderstandings are common. Finally, we make recommendations about how such beginner code evaluation environments might be enhanced.

1 Introduction

The Haskell programming language [11] has acquired a reputation for being difficult to learn. In his presentation on the origins of Haskell [17], Peyton Jones notes that, according to various programming language popularity metrics, Haskell is much more frequently discussed than it is used for software implementation. The xkcd comic series features a sarcastic strip on Haskell's side-effect free property [15]: Haskell code is free from side-effects "because no-one will ever run it."

In 2016, we ran a massive open online course (MOOC) at Glasgow, providing an introduction to functional programming in Haskell. We received many items of learner feedback that indicated the difficulty of learning Haskell. Some people found the tools problematic: "I have been trying almost all today to get my first tiny Haskell program to run. The error messages on ghci are very difficult to understand." Others struggled conceptually with the language itself: "[It] is tedious to do absolutely anything in [Haskell] and it and made me hate Haskell with a passion." Some MOOC participants have tried to learn Haskell several times: "It's not my first attempt to learn Haskell. I always get stuck with monad part and do-notation."

We want to discover how and why novice functional programmers struggle to learn Haskell. What are the common mistakes they make? Once we know the key issues, we can start to address these in various ways.

1. Improved pedagogy: More focused textbooks, exercises and online help will provide better support, helping learners to avoid these mistakes. This is why Allen and Moronuki [1] wrote a new textbook based on feedback from years of tutoring Haskell novices. They claim that "the existing Haskell learning materials were inadequate to the needs of beginners. This book developed out of our conversations and our commitment to sharing the language."
2. Error-aware toolchain: The standard Haskell tools often feature impenetrable error messages [7]. If we are aware of particular difficulties, then we can provide dedicated support, or customized error messages, to handle these problems. The Helium system [10] is motivated by such considerations.

3. Modified language: It may be sensible to lobby the Haskell language committee to remove incidental complexity that causes problems in the first place. Stefik [18] describes how the Quorum programming language was designed and refined based on empirical user studies.

But how do we identify the common mistakes that novices encounter? Generally, we rely on first-hand anecdotal evidence. As educators, we think back to how we, as individuals, learnt functional programming. However, a sample size of one leads to the fallacy of hasty generalization. If we teach a Haskell course, then we might consider our cohorts of students. The class sizes are relatively small: perhaps tens or at most hundreds of students. Of course, this population accumulates slowly over time. However, there is implicit bias in this population, since the students are only exposed to our teaching.

We now live in the era of massive open online courses (MOOCs). MOOC platforms have extensive support for data capture and learner analytics. A MOOC can reach a massive class, scaling to thousands of students. Table 1 shows the numbers of learners who signed up for various functional programming MOOCs. Public sources are given where available; the other statistics were obtained from personal emails to the lead educators.

Course title                                     Platform     Lead Educators                     Run (year)  Signups   Completions
Functional Programming Principles in Scala      Coursera     Odersky                            1 (2012)    50K [14]  9.6K
Introduction to Functional Programming          edX          Meijer                             1 (2014)    38K [12]  2.0K
Introduction to Functional Programming in OCaml FUN          Di Cosmo / Regis-Gianas / Treinen  1 (2015)    3.7K      300
Functional Programming in Haskell               FutureLearn  Singer / Vanderbauwhede            1 (2016)    6.4K      900
Functional Programming in Erlang                FutureLearn  Thompson                           1 (2017)    5.6K      400

Table 1: Summary of functional programming MOOCs, including statistics where available or confirmed personally (note that completion metrics are not directly comparable across MOOC providers)

We designed the learning materials in our MOOC so as to capture learners' interactions via an online system, particularly their interactive programming exercises. Our aim is to analyse the program fragments, to discover what learners find difficult, and whether we might be able to address any of their issues using the strategies outlined above.

2 Evaluation Platform

Our students use a browser-based read-eval-print-loop (REPL) system for the first three weeks of our six week course. This system is based on the tryhaskell framework [5], which has an interactive Javascript front-end to parse expressions and provide user feedback, coupled with a server-based back-end that uses the mueval tool [3] to evaluate simple Haskell expressions in a sandboxed environment. Figure 1 shows the architecture of our system. Note that we hosted three load-balanced mueval servers on Amazon EC2 instances for the duration of our course.

Figure 1: Architectural diagram of our Haskell code evaluation infrastructure, which was hosted on Amazon Web Services (AWS)

Figure 2: Screenshot of our interactive Haskell tutorial front-end, based on the tryhaskell system

At the client side, a learner follows a series of instructions in an interactive prompt-based Javascript terminal, entering one-line Haskell expressions; see Figure 2 for an example screenshot. These expressions are sent to the server, where they are executed by the mueval interpreter in a stateless way. The result is returned to the client and displayed in the REPL. The mueval interpreter logs each one-line expression it attempts to evaluate, along with a time stamp and an originating IP address.
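Schematically, each round trip through the back-end behaves like the sketch below. This is an illustration rather than our production code; it assumes the mueval binary is on the server's path, and uses only mueval's standard -e flag for passing an expression.

  import System.Exit (ExitCode (..))
  import System.Process (readProcessWithExitCode)

  -- Evaluate one learner expression in the mueval sandbox. Since no state
  -- is carried between calls, any evaluation server can handle any request.
  evalExpression :: String -> IO String
  evalExpression expr = do
    (status, out, err) <- readProcessWithExitCode "mueval" ["-e", expr] ""
    pure $ case status of
      ExitSuccess -> out  -- the evaluated result, e.g. "4"
      _           -> err  -- a sandbox rejection or an evaluation error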
We make several minor modifications to the original tryhaskell system, which we have forked on github [20].

1. We use Javascript to wrap let expressions in a context so we can emulate persistent name bindings. This compound let expression is automatically prefixed to each one-liner that the user enters, before the code is sent to mueval; the sketch after this list illustrates the transformation.

2. Whereas the original tryhaskell system has a fixed sequence of interactions, we allow a flexible scripted set of interactions that highlight the particular topics we are covering at each stage of the MOOC.

3. We run multiple evaluation servers concurrently, behind a cloud-based load-balancer. A client can be served by any evaluation server, since all interactions are stateless.
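The following sketch shows the binding-accumulation idea behind modification 1. Our real implementation performs this transformation in the Javascript front-end, so this Haskell rendering is purely illustrative.

  import Data.List (intercalate)

  -- Emulate persistent bindings on a stateless back-end: every definition
  -- the user has entered so far is replayed as a compound let wrapped
  -- around the next expression.
  wrapBindings :: [String] -> String -> String
  wrapBindings []       expr = expr
  wrapBindings bindings expr =
    "let { " ++ intercalate " ; " bindings ++ " } in " ++ expr

For example, after a learner enters x = 21 and then asks for x * 2, the server receives the single expression let { x = 21 } in x * 2.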
3 Learner Data Analysis

In this section, we analyse the data from our MOOC learner population. First we summarize the log files gathered from the three load-balanced servers. The data was collected from 21 September to 3 November 2016, which included the majority of the course. The log files record 161K lines of interactions, comprising around 17MB of data. We logged 3.3K unique IP addresses, which is almost exactly equal to the number of 'active learners', i.e. those who completed at least one step of the course. Figure 3 shows aggregated counts of these geo-located IP addresses, superimposed on a world map. The following sections drill down into more detailed analysis of this data.

Figure 3: Map of geo-located IP addresses that accessed our servers during the course. Multiple accesses from the same IP address are only counted once in this map. Some IP addresses failed geo-location.

3.1 Interactive Sessions

Learners use our interactive tutorial platform in a session-based manner. Each tutorial exercise is designed to take between 5 and 15 minutes, if the user reads the prompts carefully and constructs proper Haskell expressions for evaluation. There are seven exercises in the first three weeks of the course.

From our log files, we attempt to reconstruct these sessions. Hage and van Keeken [9] refer to such sessions as coherent loggings. We group together log entries that originate from the same IP address, with an interval of less than 10 minutes between successive interactions. The tutorial system is set up as a read/eval/print loop; as a dual to this, we model the user as a write/submit/think loop, and we set a 10 minute limit on iterations round this loop. We acknowledge that an IP address might not be an entirely accurate identifier for a learner, but it is the best proxy we can extract cheaply from our log files. Learners did not have to authenticate to access the interactive tutorials.
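In code, this reconstruction is a gap-based split over the per-address stream of timestamps. The following simplified sketch (our own illustration, treating timestamps as epoch seconds, already filtered to a single IP address and sorted) captures the rule:

  -- Successive interactions less than 10 minutes (600 seconds) apart
  -- belong to the same session.
  sessionGap :: Integer
  sessionGap = 600

  splitSessions :: [Integer] -> [[Integer]]
  splitSessions []     = []
  splitSessions (t:ts) = go t [t] ts
    where
      go _    acc []     = [reverse acc]
      go prev acc (x:xs)
        | x - prev < sessionGap = go x (x : acc) xs
        | otherwise             = reverse acc : splitSessions (x : xs)

A session's duration is then its final timestamp minus its first; the session counts and durations reported below are computed in this way.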
In total, we identified 5.7K sessions from our logs. The mean number of sessions per IP address is 1.72. This data shows us that many learners did not complete the full set of seven tutorials. Figure 4 shows the number of sessions per IP address, split into 30 equal-sized bins. The mean time spent in a single session is 490s, or around 8 minutes. This corresponds to our design intentions, and also closely follows the observations of Guo et al. on MOOC video length to maximise engagement [8]. Figure 5 shows the length of time per session, split into 30 equal-sized bins. The mean number of Haskell expressions entered per session is 17. Figure 6 shows the number of lines per session, split into 30 equal-sized bins. The steep drop-off apparent in these histograms is characteristic of the fat-tailed distribution seen in general MOOC learner engagement, as outlined by Clow with his theory of the funnel of participation [4].

Figure 4: Histogram of sessions split into bins based on number of sessions per IP address

Figure 5: Histogram of sessions split into bins based on the total time of each session

Figure 6: Histogram of sessions split into bins based on the number of lines of code in each session

3.2 Adventurous Coders

The interactive coding materials, which we host on our forked variant of tryhaskell, are walk-through tutorial style exercises. Each tutorial consists of a fixed number of discrete steps. Each step includes some explanatory text, then the user is prompted to enter a Haskell expression. Sometimes we provide the correct expression directly, and the user simply copies this. Other times we provide hints, and users have to construct their own expressions. Note that we are fairly flexible in terms of the user entering different code: if the expression is at all appropriate then the tutorial advances to the next step.

We measure the proportion of user input that is based directly on cutting and pasting Haskell code supplied in the tutorial. The rest of the input is modified by users, who have either edited the code to customize it in some way or written something completely different. Of the 160K lines of user input, around 100K are copied lines of code and 60K are original lines.

We also analyse each session individually, to measure the proportion of lines in each session that are original. Note that sessions have different lengths, as Section 3.1 explains. Figure 7 presents a histogram of sessions, in terms of the proportion of original lines of code per session. The sessions are split into five bins. We observe that the largest bin contains sessions with minimal code modifications.

Figure 7: Histogram of sessions split into bins based on proportion of original lines of code in session

3.3 Learner Errors

We extracted the individual Haskell expressions entered by learners from the log file entries. We ran each of these expressions through a Haskell expression parser, based on the haskell-src-meta package. We discovered that 8.1K of the 161K lines could not be parsed correctly as Haskell expressions.
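The core of this classification is small. The sketch below is in the style of our analysis, assuming the log lines have already been reduced to bare expressions; parseExp is the entry point of the haskell-src-meta package, returning either an error message or a parsed expression.

  import qualified Data.Map.Strict as Map
  import Language.Haskell.Meta.Parse (parseExp)

  -- Tally the parse failures, keeping only the error messages and
  -- counting identical messages together (as summarized in Table 2).
  errorCounts :: [String] -> Map.Map String Int
  errorCounts exprs =
    Map.fromListWith (+) [ (msg, 1) | Left msg <- map parseExp exprs ]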
Table 2 lists the most common error messages reported by our parser.

Reported error                                                   Count
----------------------------------------------------------------------
Parse error: }                                                     227
Parse error: ..                                                    223
Parse error: |                                                     196
Parse error: )                                                     174
Parse error: ,                                                     136
Parse error: <-                                                    135
Parse error: \\                                                    130
Parse error: ->                                                    114
Parse error: in                                                    105
Parse error: ;                                                      98
Parse error: where                                                  91
Parse error: ]                                                      85
Parse error: +                                                      64
Parse error: Last statement in a do-block must be an expression     63
Parse error: else                                                   61
Parse error: /                                                      50
Parse error in pattern: length'                                     45
Parse error: virtual }                                              41
Parse error: let                                                    36
Parse error: if                                                     34
Parse error: `                                                      33
Parse error: {                                                      30
Parse error: *                                                      28
Parse error: then                                                   27
Parse error: >                                                      27
Parse error in pattern: l'                                          26
Parse error in pattern: l                                           26
Parse error: type                                                   25
Parse error in pattern: f                                           25
Parse error in pattern:                                             23
Parse error: data                                                   22
Parse error: =>                                                     21
Parse error: -                                                      20

Table 2: Sorted list of most frequent parse errors for learner input

There are some clear high-level problems, such as:

1. Parenthesis mismatch: Closing parenthesis characters are frequently missing. A sample expression that causes this error is min((max 3 4) 5)). Heeren et al. [10] also state that "illegal nesting of parentheses, braces, and brackets is a very common lexical error".

2. Bad scoping: Problems with let and where constructs are apparent. The mueval expression parser does not support where clauses properly. Some users had confusion with the syntax of let, e.g. let (a,b) (10, 12) in a * 2.

3. Misunderstanding do blocks: Many people tried to bind values to names with <- outside a do block, or to bind names in a do block as the final action, e.g. do putStrLn "nnnnn "; z <- getLine;

4. Complex constructs: The mueval interpreter is particularly restrictive. It does not support data or type definitions, or definitions of multi-line functions. Several users attempted to enter such code, which did not parse correctly. We encouraged users to move onto ghci for these constructs. Perhaps some did not read the supporting text, or were expert Haskell users trying to discover the limits of our system.

5. Incorrect syntax for enumFromThenTo syntactic sugar: People misunderstood the .. notation, generating incorrect list expressions like [0,1,3..10] or [0, 2, ..] or [1.1, 1.2, .. 2.0]; this may have been a problem with our tutorial material.
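For reference, the block below gives corrected counterparts to the syntactic slips in categories 1, 2, 3 and 5. These fixes are our own illustrations rather than learner submissions; each fragment parses and evaluates correctly.

  -- 1: balanced parentheses
  balanced :: Int
  balanced = min (max 3 4) 5

  -- 2: a let binding needs '=' between the pattern and its body
  scoped :: Int
  scoped = let (a, b) = (10, 12) in a * 2

  -- 3: a do block must finish with an expression, not a <- binding
  echo :: IO ()
  echo = do
    z <- getLine
    putStrLn z

  -- 5: the stride of a range is implied by its first two elements
  ranges :: ([Int], [Double])
  ranges = ([0, 2 .. 10], [1.1, 1.2 .. 2.0])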
As course designers, we opted to use tryhaskell because it works 'out of the box' for learners in their familiar browser environments; there is no need to do any confusing or difficult toolchain installation before starting to write code. However, we recognise that the tryhaskell environment, with its single-line REPL interface and lack of syntax highlighting, is not suitable for learning anything more than the most basic Haskell expressions. We will need to provide more explicit signposting here, or a more fully featured online evaluation environment.

4 Threats to Validity

This section considers potential threats to the validity of our findings, in terms of characterizing novice Haskell programmers.

4.1 Lack of Generality

The log files we analysed were generated by learners from the first run of a single MOOC. While the learners were drawn from varied backgrounds, they were exposed to a single set of Haskell educational resources on this course. In that sense, the findings may not be representative of a wider variety of novices who are taught in a different way, or with different materials. We acknowledge the need to gather data from repeated runs of the course, and to cross-check this data with other courses. For instance, the Helium project [10] reports similar data, but at a higher level of abstraction.

4.2 Supported Language Subset

The tryhaskell REPL environment, backed by the mueval utility, handles simple one-line Haskell expression evaluation. In that sense, only a subset of the Haskell language is supported. Hence we are limited to exposing a subset of novice errors. We argue that the language features supported by mueval are the most appropriate ones for novice Haskell learners, so the errors are still representative of those that would be experienced in the full Haskell language. We could confirm this by replacing mueval with a more powerful evaluation framework.

4.3 Software Incompatibility

Some learners reported that our interactive tutorials did not work on their machines. This may have been due to web browser incompatibilities or OS issues. The problem seemed to occur mostly on Windows devices (though Windows was also easily the most popular OS among our learners). Because of this, our logs might be skewed towards users with particular browser/OS configurations, but we do not expect that this will introduce significant bias into the results.

5 Related Work

In addition to the original tryhaskell platform [5], there are similar online REPL systems for Scala (Scastie and Scala Fiddle), OCaml (tryOCaml) and other functional languages. While these systems might capture user interactions, to the best of our knowledge there is no publicly available analysis of the data.

The Helium system [10] logs the behaviour of novice Haskell developers. Hage and van Keeken [9] present some similar metrics and graphs to ours, based on mining data from Helium logs. Their platform performs whole-program execution, so they can work with a richer set of Haskell constructs. They provide a list of common Haskell errors [10]. Some of their lexical errors are identical to ours, such as parenthesis problems. They also have details of type errors, which we did not analyse in our logs.

Thompson [19] presents a list of common Haskell errors that are generated in the hugs interpreter. Again, there is some overlap with our set of errors.

Bental [2] describes an instance of the interactive Ceilidh framework for assignment-based ML program submission and feedback. She gives a categorization of 134 programming questions submitted by students who could not generate correct ML code to solve particular assignments. 18 of these problems are syntactic; these are most strongly related to the errors we have highlighted in our study.

Marceau et al. [13] present the DrRacket system, and how they devised a set of user-friendly error messages which were the outcome of systematic user trials. They observe that parenthesis matching is a common problem in the first lab exercise, but becomes less apparent in later labs as students gain experience.

DrRacket [6] introduces a LISP-like language to novice programmers via a controlled sequence of gradually more powerful language subsets. It might be possible for us to do something similar with Haskell. We would need to make it clear which language elements are supported in each subset. In effect, the tryhaskell system does define a Haskell subset, since there are many features (e.g. multi-line functions and datatype definitions) that it does not support. It might be sensible to trigger 'wait till later' messages if a learner tries to evaluate anything more complex than we expect, inside our REPL.

Murphy et al. [16] note that a small number of UK universities (four) teach Haskell as a first programming language to Computer Science students. This study describes general motivations for selecting particular languages, as well as perceptions of their relative difficulty. However, the data does not give particular insights regarding Haskell, since it is such a small proportion of the overall sample.

6 Conclusions

Interactive learning environments are useful for novice developers, but engagement is subject to the standard drop-off that characterizes MOOC participation. We analysed 161K lines of Haskell code submitted for evaluation by learners, and identified some common syntactic error patterns. Many of these are consistent with previous error classifications reported in the literature. We argue that a richer online environment (featuring syntax highlighting, parenthesis matching, and multi-line input, inter alia) would be more appropriate to support beginner Haskell developers.

Richer analysis may be possible from our log files. We intend to publish anonymized versions of our logs, for the benefit of the research community. We will continue to record and analyse student interactions in future iterations of our Haskell MOOC.

References

[1] Christopher Allen & Julie Moronuki (2017): Haskell Programming from First Principles. Gumroad (ebook).
[2] Diana Bental (1995): Why doesn't my program work? Requirements for automated analysis of novices' computer programs. In: Proc. Workshop on Automated Understanding of (Novice) Programs, World Conference on AI and Education, Washington DC, USA.
[3] Gwern Branwen (2014): Mueval. https://github.com/gwern/mueval.
[4] Doug Clow (2013): MOOCs and the Funnel of Participation. In: Proceedings of the Third International Conference on Learning Analytics and Knowledge, pp. 185–189, doi:10.1145/2460296.2460332.
[5] Christopher Done (2014): Try Haskell. https://github.com/tryhaskell/tryhaskell.
[6] Robert Bruce Findler, John Clements, Cormac Flanagan, Matthew Flatt, Shriram Krishnamurthi, Paul Steckler & Matthias Felleisen (2002): DrScheme: A Programming Environment for Scheme. Journal of Functional Programming 12(2), pp. 159–182.
[7] GHC Wiki (2015): Custom Type Errors. https://ghc.haskell.org/trac/ghc/wiki/Proposal/CustomTypeErrors.
[8] Philip J. Guo, Juho Kim & Rob Rubin (2014): How Video Production Affects Student Engagement: An Empirical Study of MOOC Videos. In: Proceedings of the First ACM Conference on Learning @ Scale, pp. 41–50, doi:10.1145/2556325.2566239.
[9] Jurriaan Hage & Peter van Keeken (2007): Mining Helium Programs with Neon. In: Draft Proceedings of the 8th Symposium on Trends in Functional Programming, pp. 35–50. Available at http://www.academia.edu/download/30841452/10.1.1.81.4266.pdf.
[10] Bastiaan Heeren, Daan Leijen & Arjan van IJzendoorn (2003): Helium, for Learning Haskell. In: Proceedings of the 2003 ACM SIGPLAN Workshop on Haskell, pp. 62–71, doi:10.1145/871895.871902.
[11] Paul Hudak, Simon Peyton Jones, Philip Wadler, Brian Boutel, Jon Fairbairn, Joseph Fasel, María M. Guzmán, Kevin Hammond, John Hughes, Thomas Johnsson et al. (1992): Report on the Programming Language Haskell: a Non-strict, Purely Functional Language, Version 1.2. ACM SIGPLAN Notices 27(5), pp. 1–164.
[12] Katy Jordan (2015): MOOC Completion Rates: The Data. http://www.katyjordan.com/MOOCproject.html.
[13] Guillaume Marceau, Kathi Fisler & Shriram Krishnamurthi (2011): Measuring the Effectiveness of Error Messages Designed for Novice Programmers. In: Proceedings of the 42nd ACM Technical Symposium on Computer Science Education, pp. 499–504, doi:10.1145/1953163.1953308.
[14] Heather Miller & Martin Odersky (2012): Functional Programming Principles in Scala: Impressions and Statistics. http://docs.scala-lang.org/news/functional-programming-principles-in-scala-impressions-and-statistics.html.
[15] Randall Munroe (2014): Haskell. https://xkcd.com/1312/.
[16] Ellen Murphy, Tom Crick & James H. Davenport (2017): An Analysis of Introductory Programming Courses at UK Universities. The Art, Science, and Engineering of Programming 1(2), p. 18, doi:10.22152/programming-journal.org/2017/1/18.
[17] Simon Peyton Jones (2017): Escape from the Ivory Tower: the Haskell Journey. Video (watch from 15:30), https://www.youtube.com/watch?v=re96UgMk6GQ.
[18] Andreas Stefik & Susanna Siebert (2013): An Empirical Investigation into Programming Language Syntax. Trans. Comput. Educ. 13(4), pp. 19:1–19:40, doi:10.1145/2534973.
[19] Simon Thompson (1999): Some Common (and not so common!) Hugs Errors. https://www.cs.kent.ac.uk/people/staff/sjt/craft2e/errors/allErrors.html.
[20] Wim Vanderbauwhede & Jeremy Singer (2016): Haskell Tutorials. https://github.com/wimvanderbauwhede/haskelltutorials.