Automated Code Tracing Exercises for CS1

Seán Russell
University College Dublin
Dublin, Ireland
sean.russell@ucd.ie

ABSTRACT
The ability of students to read and comprehend code is a fundamental skill in computer programming. Relying on students to build this skill through typical programming assignments can lead to many persevering through trial and error rather than understanding. This paper describes Trace Generator, a work-in-progress application for generating automatically graded code tracing questions for Python and C programs. The fundamental principles behind this work are mastery through repetition and providing comprehensive and understandable feedback to enable students to learn from their mistakes. Feedback and reflections from the use of the generated questions with two introductory procedural programming classes (200 students) are also discussed. Analysis of student attempts suggests a willingness to complete quizzes multiple times until they achieved a satisfactory score (average final result of 91%).

CCS CONCEPTS
• Social and professional topics → Computing education; CS1; • Applied computing → Learning management systems; Computer-assisted instruction.

KEYWORDS
code tracing, code comprehension, automatic grading, generated feedback

ACM Reference Format:
Seán Russell. 2022. Automated Code Tracing Exercises for CS1. In Computing Education Practice 2022 (CEP 2022), January 6, 2022, Durham, United Kingdom. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3498343.3498347

1 INTRODUCTION
The principal aim of an introductory programming course is typically to teach students to program. While educators will be aware that programming involves the combination of many interdependent skills, this may not be as evident to students.

When learning their first programming language, students must learn the sometimes unforgiving syntax of a new language, data types and their operations, and the effects different statements have on variables and control flow. In parallel, students must develop strategies and techniques to solve problems using the language.

From an instructor perspective, a large, well-designed programming assessment will encourage the development and use of all of the required skills. But from the perspective of some students, the same assignment will be perceived as a requirement to produce code that gives the “right” answer. Students may also completely bypass the learning opportunity provided because these types of assessment are particularly vulnerable to plagiarism [10].

Research has shown that some students will resort to the trial and error approach when faced with completing these types of programming tasks [3]. Trial and error is unlikely to be successful in the long term, but can be quite effective for solving simpler CS1 and CS2 assignments. By the time students reach more advanced courses, the use of such an approach may have negatively impacted the development of many of the key skills used in programming.

Assessing students on their code comprehension skills directly both provides an incentive for students to develop a key skill (increasing their grades) and emphasises the importance that we place on their abilities in this regard. Moreover, these assessments should provide an opportunity for the students to learn [8, 14].

In aiming to design code comprehension assessments, two features were considered essential. First, feedback should be immediate and comprehensive. Second, students should have the opportunity to apply what they have learned from the feedback in repeated attempts (with different questions).

The nature of these features requires the construction of a question bank of sufficient size that students can repeat the assessments without encountering the same question twice. Manually building these questions would be prohibitively time consuming; as such, an approach that generated the questions automatically was used.

The remaining sections of the paper detail related work, the design principles for questions, and a preliminary evaluation.

2 RELATED WORK
The report of the McCracken group [10] highlighted the importance of program comprehension skills in program writing and prompted more investigation into the factors affecting performance in programming tasks. Lister et al. [9] investigated how students approach code tracing problems and categorised the “doodles” that the students produced in this process. These categories of doodle, such as Synchronized Trace, Trace, Number and Computation, were identified in student responses and the percentage of correct answers for each category was calculated. The more effective strategies that students use to solve program comprehension tasks were incorporated into the Trace Generator.

There has been research into the benefits of code tracing tutors on students' abilities in program writing tasks [1, 7, 13]. While some studies have reported no benefit [1], others have found some or significant improvement in code writing abilities [7, 13].

A large number of tools and tutors have been developed to help students learn code comprehension skills. Some tools provide animated visualisation of the execution of programs [11, 13], with the internal state of the program displayed as the execution progresses under the control of the user. Some tools that provide this functionality are designed as an integrated development environment (IDE) which provides visualisation of program execution, or as a plugin that provides the same functionality to an existing IDE [2, 13]. More recently, tools of this nature have been implemented for use on the Internet through a web browser [5, 15].

Beyond visualisation, many programs are designed as tutors to guide student learning through different aspects of program comprehension [4, 6, 12]. Helminen and Malmi [4] used the visualisations provided as a feedback mechanism to help students understand the automatically graded programming exercises they completed using the tool. Other tutors test students on other aspects using automatically generated questions [6, 12].

The Trace Generator is designed to reproduce the key features of these tutors (visualisations for feedback and testing of program comprehension) without requiring students to use an external tool or tutor. The generated questions can be imported into a question bank in Moodle without requiring any additional plugins or libraries. A key motivation was simplicity of use for the students in the classes, where all other materials and assessments are hosted within the Moodle page of the course. That motivation was also heavily reinforced by the transition to online learning in the wake of the COVID-19 pandemic.
3 CODE TRACING QUESTIONS

3.1 Design of Questions
The Trace Generator is intended to produce questions in several formats. This is to complement the deepening knowledge and understanding of the students as they progress through a CS1 class.

The first format used in a course would be the single-line question. Students are required to trace the execution and consequences of a single line of code from a short program. Answering these questions correctly requires an understanding of the evaluation of expressions, focusing on the order in which calculations are performed, the result of individual calculations, the data type of that result and which line of code will be executed next.

Where the code contains an assignment or a function with side-effects (like scanf), the resulting changes to variables in memory must also be entered. This gives the student a structured way to practice the Computation and Number doodles described in [9]. An example of this type of question can be seen in Figure 1.

[Figure 1: Questions displayed tracing the code b = a * 2]

The order in which operations are executed is based first on operator precedence, and then from left to right. While this may not match the actual order of execution when optimisations are applied, it is consistent with the order that would be used if the variables or code literals were replaced by functions. Addresses used in these questions are abstract identifiers and are included for use with the scanf function.
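To make this evaluation-order rule concrete, the following minimal Python sketch (illustrative only, not part of the Trace Generator itself) uses the standard ast module to list the sub-expressions of a statement in the order implied above: operands before the operations that combine them, honouring precedence and left-to-right order.

```python
import ast  # standard library; ast.unparse requires Python 3.9+

def evaluation_order(expression):
    """List sub-expressions in the order a student would evaluate them.

    A post-order walk of the syntax tree matches the rule used by the
    questions: operator precedence first, then left to right.
    """
    tree = ast.parse(expression, mode="eval")

    def walk(node):
        # Visit operands before the operation that combines them.
        for child in ast.iter_child_nodes(node):
            yield from walk(child)
        if isinstance(node, (ast.BinOp, ast.UnaryOp, ast.Compare, ast.Call)):
            yield ast.unparse(node)

    return list(walk(tree.body))

# In the style of the questions shown in Figure 1:
print(evaluation_order("a * 2 + b"))  # ['a * 2', 'a * 2 + b']
```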
The second format used in a course would be the multi-line question. Students are required to trace the execution and consequences of a sequence of statements or a short program. Answering these questions correctly requires an understanding of the control flow in the code and the changing values of variables as execution progresses. These questions do not require the detailed step-by-step breakdown of each line of code, but still require that the student evaluates all of the expressions in the code to choose the correct answer. This gives the student a structured way to practice the Synchronized Trace doodle described in [9].

Further formats are currently under development, focused on two areas: (1) variable representation in the stack and (2) pointers/references and memory. A student that successfully completes code tracing exercises in all of the above formats would necessarily have a good understanding of the underlying principles by which their code is executed.

3.2 Answering Questions
The questions generated by the tool require many responses from the students. In order to make the process of completing the questions less time consuming (and repeating assessments more palatable), many of the questions are multiple choice (dropdown boxes). Explanation, code executed, data type and line number questions are all multiple choice, while the result of expression and value of variable questions must be typed by the student. The multiple choice questions contain distractors generated based on the program code being traced in the question.
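The paper does not specify how the distractors are derived, so the sketch below is an assumption made for illustration only: plausible wrong answers for a "value of variable" question, built from common tracing mistakes around the traced statement.

```python
def value_distractors(correct, old_value=None):
    """Illustrative distractor generator for a 'value of variable'
    question (the Trace Generator's actual rules are not published
    in this paper).

    Candidates model common tracing mistakes: off-by-one errors,
    a doubled result, a dropped sign, and forgetting the assignment
    entirely (the variable's old value).
    """
    candidates = {correct + 1, correct - 1, correct * 2, -correct}
    if old_value is not None:
        candidates.add(old_value)
    candidates.discard(correct)  # a distractor must never equal the answer
    return sorted(candidates)[:3]

# Tracing b = a * 2 with a = 4, the correct answer is 8:
print(value_distractors(8, old_value=4))  # [-8, 4, 7]
```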
A questions correctly requires an understanding of the control flow representation of the steps in executing the statement b = a * 2 is in the code and the changing values of variables as execution pro- shown. During the animation, each part highlighted in the order gresses. These questions do not require the detailed step-by-step that they are evaluated and the results of each expression moving break down of each line of code, but still require that the student to the nodes in the tree where they are used. The animation is 14 Automated Code Tracing Exercises for CS1 CEP 2022, January 6, 2022, Durham, United Kingdom 10.0 9.5 9.0 Average Result 8.5 8.0 quiz Q1 Q2 Figure 2: Feedback for a single-line question variant Q3 type Final 7.5 Intermediary 1 2 3 4 5 6 7 8 9 accompanied by text explaining the order that the expressions Attempt Number needed to be evaluated in and why. Feedback for multi-line questions contains an animated flow- Figure 3: Average Scores Across Attempts chart describing the control flow the accompanying code. The node in the flowchart and line of code are simultaneously highlighted as the program executes and a table containing the evaluated code be- electronic engineering or Internet of Things (group 1) and was fo- ing executed and a rudimentary symbol table is shown and updated cused on the C programming language. The second class contained at the same time. 91 students majoring in software engineering (group 2) and was focused on the Python programming language. 3.4 Generating Questions All students in both classes were non-native English speakers, The Trace Generator is a command line application with files con- attending their first semester at university and as a consequence of taining code, template parameters or input text passed as arguments. the pandemic were also studying remotely. By default, a file containing a single multi-line question, a single- Previous instances of both classes used automated assessment line question for each line of code, or both is produced ready to be of weekly programming tasks using a plugin in the Moodle virtual imported in the Moodle XML file format. Other arguments allow learning environment. In order to integrate the code tracing assess- naming and categorising questions, selecting the language, produc- ments seamlessly, the Trace Generator was designed to produce ing questions only for specific lines and choosing which type of questions that could be used in the same environment rather than question to generate. through an external tutor application. Template values to generate different versions of the source The group 2 students were required to complete three separate code are supplied in a separate file. This was chosen over using a quizzes containing a total of 16 single-line questions In the first 6 template engine for two reasons, first while basic variations are weeks of class, while the group 1 students were required to complete quite simple to implement using templating engines, more complex four separate quizzes containing a total of 17 single-line questions. requirements can be quite difficult. Secondly, as the expected users The multi-line questions were not sufficiently developed and tested are CS educators they should find it easier to write a short program and so were not used in the classes. to generate the required values than to learn a new language in order to be able to use the tool. 
The generation of questions is done in two phases. First, the source code is used to produce a JSON representation of the code and its state as it is executed. In the second phase, this JSON representation is used to produce questions and feedback in the correct format for the Moodle virtual learning environment. This two-part process is designed to enable easier extensibility in the form of alternate programming languages (currently only C and Python are supported) and alternate output formats for other learning environments and tutors (currently only Moodle is supported).
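For the Python case, one way the first phase could plausibly be implemented is with the standard sys.settrace hook, recording the line about to execute and the visible variables at that moment. The sketch below is written under that assumption and is not the tool's actual implementation (which must also handle C).

```python
import json
import sys

def record_trace(source):
    """Sketch of phase one for Python: run a program and record, as JSON,
    the line about to execute and the variable state at that moment."""
    events = []

    def tracer(frame, event, arg):
        # Only record 'line' events raised inside the traced program.
        if event == "line" and frame.f_code.co_filename == "<traced>":
            events.append({
                "line": frame.f_lineno,
                "state": {k: v for k, v in frame.f_locals.items()
                          if not k.startswith("__")},
            })
        return tracer  # keep tracing nested frames

    code = compile(source, "<traced>", "exec")
    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)  # always detach the tracer
    return json.dumps(events, default=str, indent=2)

# Two events: line 1 with empty state, line 2 once 'a' exists.
print(record_trace("a = 4\nb = a * 2\n"))
```

A second phase would then walk this JSON to emit questions in Moodle XML; that step is omitted here.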
4 USAGE CONTEXT
Questions generated using the Trace Generator were used in two introductory procedural programming classes during the autumn semester of 2020. The first class contained 119 students majoring in electronic engineering or Internet of Things (group 1) and was focused on the C programming language. The second class contained 91 students majoring in software engineering (group 2) and was focused on the Python programming language.

All students in both classes were non-native English speakers, attending their first semester at university and, as a consequence of the pandemic, were also studying remotely.

Previous instances of both classes used automated assessment of weekly programming tasks using a plugin in the Moodle virtual learning environment. In order to integrate the code tracing assessments seamlessly, the Trace Generator was designed to produce questions that could be used in the same environment rather than through an external tutor application.

The group 2 students were required to complete three separate quizzes containing a total of 16 single-line questions in the first six weeks of class, while the group 1 students were required to complete four separate quizzes containing a total of 17 single-line questions. The multi-line questions were not sufficiently developed and tested and so were not used in the classes.

4.1 Student Engagement
The Trace Generator allows the creation of large banks of questions such that students could make multiple attempts at code comprehension assessments. Thus each attempt becomes a learning event where the student is able to correct any mistakes when facing a similar (but not the same) question. The average number of attempts across all quizzes in both classes was 2.1, while approximately 40% of students made only a single attempt at any quiz.

[Figure 3: Average Scores Across Attempts (average result by attempt number, 1 to 9, for quizzes Q1 to Q3, separating final from intermediary attempts)]

Figure 3 shows the average results of students in group 2 for each of the quizzes as the number of attempts increases. The solid lines represent the average quiz score of nth attempts that were a student's final attempt, while the dashed lines show the average quiz score of nth attempts that were not final.

There are several interesting trends to note in this figure. Firstly, the average final submission for attempt x is always higher than the intermediary average for attempt x − 1, which shows that the students are improving with each attempt. Secondly, the rate at which the intermediary averages climb steadily increases for each subsequent quiz, showing the students are becoming more familiar with the concept and achieving better scores more quickly. Thirdly, every student who made more than a single attempt achieved the highest grade possible (95% is the lower bound for an A+).

4.2 Student Results
The advent of the pandemic and remote learning necessitated large changes to the structure and assessment of the classes. These changes significantly impact the comparability of results from this cohort against previous cohorts. With those limitations in mind, the comparisons show promise.

The students in group 2 were required to complete a capstone assignment during the course that was roughly equivalent in difficulty to the previous year's class. The average result in 2019 was 68%, while in 2020 the average result was 77%. Differences in the assignment, the mode of delivery, and a university-mandated accommodation for student issues all likely contributed to this increase.

Exam results in both groups were slightly improved (1% for group 1 and 2% for group 2), but these differences were not statistically significant. Results in the code comprehension section of the exams were improved (4% for group 1 and 11% for group 2), but changes in the format of the questions mean the improvement may not be attributed only to the code tracing exercises.

4.3 Student Experience
While the students were not surveyed about the code tracing questions used in the course, several students did provide unprompted anonymous feedback about them. Two students commented on their usefulness, stating “The code tracing helps me a lot to understand each steps of the process” and “The code tracing told me how a program worked in detail.”. The students knew only that the questions were newly designed and that I wanted them to tell me when they encountered any problems or inconsistencies in the quizzes.

A third student commented on the format and feedback of the quizzes; this student was from group 2, which had less frequent and longer quizzes. The student stated “The code tracing unit is always too long, and something important is not highlighted”. At the least this suggests that shorter and more numerous quizzes should probably be preferred, as they would require less time for students that have already achieved mastery and take less time to repeat for students yet to attain it. The second part of the comment was not accompanied by any more details, and as the feedback was anonymous, more could not be sought. At the least this suggests that more detailed responses should be sought about the feedback generated in the questions.
5 CONCLUSIONS, LIMITATIONS AND FUTURE WORK
This paper provides the author's experiences with the use of a new tool for generating program tracing questions. The fundamental principle is to encourage students to attain mastery through repetition of similar code tracing exercises. Evidence is provided that the students were willing to repeat the exercises, averaging 2.1 attempts for each quiz, and that their performance on these tasks improved (often to the point of mastery). However, the conditions under which the course was delivered, and the changes made to assessment practices to compensate, severely limit any conclusions that can be drawn about the effect on the students' programming ability.

The results are promising enough to suggest that a more thorough study should be completed, with an aim to gather qualitative and quantitative data in order to more fully assess the tool's use. The development of the Trace Generator is currently in its infancy; further development is planned to incorporate more programming languages, support the object-oriented programming paradigm, generate questions for different virtual learning environments and develop its use as a web-based tutor.

REFERENCES
[1] John R Anderson, Frederick G Conrad, and Albert T Corbett. 1989. Skill Acquisition and the LISP Tutor. Cognitive Science 13, 4 (1989), 467–505.
[2] Aivar Annamaa. 2015. Thonny, A Python IDE for Learning Programming. In Proceedings of the 2015 ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE '15). ACM, New York, NY, USA, 343.
[3] Stephen H Edwards. 2004. Using Software Testing to Move Students from Trial-and-Error to Reflection-in-Action. In Proceedings of the 35th SIGCSE Technical Symposium on Computer Science Education (SIGCSE '04). ACM, New York, NY, USA, 26–30.
[4] Juha Helminen and Lauri Malmi. 2010. Jype - a Program Visualization and Programming Exercise Tool for Python. In Proceedings of the 5th International Symposium on Software Visualization (SOFTVIS '10). ACM, New York, NY, USA, 153–162.
[5] Amruth Kumar. 2004. Web-Based Tutors for Learning Programming in C++/Java. In Proceedings of the 9th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (ITiCSE '04). ACM, New York, NY, USA, 266.
[6] Amruth N Kumar. 2005. Generation of Problems, Answers, Grade, and Feedback—Case Study of a Fully Automated Tutor. J. Educ. Resour. Comput. 5, 3 (Sep 2005).
[7] Amruth N Kumar. 2013. A Study of the Influence of Code-Tracing Problems on Code-Writing Skills. In Proceedings of the 18th ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE '13). ACM, New York, NY, USA, 183–188.
[8] Teemu Lehtinen, Aleksi Lukkarinen, and Lassi Haaranen. 2021. Students Struggle to Explain Their Own Program Code. In Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V. 1 (ITiCSE '21). ACM, New York, NY, USA, 206–212.
[9] Raymond Lister, Elizabeth S Adams, Sue Fitzgerald, William Fone, John Hamer, Morten Lindholm, Robert McCartney, Jan Erik Moström, Kate Sanders, Otto Seppälä, Beth Simon, and Lynda Thomas. 2004. A Multi-National Study of Reading and Tracing Skills in Novice Programmers. In Working Group Reports from ITiCSE on Innovation and Technology in Computer Science Education (ITiCSE-WGR '04). ACM, New York, NY, USA, 119–150.
[10] Michael McCracken, Vicki Almstrum, Danny Diaz, Mark Guzdial, Dianne Hagan, Yifat Ben-David Kolikant, Cary Laxer, Lynda Thomas, Ian Utting, and Tadeusz Wilusz. 2001. A Multi-National, Multi-Institutional Study of Assessment of Programming Skills of First-Year CS Students. In Working Group Reports from ITiCSE on Innovation and Technology in Computer Science Education (ITiCSE-WGR '01). ACM, New York, NY, USA, 125–180.
[11] Andrés Moreno, Niko Myller, Erkki Sutinen, and Mordechai Ben-Ari. 2004. Visualizing Programs with Jeliot 3. In Proceedings of the Working Conference on Advanced Visual Interfaces (AVI '04). ACM, New York, NY, USA, 373–376.
[12] Ruixiang Qi and Davide Fossati. 2020. Unlimited Trace Tutor: Learning Code Tracing With Automatically Generated Programs. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE '20). ACM, New York, NY, USA, 427–433.
[13] Juha Sorva and Teemu Sirkiä. 2010. UUhistle: A Software Tool for Visual Program Simulation. In Proceedings of the 10th Koli Calling International Conference on Computing Education Research (Koli Calling '10). ACM, New York, NY, USA, 49–54.
[14] Leigh Ann Sudol-DeLyser, Mark Stehlik, and Sharon Carver. 2012. Code Comprehension Problems as Learning Events. In Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE '12). ACM, New York, NY, USA, 81–86.
[15] Jun Zheng, Sohee Kang, and Brian Harrington. 2019. Immediate Feedback Collaborative Code Tracing. In Proceedings of the Western Canadian Conference on Computing Education (WCCCE '19). ACM, New York, NY, USA.