Automated Code Tracing Exercises for CS1

Authors Seán Russell

License CC-BY-4.0

Automated Code Tracing Exercises for CS1
Seán Russell
University College Dublin
Dublin, Ireland
ABSTRACT
The ability of students to read and comprehend code is a fundamental skill in computer programming. Relying on students to build this skill through typical programming assignments can lead to many persevering through trial and error rather than understanding. This paper describes Trace Generator, a work-in-progress application for generating automatically graded code tracing questions for Python and C programs. The fundamental principles behind this work are mastery through repetition and providing comprehensive and understandable feedback to enable students to learn from their mistakes. Feedback and reflections from the use of the generated questions with two introductory procedural programming classes (200 students) are also discussed. Analysis of student attempts suggests a willingness to complete quizzes multiple times until they achieved a satisfactory score (average final result of 91%).

CCS CONCEPTS
• Social and professional topics → Computing education; CS1; • Applied computing → Learning management systems; Computer-assisted instruction.

KEYWORDS
code tracing, code comprehension, automatic grading, generated feedback

ACM Reference Format:
Seán Russell. 2022. Automated Code Tracing Exercises for CS1. In Computing Education Practice 2022 (CEP 2022), January 6, 2022, Durham, United Kingdom. ACM, New York, NY, USA, 4 pages.

This work is licensed under a Creative Commons Attribution International 4.0 License.
CEP 2022, January 6, 2022, Durham, United Kingdom
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9561-8/22/01.

1 INTRODUCTION
The principal aim of an introductory programming course is typically to teach students to program. While educators will be aware that programming involves the combination of many interdependent skills, this may not be as evident to students.

When learning their first programming language, students must learn the sometimes unforgiving syntax of a new language, data types and their operations, and the effects different statements have on variables and control flow. In parallel, students must develop strategies and techniques to solve problems using the language.

From an instructor perspective, a large, well designed programming assessment will encourage the development and use of all of the required skills. But from the perspective of some students the same assignment will be perceived as a requirement to produce code that gives the “right” answer. Students may also completely bypass the learning opportunity provided because these types of assessment are particularly vulnerable to plagiarism [10].

Research has shown that some students will resort to the trial and error approach when faced with completing these types of programming tasks [3]. Trial and error is unlikely to be successful in the long term, but can be quite effective for solving simpler CS1 and CS2 assignments. By the time the student reaches more advanced courses, the use of such an approach could have negatively impacted the development of many of the key skills used in programming.

Assessing students on their code comprehension skills directly both provides an incentive for students to develop a key skill (increasing their grades) and emphasises the importance that we place on their abilities in this regard. Moreover, these assessments should provide an opportunity for the students to learn [8, 14].

In aiming to design code comprehension assessments, two features were considered essential. First, feedback should be immediate and comprehensive. Second, students should have the opportunity to apply what they have learned from the feedback in repeated attempts (with different questions).

The nature of these features requires the construction of a question bank of sufficient size that students can repeat the assessments without encountering the same question twice. Manually building these questions would be prohibitively time consuming; as such, an approach that generates the questions automatically was used.

The remaining sections of the paper detail related work, the design principles for questions, and preliminary evaluation.

2 RELATED WORK
The report of the McCracken group [10] highlighted the importance of program comprehension skills in program writing and prompted more investigation into the factors affecting performance in programming tasks. Lister et al. [9] investigated how students approach code tracing problems and categorised the “doodles” that the students produced in this process. These categories of doodle, such as Synchronized Trace, Trace, Number and Computation, were identified in student responses and the percentage of correct answers for each category was calculated. The more effective strategies that students use to solve program comprehension tasks were incorporated into the Trace Generator.

There has been research into the benefits of code tracing tutors on students' abilities in program writing tasks [1, 7, 13]. While some studies have reported no benefit [1], others have found some or significant improvement in code writing abilities [7, 13].

A large number of tools and tutors have been developed to help students learn code comprehension skills. Some tools provide animated visualisation of the execution of programs [11, 13] with
internal state of the program displayed as the execution progresses under the control of the user. Some tools that provide this functionality are designed as an integrated development environment (IDE) which provides visualisation of program execution or a plugin that provides the same functionality to an existing IDE [2, 13]. More recently, tools of this nature are implemented for use on the Internet through a web browser [5, 15].

Beyond visualisation, many programs are designed as tutors to guide student learning through different aspects of program comprehension [4, 6, 12]. Helminen and Malmi [4] used the visualisations provided as a feedback mechanism to help students understand the automatically graded programming exercises they completed using the tool. Other tutors test students on other aspects using automatically generated questions [6, 12].

The Trace Generator is designed to reproduce the key features of these tutors (visualisations for feedback and testing on program comprehension) without requiring students to use an external tool/tutor. The generated questions can be imported into a question bank in Moodle without requiring any additional plugins or libraries. A key motivation was the simplicity of use by the students in the classes, where all other materials and assessments are hosted within the Moodle page of the course. That motivation was also heavily reinforced by the transition to online learning in the wake of the COVID-19 pandemic.

3 CODE TRACING QUESTIONS

3.1 Design of Questions
The Trace Generator is intended to produce questions in several formats. This is to complement the deepening knowledge and understanding of the students as they progress through a CS1 class.

The first format used in a course would be the single-line question. Students are required to trace the execution and consequences of a single line of code from a short program. Answering these questions correctly requires an understanding of the evaluation of expressions, focusing on the order in which calculations are performed, the result of individual calculations, the data type of that result and which line of code will be executed next.

Where the code contains an assignment or a function with side-effects (like scanf), the resulting changes to variables in memory must also be entered. This gives the student a structured way to practice the Computation and Number doodles described in [9]. An example of this type of question can be seen in Figure 1.

Figure 1: Questions displayed tracing the code b = a * 2

The order in which operations are executed is based first on operator precedence, and then from left to right. While this may not match the actual order of execution when optimisations are applied, it is consistent with the order that would be used if the variables or code literals were replaced by functions. Addresses used in these questions are abstract identifiers and are included for use with the scanf function.

The second format used in a course would be the multi-line question. Students are required to trace the execution and consequences of a sequence of statements or a short program. Answering these questions correctly requires an understanding of the control flow in the code and the changing values of variables as execution progresses. These questions do not require the detailed step-by-step breakdown of each line of code, but still require that the student evaluates all of the expressions in the code to choose the correct answer. This gives the student a structured way to practice the Synchronized Trace doodle described in [9].

Further formats are currently under development, focused on two areas: (1) variable representation in the stack and (2) pointers/references and memory. A student that successfully completes code tracing exercises in all of the above formats would be required to have a good understanding of the underlying principles by which their code is executed.

3.2 Answering Questions
The questions generated by the tool require many responses from the students. In order to make the process of completing the questions less time consuming (and repeating assessments more palatable), many of the questions are multiple choice (dropdown boxes). Explanation, code executed, data type and line number questions are all multiple choice, while the result of expression and value of variable questions must be typed by the student. The multiple choice questions contain distractors generated based on the program code being traced in the question.

3.3 Feedback
In order to more easily establish the code tracing questions as learning events, sufficient feedback must be provided [14]. While no automated system is likely to provide feedback as directed and individual as an instructor or teaching assistant, thorough feedback is generated for each question. Each individual part of the question is highlighted as correct or incorrect and the correct answer is shown. In addition to this, general feedback is provided in the form of an animation accompanied by explanatory text.

Figure 2 shows a frame from the animated feedback of a single-line question (the same question that is shown in Figure 1). A representation of the steps in executing the statement b = a * 2 is shown. During the animation, each part is highlighted in the order in which it is evaluated, and the result of each expression moves to the node in the tree where it is used. The animation is
accompanied by text explaining the order that the expressions needed to be evaluated in and why.

Figure 2: Feedback for a single-line question variant
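
The evaluation order that this feedback explains can be illustrated with a short sketch. The paper does not show the Trace Generator's implementation, so the code below is only an illustration under stated assumptions: it uses Python's standard ast module to walk the expression tree of a single assignment, evaluating operands from left to right before applying each operator, and records the code evaluated, its result and its data type at every step. The names trace_expression and trace_assignment are hypothetical, and only simple arithmetic on variables and literals is handled.

```python
import ast
import operator

# Map AST operator node types to the functions that implement them.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def trace_expression(node, variables, steps):
    """Evaluate an expression tree bottom-up, recording each step in order."""
    if isinstance(node, ast.Constant):
        value = node.value
    elif isinstance(node, ast.Name):
        value = variables[node.id]
    elif isinstance(node, ast.BinOp):
        # Operands are evaluated left to right before the operator is applied.
        left = trace_expression(node.left, variables, steps)
        right = trace_expression(node.right, variables, steps)
        value = BINARY_OPS[type(node.op)](left, right)
    else:
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    # Record the code evaluated, its result and the data type of the result.
    steps.append((ast.unparse(node), value, type(value).__name__))
    return value


def trace_assignment(line, variables):
    """Trace one simple assignment; return (steps, updated variables)."""
    statement = ast.parse(line).body[0]
    if not isinstance(statement, ast.Assign) or len(statement.targets) != 1:
        raise ValueError("expected a simple assignment such as 'b = a * 2'")
    target = statement.targets[0]
    if not isinstance(target, ast.Name):
        raise ValueError("expected a plain variable name on the left-hand side")
    steps = []
    value = trace_expression(statement.value, variables, steps)
    updated = dict(variables)  # leave the caller's memory untouched
    updated[target.id] = value
    return steps, updated


# The statement used in Figures 1 and 2, with an assumed starting value for a.
steps, memory = trace_assignment("b = a * 2", {"a": 5})
for code, value, type_name in steps:
    print(f"{code:<8} -> {value} ({type_name})")
print(memory)
```

With a set to 5, the recorded steps are a, then 2, then a * 2, and the final memory maps b to 10. Each recorded step corresponds to one typed or multiple choice response of the kind described in Section 3.2 (result of expression, data type), which is why a post-order walk of the tree is a natural fit for this style of question. The sketch requires Python 3.9 or later for ast.unparse.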
Feedback for multi-line questions contains an animated flowchart describing the control flow of the accompanying code. As the program executes, the node in the flowchart and the corresponding line of code are highlighted simultaneously, while a table containing the evaluated code being executed and a rudimentary symbol table are shown and updated at the same time.

3.4 Generating Questions
The Trace Generator is a command line application with files containing code, template parameters or input text passed as arguments. By default, a file containing a single multi-line question, a single-line question for each line of code, or both is produced in the Moodle XML file format, ready to be imported. Other arguments allow naming and categorising questions, selecting the language, producing questions only for specific lines and choosing which type of question to generate.

Template values to generate different versions of the source code are supplied in a separate file. This was chosen over using a template engine for two reasons. First, while basic variations are quite simple to implement using templating engines, more complex requirements can be quite difficult. Secondly, as the expected users are CS educators, they should find it easier to write a short program to generate the required values than to learn a new language in order to be able to use the tool. Placeholders can be used to represent any tokens in the source code including variable names, operators, literal values, keywords and function names.

The generation of questions is done in two phases. First, the source code is used to produce a JSON representation of the code and state as it is executed. In the second phase, this JSON representation is used to produce questions and feedback in the correct format for the Moodle virtual learning environment. This two part process is designed to enable easier extensibility in the form of alternate programming languages (currently only C and Python are supported) and alternate output formats for other learning environments and tutors (currently only Moodle is supported).

4 USAGE CONTEXT
Questions generated using the Trace Generator were used in two introductory procedural programming classes during the autumn semester of 2020. The first class contained 119 students majoring in electronic engineering or Internet of Things (group 1) and was focused on the C programming language. The second class contained 91 students majoring in software engineering (group 2) and was focused on the Python programming language.

All students in both classes were non-native English speakers, attending their first semester at university and, as a consequence of the pandemic, were also studying remotely.

Previous instances of both classes used automated assessment of weekly programming tasks using a plugin in the Moodle virtual learning environment. In order to integrate the code tracing assessments seamlessly, the Trace Generator was designed to produce questions that could be used in the same environment rather than through an external tutor application.

The group 2 students were required to complete three separate quizzes containing a total of 16 single-line questions in the first 6 weeks of class, while the group 1 students were required to complete four separate quizzes containing a total of 17 single-line questions. The multi-line questions were not sufficiently developed and tested and so were not used in the classes.

4.1 Student Engagement
The Trace Generator allows the creation of large banks of questions such that students could make multiple attempts at code comprehension assessments. Thus each attempt becomes a learning event where the student is able to correct any mistakes when facing a similar (but not the same) question. The average number of attempts across all quizzes in both classes was 2.1, while approximately 40% of students made only a single attempt for any quiz.

Figure 3 shows the average results of students in group 2 for each of the quizzes as the number of attempts increases. The solid lines represent the average quiz score for nth and final attempts, while the dashed lines show the average quiz score for nth but non-final attempts.

Figure 3: Average Scores Across Attempts (average result plotted against attempt number)

There are several interesting trends to note in this figure. Firstly, the average final submission for attempt x is always higher than the intermediary average for attempt x − 1, which shows that the students are improving with each attempt. Secondly, the rate at
which the intermediary averages climb steadily increases for each subsequent quiz, showing the students are becoming more familiar with the concept and achieving better scores more quickly. Thirdly, every student who made more than a single attempt achieved the highest grade possible (95% is the lower bound for an A+).

4.2 Student Results
The advent of the pandemic and remote learning necessitated large changes to the structure and assessment of the classes. These changes significantly impact the comparability of results from this cohort against previous cohorts. With those limitations in mind, the comparisons show promise.

The students in group 2 were required to complete a capstone assignment during the course that was roughly equivalent in difficulty to that of the previous year's class. The average result in 2019 was 68%, while in 2020 the average result was 77%. Differences in the assignment, the mode of delivery, and a University-mandated accommodation for student issues all likely contributed to this increase.

Exam results in both groups were slightly improved (1% for group 1 and 2% for group 2) but these improvements were not statistically significant. Results in the code comprehension section of the exams were improved (4% for group 1 and 11% for group 2) but changes in the format of the questions mean the improvement cannot be attributed only to the code tracing exercises.

4.3 Student Experience
While the students were not surveyed about the code tracing questions used in the course, several students did provide unprompted anonymous feedback about them. Two students commented on their usefulness, stating “The code tracing helps me a lot to understand each steps of the process” and “The code tracing told me how a program worked in detail.” The students knew only that the questions were newly designed and that I wanted them to tell me when they encountered any problems or inconsistencies in the quizzes.

A third student commented on the format and feedback of the quizzes; this student was from group 2, which had less frequent and longer quizzes. The student stated “The code tracing unit is always too long, and something important is not highlighted”. At the least this suggests that shorter and more numerous quizzes should probably be preferred, as they would require less time for students that have already achieved mastery and take less time to repeat for students yet to attain it. The second part of the comment was not accompanied by any more details and, as the feedback was anonymous, more could not be sought. At the least this suggests that more detailed responses should be sought about the feedback generated in the questions.

5 CONCLUSIONS, LIMITATIONS AND FUTURE WORK
This paper provides the author's experiences with the use of a new tool for generating program tracing questions. The fundamental principle is to encourage students to attain mastery through repetition of similar code tracing exercises. Evidence is provided that the students were willing to repeat the exercises, averaging 2.1 attempts for each quiz, and that their performance on these tasks improved (often to the point of mastery). However, the conditions under which the course was delivered and changes to assessment practices to compensate severely limit any conclusions that can be made about the effect on programming ability of the students.

The results are promising enough to suggest that a more thorough study should be completed with an aim to gather qualitative and quantitative data in order to more fully assess its use. The development of the Trace Generator is currently in its infancy; further development is planned to incorporate more programming languages and support the object-oriented programming paradigm, generate questions for different virtual learning environments and develop its use as a web based tutor.

REFERENCES
[1] John R Anderson, Frederick G Conrad, and Albert T Corbett. 1989. Skill Acquisition and the LISP Tutor. Cognitive Science 13, 4 (1989), 467–505.
[2] Aivar Annamaa. 2015. Thonny, A Python IDE for Learning Programming. In Proceedings of the 2015 ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE ’15). ACM, New York, NY, USA, 343.
[3] Stephen H Edwards. 2004. Using Software Testing to Move Students from Trial-and-Error to Reflection-in-Action. In Proceedings of the 35th SIGCSE Technical Symposium on Computer Science Education (SIGCSE ’04). ACM, New York, NY, USA, 26–30.
[4] Juha Helminen and Lauri Malmi. 2010. Jype - a Program Visualization and Programming Exercise Tool for Python. In Proceedings of the 5th International Symposium on Software Visualization (SOFTVIS ’10). ACM, New York, NY, USA, 153–162.
[5] Amruth Kumar. 2004. Web-Based Tutors for Learning Programming in C++/Java. In Proceedings of the 9th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (ITiCSE ’04). ACM, New York, NY, USA, 266.
[6] Amruth N Kumar. 2005. Generation of Problems, Answers, Grade, and Feedback - Case Study of a Fully Automated Tutor. J. Educ. Resour. Comput. 5, 3 (sep 2005).
[7] Amruth N Kumar. 2013. A Study of the Influence of Code-Tracing Problems on Code-Writing Skills. In Proceedings of the 18th ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE ’13). ACM, New York, NY, USA, 183–188.
[8] Teemu Lehtinen, Aleksi Lukkarinen, and Lassi Haaranen. 2021. Students Struggle to Explain Their Own Program Code. In Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V. 1 (ITiCSE ’21). ACM, New York, NY, USA, 206–212.
[9] Raymond Lister, Elizabeth S Adams, Sue Fitzgerald, William Fone, John Hamer, Morten Lindholm, Robert McCartney, Jan Erik Moström, Kate Sanders, Otto Seppälä, Beth Simon, and Lynda Thomas. 2004. A Multi-National Study of Reading and Tracing Skills in Novice Programmers. In Working Group Reports from ITiCSE on Innovation and Technology in Computer Science Education (ITiCSE-WGR ’04). ACM, New York, NY, USA, 119–150.
[10] Michael McCracken, Vicki Almstrum, Danny Diaz, Mark Guzdial, Dianne Hagan, Yifat Ben-David Kolikant, Cary Laxer, Lynda Thomas, Ian Utting, and Tadeusz Wilusz. 2001. A Multi-National, Multi-Institutional Study of Assessment of Programming Skills of First-Year CS Students. In Working Group Reports from ITiCSE on Innovation and Technology in Computer Science Education (ITiCSE-WGR ’01). ACM, New York, NY, USA, 125–180.
[11] Andrés Moreno, Niko Myller, Erkki Sutinen, and Mordechai Ben-Ari. 2004. Visualizing Programs with Jeliot 3. In Proceedings of the Working Conference on Advanced Visual Interfaces (AVI ’04). ACM, New York, NY, USA, 373–376.
[12] Ruixiang Qi and Davide Fossati. 2020. Unlimited Trace Tutor: Learning Code Tracing With Automatically Generated Programs. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE ’20). ACM, New York, NY, USA, 427–433.
[13] Juha Sorva and Teemu Sirkiä. 2010. UUhistle: A Software Tool for Visual Program Simulation. In Proceedings of the 10th Koli Calling International Conference on Computing Education Research (Koli Calling ’10). ACM, New York, NY, USA, 49–54.
[14] Leigh Ann Sudol-DeLyser, Mark Stehlik, and Sharon Carver. 2012. Code Comprehension Problems as Learning Events. In Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE ’12). ACM, New York, NY, USA, 81–86.
[15] Jun Zheng, Sohee Kang, and Brian Harrington. 2019. Immediate Feedback Collaborative Code Tracing. In Proceedings of the Western Canadian Conference on Computing Education (WCCCE ’19). ACM, New York, NY, USA.