Automated Code Tracing Exercises for CS1

Seán Russell
University College Dublin
Dublin, Ireland
sean.russell@ucd.ie

ABSTRACT
The ability of students to read and comprehend code is a fundamental skill in computer programming. Relying on students to build this skill through typical programming assignments can lead to many persevering through trial and error rather than understanding. This paper describes Trace Generator, a work-in-progress application for generating automatically graded code tracing questions for Python and C programs. The fundamental principles behind this work are mastery through repetition and providing comprehensive and understandable feedback to enable students to learn from their mistakes. Feedback and reflections from the use of the generated questions with two introductory procedural programming classes (200 students) are also discussed. Analysis of student attempts suggests a willingness to complete quizzes multiple times until they achieved a satisfactory score (average final result of 91%).

CCS CONCEPTS
• Social and professional topics → Computing education; CS1; • Applied computing → Learning management systems; Computer-assisted instruction.

KEYWORDS
code tracing, code comprehension, automatic grading, generated feedback

ACM Reference Format:
Seán Russell. 2022. Automated Code Tracing Exercises for CS1. In Computing Education Practice 2022 (CEP 2022), January 6, 2022, Durham, United Kingdom. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3498343.3498347

1 INTRODUCTION
The principal aim of an introductory programming course is typically to teach students to program. While educators will be aware that programming involves the combination of many interdependent skills, this may not be as evident to students.

When learning their first programming language, students must learn the sometimes unforgiving syntax of a new language, data types and their operations, and the effects different statements have on variables and control flow. In parallel, students must develop strategies and techniques to solve problems using the language.

From an instructor perspective, a large, well-designed programming assessment will encourage the development and use of all of the required skills. But from the perspective of some students, the same assignment will be perceived as a requirement to produce code that gives the “right” answer. Students may also completely bypass the learning opportunity provided because these types of assessment are particularly vulnerable to plagiarism [10].

Research has shown that some students will resort to the trial and error approach when faced with completing these types of programming tasks [3]. Trial and error is unlikely to be successful in the long term, but can be quite effective for solving simpler CS1 and CS2 assignments. By the time students reach more advanced courses, the use of such an approach may have negatively impacted the development of many of the key skills used in programming.

Assessing students on their code comprehension skills directly both provides an incentive for students to develop a key skill (increasing their grades) and emphasises the importance that we place on their abilities in this regard. Moreover, these assessments should provide an opportunity for the students to learn [8, 14].

In aiming to design code comprehension assessments, two features were considered essential. First, feedback should be immediate and comprehensive. Second, students should have the opportunity to apply what they have learned from the feedback in repeated attempts (with different questions).

The nature of these features requires the construction of a question bank of sufficient size that students can repeat the assessments without encountering the same question twice. Manually building these questions would be prohibitively time consuming; as such, an approach that generated the questions automatically was used.

The remaining sections of the paper detail related work, the design principles for questions, and a preliminary evaluation.

2 RELATED WORK
The report of the McCracken group [10] highlighted the importance of program comprehension skills in program writing and prompted more investigation into the factors affecting performance in programming tasks. Lister et al. [9] investigated how students approach code tracing problems and categorised the “doodles” that the students produced in this process. These categories of doodle, such as Synchronized Trace, Trace, Number and Computation, were identified in student responses and the percentage of correct answers for each category was calculated. The more effective strategies that students use to solve program comprehension tasks were incorporated into the Trace Generator.

There has been research into the benefits of code tracing tutors on students' abilities in program writing tasks [1, 7, 13]. While some studies have reported no benefit [1], others have found some or significant improvement in code writing abilities [7, 13].

A large number of tools and tutors have been developed to help students learn code comprehension skills. Some tools provide animated visualisation of the execution of programs [11, 13], with the internal state of the program displayed as the execution progresses under the control of the user. Some tools that provide this functionality are designed as an integrated development environment (IDE) which provides visualisation of program execution, or as a plugin that provides the same functionality to an existing IDE [2, 13]. More recently, tools of this nature have been implemented for use on the Internet through a web browser [5, 15].

Beyond visualisation, many programs are designed as tutors to guide student learning through different aspects of program comprehension [4, 6, 12]. Helminen and Malmi [4] used the visualisations provided as a feedback mechanism to help students understand the automatically graded programming exercises they completed using the tool. Other tutors test students on other aspects using automatically generated questions [6, 12].

The Trace Generator is designed to reproduce the key features of these tutors (visualisations for feedback and testing of program comprehension) without requiring students to use an external tool or tutor. The generated questions can be imported into a question bank in Moodle without requiring any additional plugins or libraries. A key motivation was simplicity of use for the students in the classes, where all other materials and assessments are hosted within the Moodle page of the course. That motivation was also heavily reinforced by the transition to online learning in the wake of the COVID-19 pandemic.
3 CODE TRACING QUESTIONS

3.1 Design of Questions
The Trace Generator is intended to produce questions in several formats. This is to complement the deepening knowledge and understanding of the students as they progress through a CS1 class.

The first format used in a course would be the single-line question. Students are required to trace the execution and consequences of a single line of code from a short program. Answering these questions correctly requires an understanding of the evaluation of expressions, focusing on the order in which calculations are performed, the result of individual calculations, the data type of that result and which line of code will be executed next.

Where the code contains an assignment or a function with side-effects (like scanf), the resulting changes to variables in memory must also be entered. This gives the student a structured way to practice the Computation and Number doodles described in [9]. An example of this type of question can be seen in Figure 1.

[Figure 1: Questions displayed tracing the code b = a * 2]

The order in which operations are executed is based first on operator precedence, and then from left to right. While this may not match the actual order of execution when optimisations are applied, it is consistent with the order that would be used if the variables or code literals were replaced by functions. Addresses used in these questions are abstract identifiers and are included for use with the scanf function.
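To make this evaluation-order rule concrete, the following minimal Python sketch (illustrative only, not part of the Trace Generator itself) uses the standard ast module to list the sub-expressions of a statement in the order implied above: operands before the operations that combine them, honouring precedence and left-to-right order.

```python
import ast  # standard library; ast.unparse requires Python 3.9+

def evaluation_order(expression):
    """List sub-expressions in the order a student would evaluate them.

    A post-order walk of the syntax tree matches the rule used by the
    questions: operator precedence first, then left to right.
    """
    tree = ast.parse(expression, mode="eval")

    def walk(node):
        # Visit operands before the operation that combines them.
        for child in ast.iter_child_nodes(node):
            yield from walk(child)
        if isinstance(node, (ast.BinOp, ast.UnaryOp, ast.Compare, ast.Call)):
            yield ast.unparse(node)

    return list(walk(tree.body))

# In the style of the questions shown in Figure 1:
print(evaluation_order("a * 2 + b"))  # ['a * 2', 'a * 2 + b']
```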
The second format used in a course would be the multi-line question. Students are required to trace the execution and consequences of a sequence of statements or a short program. Answering these questions correctly requires an understanding of the control flow in the code and the changing values of variables as execution progresses. These questions do not require the detailed step-by-step breakdown of each line of code, but still require that the student evaluates all of the expressions in the code to choose the correct answer. This gives the student a structured way to practice the Synchronized Trace doodle described in [9].

Further formats are currently under development, focused on two areas: (1) variable representation in the stack and (2) pointers/references and memory. A student that successfully completes code tracing exercises in all of the above formats would necessarily have a good understanding of the underlying principles by which their code is executed.

3.2 Answering Questions
The questions generated by the tool require many responses from the students. In order to make the process of completing the questions less time consuming (and repeating assessments more palatable), many of the questions are multiple choice (dropdown boxes). Explanation, code executed, data type and line number questions are all multiple choice, while the result of expression and value of variable questions must be typed by the student. The multiple choice questions contain distractors generated based on the program code being traced in the question.
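The paper does not specify how the distractors are derived, so the sketch below is an assumption made for illustration only: plausible wrong answers for a "value of variable" question, built from common tracing mistakes around the traced statement.

```python
def value_distractors(correct, old_value=None):
    """Illustrative distractor generator for a 'value of variable'
    question (the Trace Generator's actual rules are not published
    in this paper).

    Candidates model common tracing mistakes: off-by-one errors,
    a doubled result, a dropped sign, and forgetting the assignment
    entirely (the variable's old value).
    """
    candidates = {correct + 1, correct - 1, correct * 2, -correct}
    if old_value is not None:
        candidates.add(old_value)
    candidates.discard(correct)  # a distractor must never equal the answer
    return sorted(candidates)[:3]

# Tracing b = a * 2 with a = 4, the correct answer is 8:
print(value_distractors(8, old_value=4))  # [-8, 4, 7]
```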
A questions correctly requires an understanding of the control flow representation of the steps in executing the statement b = a * 2 is in the code and the changing values of variables as execution pro- shown. During the animation, each part highlighted in the order gresses. These questions do not require the detailed step-by-step that they are evaluated and the results of each expression moving break down of each line of code, but still require that the student to the nodes in the tree where they are used. The animation is 14 Automated Code Tracing Exercises for CS1 CEP 2022, January 6, 2022, Durham, United Kingdom 10.0 9.5 9.0 Average Result 8.5 8.0 quiz Q1 Q2 Figure 2: Feedback for a single-line question variant Q3 type Final 7.5 Intermediary 1 2 3 4 5 6 7 8 9 accompanied by text explaining the order that the expressions Attempt Number needed to be evaluated in and why. Feedback for multi-line questions contains an animated flow- Figure 3: Average Scores Across Attempts chart describing the control flow the accompanying code. The node in the flowchart and line of code are simultaneously highlighted as the program executes and a table containing the evaluated code be- electronic engineering or Internet of Things (group 1) and was fo- ing executed and a rudimentary symbol table is shown and updated cused on the C programming language. The second class contained at the same time. 91 students majoring in software engineering (group 2) and was focused on the Python programming language. 3.4 Generating Questions All students in both classes were non-native English speakers, The Trace Generator is a command line application with files con- attending their first semester at university and as a consequence of taining code, template parameters or input text passed as arguments. the pandemic were also studying remotely. By default, a file containing a single multi-line question, a single- Previous instances of both classes used automated assessment line question for each line of code, or both is produced ready to be of weekly programming tasks using a plugin in the Moodle virtual imported in the Moodle XML file format. Other arguments allow learning environment. In order to integrate the code tracing assess- naming and categorising questions, selecting the language, produc- ments seamlessly, the Trace Generator was designed to produce ing questions only for specific lines and choosing which type of questions that could be used in the same environment rather than question to generate. through an external tutor application. Template values to generate different versions of the source The group 2 students were required to complete three separate code are supplied in a separate file. This was chosen over using a quizzes containing a total of 16 single-line questions In the first 6 template engine for two reasons, first while basic variations are weeks of class, while the group 1 students were required to complete quite simple to implement using templating engines, more complex four separate quizzes containing a total of 17 single-line questions. requirements can be quite difficult. Secondly, as the expected users The multi-line questions were not sufficiently developed and tested are CS educators they should find it easier to write a short program and so were not used in the classes. to generate the required values than to learn a new language in order to be able to use the tool. 
The generation of questions is done in two phases. First, the source code is used to produce a JSON representation of the code and its state as it is executed. In the second phase, this JSON representation is used to produce questions and feedback in the correct format for the Moodle virtual learning environment. This two-part process is designed to enable easier extensibility in the form of alternate programming languages (currently only C and Python are supported) and alternate output formats for other learning environments and tutors (currently only Moodle is supported).
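For the Python case, one way the first phase could plausibly be implemented is with the standard sys.settrace hook, recording the line about to execute and the visible variables at that moment. The sketch below is written under that assumption and is not the tool's actual implementation (which must also handle C).

```python
import json
import sys

def record_trace(source):
    """Sketch of phase one for Python: run a program and record, as JSON,
    the line about to execute and the variable state at that moment."""
    events = []

    def tracer(frame, event, arg):
        # Only record 'line' events raised inside the traced program.
        if event == "line" and frame.f_code.co_filename == "<traced>":
            events.append({
                "line": frame.f_lineno,
                "state": {k: v for k, v in frame.f_locals.items()
                          if not k.startswith("__")},
            })
        return tracer  # keep tracing nested frames

    code = compile(source, "<traced>", "exec")
    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)  # always detach the tracer
    return json.dumps(events, default=str, indent=2)

# Two events: line 1 with empty state, line 2 once 'a' exists.
print(record_trace("a = 4\nb = a * 2\n"))
```

A second phase would then walk this JSON to emit questions in Moodle XML; that step is omitted here.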
4 USAGE CONTEXT
Questions generated using the Trace Generator were used in two introductory procedural programming classes during the autumn semester of 2020. The first class contained 119 students majoring in electronic engineering or Internet of Things (group 1) and was focused on the C programming language. The second class contained 91 students majoring in software engineering (group 2) and was focused on the Python programming language.

All students in both classes were non-native English speakers, attending their first semester at university and, as a consequence of the pandemic, were also studying remotely.

Previous instances of both classes used automated assessment of weekly programming tasks using a plugin in the Moodle virtual learning environment. In order to integrate the code tracing assessments seamlessly, the Trace Generator was designed to produce questions that could be used in the same environment rather than through an external tutor application.

The group 2 students were required to complete three separate quizzes containing a total of 16 single-line questions in the first six weeks of class, while the group 1 students were required to complete four separate quizzes containing a total of 17 single-line questions. The multi-line questions were not sufficiently developed and tested and so were not used in the classes.

4.1 Student Engagement
The Trace Generator allows the creation of large banks of questions such that students could make multiple attempts at code comprehension assessments. Thus each attempt becomes a learning event where the student is able to correct any mistakes when facing a similar (but not the same) question. The average number of attempts across all quizzes in both classes was 2.1, while approximately 40% of students made only a single attempt at any quiz.

[Figure 3: Average Scores Across Attempts (average result by attempt number, 1 to 9, for quizzes Q1 to Q3, separating final from intermediary attempts)]

Figure 3 shows the average results of students in group 2 for each of the quizzes as the number of attempts increases. The solid lines represent the average quiz score of nth attempts that were a student's final attempt, while the dashed lines show the average quiz score of nth attempts that were not final.

There are several interesting trends to note in this figure. Firstly, the average final submission for attempt x is always higher than the intermediary average for attempt x − 1, which shows that the students are improving with each attempt. Secondly, the rate at which the intermediary averages climb steadily increases for each subsequent quiz, showing the students are becoming more familiar with the concept and achieving better scores more quickly. Thirdly, every student who made more than a single attempt achieved the highest grade possible (95% is the lower bound for an A+).

4.2 Student Results
The advent of the pandemic and remote learning necessitated large changes to the structure and assessment of the classes. These changes significantly impact the comparability of results from this cohort against previous cohorts. With those limitations in mind, the comparisons show promise.

The students in group 2 were required to complete a capstone assignment during the course that was roughly equivalent in difficulty to the previous year's class. The average result in 2019 was 68%, while in 2020 the average result was 77%. Differences in the assignment, the mode of delivery, and a university-mandated accommodation for student issues all likely contributed to this increase.

Exam results in both groups were slightly improved (1% for group 1 and 2% for group 2), but these differences were not statistically significant. Results in the code comprehension section of the exams were improved (4% for group 1 and 11% for group 2), but changes in the format of the questions mean the improvement may not be attributed only to the code tracing exercises.

4.3 Student Experience
While the students were not surveyed about the code tracing questions used in the course, several students did provide unprompted anonymous feedback about them. Two students commented on their usefulness, stating “The code tracing helps me a lot to understand each steps of the process” and “The code tracing told me how a program worked in detail.”. The students knew only that the questions were newly designed and that I wanted them to tell me when they encountered any problems or inconsistencies in the quizzes.

A third student commented on the format and feedback of the quizzes; this student was from group 2, which had less frequent and longer quizzes. The student stated “The code tracing unit is always too long, and something important is not highlighted”. At the least this suggests that shorter and more numerous quizzes should probably be preferred, as they would require less time for students that have already achieved mastery and take less time to repeat for students yet to attain it. The second part of the comment was not accompanied by any more details, and as the feedback was anonymous, more could not be sought. At the least this suggests that more detailed responses should be sought about the feedback generated in the questions.
5 CONCLUSIONS, LIMITATIONS AND FUTURE WORK
This paper provides the author's experiences with the use of a new tool for generating program tracing questions. The fundamental principle is to encourage students to attain mastery through repetition of similar code tracing exercises. Evidence is provided that the students were willing to repeat the exercises, averaging 2.1 attempts for each quiz, and that their performance on these tasks improved (often to the point of mastery). However, the conditions under which the course was delivered, and the changes made to assessment practices to compensate, severely limit any conclusions that can be drawn about the effect on the students' programming ability.

The results are promising enough to suggest that a more thorough study should be completed, with an aim to gather qualitative and quantitative data in order to more fully assess the tool's use. The development of the Trace Generator is currently in its infancy; further development is planned to incorporate more programming languages, support the object-oriented programming paradigm, generate questions for different virtual learning environments and develop its use as a web-based tutor.

REFERENCES
[1] John R Anderson, Frederick G Conrad, and Albert T Corbett. 1989. Skill Acquisition and the LISP Tutor. Cognitive Science 13, 4 (1989), 467–505.
[2] Aivar Annamaa. 2015. Thonny, A Python IDE for Learning Programming. In Proceedings of the 2015 ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE '15). ACM, New York, NY, USA, 343.
[3] Stephen H Edwards. 2004. Using Software Testing to Move Students from Trial-and-Error to Reflection-in-Action. In Proceedings of the 35th SIGCSE Technical Symposium on Computer Science Education (SIGCSE '04). ACM, New York, NY, USA, 26–30.
[4] Juha Helminen and Lauri Malmi. 2010. Jype - a Program Visualization and Programming Exercise Tool for Python. In Proceedings of the 5th International Symposium on Software Visualization (SOFTVIS '10). ACM, New York, NY, USA, 153–162.
[5] Amruth Kumar. 2004. Web-Based Tutors for Learning Programming in C++/Java. In Proceedings of the 9th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (ITiCSE '04). ACM, New York, NY, USA, 266.
[6] Amruth N Kumar. 2005. Generation of Problems, Answers, Grade, and Feedback—Case Study of a Fully Automated Tutor. J. Educ. Resour. Comput. 5, 3 (Sep 2005).
[7] Amruth N Kumar. 2013. A Study of the Influence of Code-Tracing Problems on Code-Writing Skills. In Proceedings of the 18th ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE '13). ACM, New York, NY, USA, 183–188.
[8] Teemu Lehtinen, Aleksi Lukkarinen, and Lassi Haaranen. 2021. Students Struggle to Explain Their Own Program Code. In Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V. 1 (ITiCSE '21). ACM, New York, NY, USA, 206–212.
[9] Raymond Lister, Elizabeth S Adams, Sue Fitzgerald, William Fone, John Hamer, Morten Lindholm, Robert McCartney, Jan Erik Moström, Kate Sanders, Otto Seppälä, Beth Simon, and Lynda Thomas. 2004. A Multi-National Study of Reading and Tracing Skills in Novice Programmers. In Working Group Reports from ITiCSE on Innovation and Technology in Computer Science Education (ITiCSE-WGR '04). ACM, New York, NY, USA, 119–150.
[10] Michael McCracken, Vicki Almstrum, Danny Diaz, Mark Guzdial, Dianne Hagan, Yifat Ben-David Kolikant, Cary Laxer, Lynda Thomas, Ian Utting, and Tadeusz Wilusz. 2001. A Multi-National, Multi-Institutional Study of Assessment of Programming Skills of First-Year CS Students. In Working Group Reports from ITiCSE on Innovation and Technology in Computer Science Education (ITiCSE-WGR '01). ACM, New York, NY, USA, 125–180.
[11] Andrés Moreno, Niko Myller, Erkki Sutinen, and Mordechai Ben-Ari. 2004. Visualizing Programs with Jeliot 3. In Proceedings of the Working Conference on Advanced Visual Interfaces (AVI '04). ACM, New York, NY, USA, 373–376.
[12] Ruixiang Qi and Davide Fossati. 2020. Unlimited Trace Tutor: Learning Code Tracing With Automatically Generated Programs. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE '20). ACM, New York, NY, USA, 427–433.
[13] Juha Sorva and Teemu Sirkiä. 2010. UUhistle: A Software Tool for Visual Program Simulation. In Proceedings of the 10th Koli Calling International Conference on Computing Education Research (Koli Calling '10). ACM, New York, NY, USA, 49–54.
[14] Leigh Ann Sudol-DeLyser, Mark Stehlik, and Sharon Carver. 2012. Code Comprehension Problems as Learning Events. In Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE '12). ACM, New York, NY, USA, 81–86.
[15] Jun Zheng, Sohee Kang, and Brian Harrington. 2019. Immediate Feedback Collaborative Code Tracing. In Proceedings of the Western Canadian Conference on Computing Education (WCCCE '19). ACM, New York, NY, USA.