Grading University Students with LLMs: Performance and Acceptance of a Canvas-Based Automation
| Authors | |
|---|---|
| Publication date | 2025 |
| Host editors | |
| Book title | Artificial Intelligence in Education : Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED |
| Book subtitle | 26th International Conference, AIED 2025, Palermo, Italy, July 22–26, 2025 : proceedings |
| ISBN | |
| ISBN (electronic) | |
| Series | Communications in Computer and Information Science |
| Event | Poster papers and late breaking results, workshops and tutorials, practitioners, industry and policy track, doctoral consortium, blue sky and wideAIED papers presented at the 26th International Conference on Artificial Intelligence in Education, AIED 2025 |
| Volume | |
| Issue number | II |
| Pages (from-to) | 36-43 |
| Number of pages | 8 |
| Publisher | Cham: Springer |
| Organisations | |
| Abstract | Teachers in higher education spend considerable time grading assignments rather than tutoring students. Large language models (LLMs) could address this by generating human-like grades and feedback for assignments. However, accounts of their practical application are scarce. We wrote Python code to integrate the Canvas learning management system with an API for GDPR-compliant LLM access. We used this AI system to grade and give feedback on the weekly assignments of 58 graduate students (47 study participants) enrolled in an introductory programming course. LLM grading was fast, cost-efficient, and relatively accurate: GPT-4o and human graders agreed perfectly on 80% of 6345 evaluated student answers. Human and LLM grades were positively correlated (r = .570-.866 per assignment), but humans awarded higher grades. Disagreements occurred because human graders overlooked student mistakes, while GPT-4o graded ambiguous cases more strictly and misgraded some questions. Importantly, we think most LLM grading mistakes can be reconciled by avoiding certain tasks and improving grading rubrics. Most students (77%) found the LLM-generated feedback helpful, and some (33%) appreciated the rapid grading. However, multiple students (30%) stated that receiving preliminary LLM grades that were lower than their actual grades made them anxious or upset. We conclude that LLM-based grading and feedback can, and likely should, be used to optimize teachers' resources. Canvas users can adopt our pipeline by simply adding their own API keys, Canvas IDs, and assignments: https://github.com/lukekorthals/canvas-llm-integration. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1007/978-3-031-99264-3_5 |
| Other links | https://github.com/lukekorthals/canvas-llm-integration https://www.scopus.com/pages/publications/105013027038 |
| Downloads | 978-3-031-99264-3_5 (Final published version) |
| Permalink to this page | |
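The abstract describes a Python pipeline that pulls student submissions from Canvas, sends them to an LLM for grading, and compares LLM grades against human grades. The sketch below illustrates two pieces of such a pipeline under stated assumptions: a prompt builder and the exact-agreement metric the abstract reports. All function names and the rubric format are illustrative assumptions, not the authors' actual code, which is available at the linked repository.

```python
# Illustrative sketch of an LLM-grading pipeline's core steps. Function names
# and the rubric format are assumptions; the authors' actual implementation
# lives at https://github.com/lukekorthals/canvas-llm-integration.

def build_grading_prompt(question: str, rubric: str, answer: str) -> str:
    """Assemble a grading prompt from a question, a rubric, and a student answer.

    In a full pipeline, the answer would be fetched via the Canvas API and the
    returned prompt sent to a GDPR-compliant LLM endpoint (e.g. GPT-4o).
    """
    return (
        "You are a strict but fair grader for an introductory programming course.\n"
        f"Question:\n{question}\n\n"
        f"Grading rubric:\n{rubric}\n\n"
        f"Student answer:\n{answer}\n\n"
        "Return a grade and one short paragraph of constructive feedback."
    )


def exact_agreement(human: list[int], llm: list[int]) -> float:
    """Fraction of answers where human and LLM grades match exactly.

    The abstract reports perfect agreement on 80% of 6345 evaluated answers.
    """
    assert len(human) == len(llm) and human, "grade lists must be equal-length and non-empty"
    return sum(h == m for h, m in zip(human, llm)) / len(human)
```

A usage example: with human grades `[10, 8, 7, 9, 6]` and LLM grades `[10, 8, 6, 9, 6]`, `exact_agreement` returns 0.8, the same headline statistic the paper reports at scale.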
