Grading University Students with LLMs: Performance and Acceptance of a Canvas-Based Automation

Open Access
Authors
Publication date 2025
Host editors
  • A.I. Cristea
  • E. Walker
  • Y. Lu
  • O.C. Santos
  • S. Isotani
Book title Artificial Intelligence in Education : Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED
Book subtitle 26th International Conference, AIED 2025, Palermo, Italy, July 22–26, 2025 : proceedings
ISBN
  • 9783031992636
ISBN (electronic)
  • 9783031992643
Series Communications in Computer and Information Science
Event Poster papers and late breaking results, workshops and tutorials, practitioners, industry and policy track, doctoral consortium, blue sky and wideAIED papers presented at the 26th International Conference on Artificial Intelligence in Education, AIED 2025
Volume | Issue number II
Pages (from-to) 36-43
Number of pages 8
Publisher Cham: Springer
Organisations
  • Faculty of Social and Behavioural Sciences (FMG) - Psychology Research Institute (PsyRes)
Abstract

Teachers in higher education spend considerable time grading assignments rather than tutoring students. Large language models (LLMs) could address this by generating human-like grades and feedback for assignments. However, accounts of their practical application are scarce. We wrote Python code to integrate the Canvas learning management system with an API for GDPR-compliant LLM access. We used this AI system to grade and provide feedback on weekly assignments of 58 graduate students (47 study participants) enrolled in an introductory programming course. LLM grading was fast, cost-efficient, and relatively accurate: GPT-4o and human graders agreed perfectly on 80% of 6345 evaluated student answers. Human and LLM grades were positively correlated (r = .570–.866 per assignment), but humans awarded higher grades. Disagreements occurred because human graders overlooked student mistakes, while GPT-4o graded ambiguous cases more strictly and misgraded some questions. Importantly, we think most LLM grading mistakes can be reconciled by avoiding certain tasks and improving grading rubrics. Most students (77%) found the LLM-generated feedback helpful, and some (33%) appreciated the rapid grading. However, multiple students (30%) stated that receiving preliminary LLM grades that were lower than their actual grades made them anxious or upset. We conclude that LLM-based grading and feedback can, and likely should, be used to optimize teachers’ resources. Canvas users can adopt our pipeline by simply adding their own API keys, Canvas IDs, and assignments: https://github.com/lukekorthals/canvas-llm-integration.
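The pipeline described in the abstract (fetch Canvas submissions, grade each answer with an LLM against a rubric) can be sketched roughly as below. This is an illustrative reconstruction, not the authors' actual code from the linked repository: all names (`CANVAS_URL`, `grade_with_llm`, the `Grade: <0-10>` reply format) are assumptions. The submissions endpoint follows the public Canvas REST API; the LLM call assumes an OpenAI-compatible chat-completions gateway, as the paper mentions GDPR-compliant API access.

```python
# Illustrative sketch of a Canvas-to-LLM grading loop.
# Hosts, tokens, model name, and prompt/reply format are placeholder assumptions.
import re
import requests

CANVAS_URL = "https://canvas.example.edu"   # your institution's Canvas host
CANVAS_TOKEN = "..."                        # your Canvas API key
LLM_URL = "https://llm.example.edu/v1/chat/completions"  # GDPR-compliant gateway
LLM_KEY = "..."                             # your LLM API key


def build_prompt(rubric: str, answer: str) -> str:
    """Combine a grading rubric and a student answer into one grading prompt."""
    return (
        "Grade the following student answer against the rubric.\n"
        f"Rubric:\n{rubric}\n\n"
        f"Answer:\n{answer}\n\n"
        "Reply with 'Grade: <0-10>' followed by short feedback."
    )


def parse_grade(reply: str):
    """Extract the numeric grade from the model's reply, or None if absent."""
    match = re.search(r"Grade:\s*([\d.]+)", reply)
    return float(match.group(1)) if match else None


def fetch_submissions(course_id: int, assignment_id: int) -> list:
    """List submissions for one assignment via the Canvas REST API."""
    resp = requests.get(
        f"{CANVAS_URL}/api/v1/courses/{course_id}"
        f"/assignments/{assignment_id}/submissions",
        headers={"Authorization": f"Bearer {CANVAS_TOKEN}"},
    )
    resp.raise_for_status()
    return resp.json()


def grade_with_llm(rubric: str, answer: str):
    """Send one answer to the LLM; return (numeric grade, full feedback text)."""
    resp = requests.post(
        LLM_URL,
        headers={"Authorization": f"Bearer {LLM_KEY}"},
        json={
            "model": "gpt-4o",
            "messages": [
                {"role": "user", "content": build_prompt(rubric, answer)}
            ],
        },
    )
    resp.raise_for_status()
    reply = resp.json()["choices"][0]["message"]["content"]
    return parse_grade(reply), reply
```

In this sketch the preliminary LLM grade would be posted back to Canvas (or reviewed by a human grader first); the paper's note that lower preliminary grades upset some students suggests holding the grade and releasing only the feedback.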

Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-3-031-99264-3_5
Other links
  • https://github.com/lukekorthals/canvas-llm-integration
  • https://www.scopus.com/pages/publications/105013027038
Downloads
978-3-031-99264-3_5 (Final published version)