Grading University Students with LLMs: Performance and Acceptance of a Canvas-Based Automation
| Authors | |
|---|---|
| Publication date | 2025 |
| Host editors | |
| Book title | Artificial Intelligence in Education : Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED |
| Book subtitle | 26th International Conference, AIED 2025, Palermo, Italy, July 22–26, 2025 : proceedings |
| ISBN | |
| ISBN (electronic) | |
| Series | Communications in Computer and Information Science |
| Event | Poster papers and late breaking results, workshops and tutorials, practitioners, industry and policy track, doctoral consortium, blue sky and wideAIED papers presented at the 26th International Conference on Artificial Intelligence in Education, AIED 2025 |
| Volume | |
| Issue number | II |
| Pages (from-to) | 36-43 |
| Number of pages | 8 |
| Publisher | Cham: Springer |
| Organisations | |
| Abstract | Teachers in higher education spend considerable time grading assignments rather than tutoring students. Large language models (LLMs) could address this by generating human-like grades and feedback for assignments. However, accounts of their practical application are scarce. We wrote Python code to integrate the Canvas learning management system with an API for GDPR-compliant LLM access. We used this AI system to grade and give feedback on the weekly assignments of 58 graduate students (47 study participants) enrolled in an introductory programming course. LLM grading was fast, cost-efficient, and relatively accurate: GPT-4o and human graders agreed perfectly on 80% of 6345 evaluated student answers. Human and LLM grades were positively correlated (r = .570-.866 per assignment), but humans awarded higher grades. Disagreements occurred because human graders overlooked student mistakes, while GPT-4o graded ambiguous cases more strictly and misgraded some questions. Importantly, we think most LLM grading mistakes can be reconciled by avoiding certain tasks and improving grading rubrics. Most students (77%) found the LLM-generated feedback helpful, and some (33%) appreciated the rapid grading. However, multiple students (30%) stated that receiving preliminary LLM grades that were lower than their actual grades made them anxious or upset. We conclude that LLM-based grading and feedback can, and likely should, be used to optimize teachers' resources. Canvas users can adopt our pipeline by simply adding their own API keys, Canvas IDs, and assignments: https://github.com/lukekorthals/canvas-llm-integration. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1007/978-3-031-99264-3_5 |
| Other links | https://github.com/lukekorthals/canvas-llm-integration https://www.scopus.com/pages/publications/105013027038 |
| Downloads | 978-3-031-99264-3_5 (Final published version) |
| Permalink to this page | |
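The abstract describes a Python pipeline that pulls student submissions from Canvas, sends them to an LLM for grading, and compares LLM grades against human grades. The sketch below illustrates two pieces of such a pipeline under stated assumptions: a prompt builder and the exact-agreement metric the abstract reports. All function names and the rubric format are illustrative assumptions, not the authors' actual code, which is available at the linked repository.

```python
# Illustrative sketch of an LLM-grading pipeline's core steps. Function names
# and the rubric format are assumptions; the authors' actual implementation
# lives at https://github.com/lukekorthals/canvas-llm-integration.

def build_grading_prompt(question: str, rubric: str, answer: str) -> str:
    """Assemble a grading prompt from a question, a rubric, and a student answer.

    In a full pipeline, the answer would be fetched via the Canvas API and the
    returned prompt sent to a GDPR-compliant LLM endpoint (e.g. GPT-4o).
    """
    return (
        "You are a strict but fair grader for an introductory programming course.\n"
        f"Question:\n{question}\n\n"
        f"Grading rubric:\n{rubric}\n\n"
        f"Student answer:\n{answer}\n\n"
        "Return a grade and one short paragraph of constructive feedback."
    )


def exact_agreement(human: list[int], llm: list[int]) -> float:
    """Fraction of answers where human and LLM grades match exactly.

    The abstract reports perfect agreement on 80% of 6345 evaluated answers.
    """
    assert len(human) == len(llm) and human, "grade lists must be equal-length and non-empty"
    return sum(h == m for h, m in zip(human, llm)) / len(human)
```

A usage example: with human grades `[10, 8, 7, 9, 6]` and LLM grades `[10, 8, 6, 9, 6]`, `exact_agreement` returns 0.8, the same headline statistic the paper reports at scale.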
