Towards language models that benefit us all: Studies on stereotypes, robustness, and values

Open Access
Award date 29-09-2025
ISBN
  • 9789464738933
Number of pages 355
Organisations
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR)
  • Faculty of Science (FNWI)
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
As Large Language Models (LLMs) have evolved from single-task solvers into general-purpose chat engines, demarcating their capabilities and harms poses a significant challenge. Systematic investigation of both is the cornerstone of well-informed policy and technological advancement. In this dissertation, we study stereotypes, robustness, and values in LLMs, drawing on insights from search engine studies, linguistics, formal semantics, logic, and philosophy. In Part One, we investigate stereotyping harms in Natural Language Processing systems, namely search autocomplete engines and LLMs, and find uneven safety behaviour across a diverse set of social groups in both cases. These findings lead us, in Part Two, to investigate variability in LLM behaviour more broadly, studying the robustness of LLM capabilities across tasks and for reasoning in particular. Based on our findings, we chart a path towards more holistic evaluation practices for the field of Natural Language Processing. In Part Three, we take steps towards aligning LLMs so that they represent a variety of social groups and speakers of different languages. First, we collect and annotate a multilingual dataset to assess LLM agreement with values across languages. Second, we develop a direct alignment approach that improves the robustness of alignment across demographics and languages.
Document type PhD thesis
Language English