When Models Reason in Your Language: Controlling Thinking Language Comes at the Cost of Accuracy

Jirui Qi; Shan Chen; Zidi Xiong; R. Fernández; Danielle S. Bitterman; Arianna Bisazza

Authors	Jirui Qi; Shan Chen; Zidi Xiong; R. Fernández; Danielle S. Bitterman; Arianna Bisazza
Publication date	2025
Host editors	C. Christodoulopoulos; T. Chakraborty; C. Rose; V. Peng
Book title	The 2025 Conference on Empirical Methods in Natural Language Processing : Findings of EMNLP 2025
Book subtitle	EMNLP 2025 : November 4-9, 2025
ISBN (electronic)	9798891763357
Event	30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Pages (from-to)	20279–20296
Publisher	Kerrville, TX: Association for Computational Linguistics
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Recent Large Reasoning Models (LRMs) with thinking traces have shown strong performance on English reasoning tasks. However, the extent to which LRMs can think in other languages is less studied. This is as important as answer accuracy for real-world applications since users may find the thinking trace useful for oversight only if expressed in their languages. In this work, we comprehensively evaluate two leading families of LRMs on our established benchmark XReasoning. Surprisingly, even the most advanced models often revert to English or produce fragmented reasoning in other languages, revealing a substantial gap in the capability of thinking in non-English languages. Promoting models to reason in the user’s language via prompt hacking enhances readability and oversight. This could gain user trust, but reduces answer accuracy, exposing an important trade-off. We further demonstrate that targeted post-training, even with just 100 instances, can mitigate this language mismatch, although accuracy is still degraded. Our results reveal the limited multilingual reasoning capabilities of current LRMs and suggest directions for future research. All code and datasets are released at https://github.com/Betswish/mCoT-XReasoning.
Document type	Conference contribution
Note	With checklist
Language	English
Published at	https://aclanthology.org/2025.findings-emnlp.1103/ (Final published version)
Other links	https://github.com/Betswish/mCoT-XReasoning
Downloads	2025.findings-emnlp.1103 (Final published version)
Supplementary materials	2025.findings-emnlp.1103.checklist