Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems

Open Access
Authors
Publication date 2020
Host editors
  • T. Cohn
  • Y. He
  • Y. Liu
Book title Findings of the Association for Computational Linguistics. Findings of ACL: EMNLP 2020
Book subtitle 16-20 November, 2020
ISBN (electronic)
  • 9781952148903
Event 2020 Conference on Empirical Methods in Natural Language Processing
Pages (from-to) 3537–3546
Publisher Stroudsburg, PA: The Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Dialogue policy learning for task-oriented dialogue systems has enjoyed great progress recently, mostly through the use of reinforcement learning methods. However, these approaches have become very sophisticated, and it is time to re-evaluate them. Are we really making progress by developing dialogue agents based only on reinforcement learning? We demonstrate how (1) traditional supervised learning and (2) a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art reinforcement learning-based methods. First, we introduce a simple dialogue action decoder to predict the appropriate actions. Then, the traditional multi-label classification solution for dialogue policy learning is extended by adding dense layers to improve the dialogue agent's performance. Finally, we employ the Gumbel-Softmax estimator to alternately train the dialogue agent and the dialogue reward model without using reinforcement learning. Based on our extensive experimentation, we conclude that the proposed methods achieve more stable and higher performance while requiring less effort, such as the domain knowledge needed to design a user simulator and the intractable parameter tuning in reinforcement learning. Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the roles of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
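The Gumbel-Softmax estimator mentioned in the abstract draws a differentiable, approximately one-hot sample from a categorical distribution, which is what lets a discrete dialogue action pass gradients from a reward model back into the agent. The sketch below is a minimal NumPy illustration of the estimator itself, not the paper's implementation; the function name and parameters are illustrative.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw a 'soft' one-hot sample from the categorical distribution
    parameterised by `logits` using the Gumbel-Softmax estimator."""
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1)
    u = rng.uniform(1e-10, 1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))
    # Temperature-scaled softmax over the perturbed logits; as tau -> 0
    # the output approaches a hard one-hot sample.
    z = (np.asarray(logits, dtype=float) + g) / tau
    e = np.exp(z - z.max())
    return e / e.sum()

# Example: sample a soft action vector over three dialogue actions.
y = gumbel_softmax([2.0, 1.0, 0.1], tau=0.5)
```

Lower temperatures `tau` make the sample closer to a hard one-hot vector (at the cost of higher gradient variance), which is why the temperature is typically annealed during training.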
Document type Conference contribution
Note Volume comprises papers selected from those submitted to EMNLP 2020 which were not selected to appear at the main conference.
Language English
Published at https://doi.org/10.18653/v1/2020.findings-emnlp.316