A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues

Open Access
Authors
Publication date 2021
Host editors
  • C. Zong
  • F. Xia
  • W. Li
  • R. Navigli
Book title The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
Book subtitle ACL-IJCNLP 2021 : proceedings of the conference : August 1-6, 2021
ISBN (electronic)
  • 9781954085527
Event The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
Volume | Issue number 1
Pages (from-to) 5612–5623
Publisher Stroudsburg, PA: The Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Conversational dialogue systems (CDSs) are hard to evaluate due to the complexity of natural language. Automatic evaluation of dialogues often shows insufficient correlation with human judgements. Human evaluation is reliable but labor-intensive. We introduce a human-machine collaborative framework, HMCEval, that can guarantee reliability of the evaluation outcomes with reduced human effort. HMCEval casts dialogue evaluation as a sample assignment problem, where we need to decide whether to assign each sample to a human or a machine for evaluation. HMCEval includes three modules: a model confidence estimation module that estimates the confidence of the predicted sample assignment, a human effort estimation module that estimates the human effort required should the sample be assigned to human evaluation, and a sample assignment execution module that finds the optimal assignment solution based on the estimated confidence and effort. We assess the performance of HMCEval on the task of evaluating malevolence in dialogues. The experimental results show that HMCEval achieves around 99% evaluation accuracy with half of the human effort spared, showing that HMCEval provides reliable evaluation outcomes while reducing human effort by a large amount.
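The sample assignment idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes per-sample confidence and effort estimates are already available, and it swaps in a simple greedy heuristic (route the least-confident samples to humans until a fixed effort budget is used up) in place of whatever optimisation the sample assignment execution module actually performs. The function name `assign_samples`, the `effort_budget` parameter, and the example numbers are hypothetical.

```python
from typing import List, Tuple


def assign_samples(
    confidence: List[float],
    effort: List[float],
    effort_budget: float,
) -> Tuple[List[str], float]:
    """Greedy illustration of human/machine sample assignment.

    confidence[i]: assumed model confidence for sample i (higher = more trusted).
    effort[i]: assumed human effort needed to evaluate sample i.
    effort_budget: hypothetical cap on total human effort.
    Returns the per-sample assignment and the human effort actually spent.
    """
    if len(confidence) != len(effort):
        raise ValueError("confidence and effort must have the same length")

    # Visit samples from least to most confident model prediction.
    order = sorted(range(len(confidence)), key=lambda i: confidence[i])

    assignment = ["machine"] * len(confidence)
    spent = 0.0
    for i in order:
        # Send a sample to a human only if it still fits in the budget.
        if spent + effort[i] <= effort_budget:
            assignment[i] = "human"
            spent += effort[i]
    return assignment, spent


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    result, used = assign_samples(
        confidence=[0.95, 0.40, 0.70, 0.99],
        effort=[2.0, 3.0, 1.0, 2.5],
        effort_budget=4.0,
    )
    print(result, used)
```

A greedy pass is easy to follow but gives no optimality guarantee; an exact solver over the same confidence and effort inputs would be closer in spirit to the "optimal assignment solution" the abstract describes.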
Document type Conference contribution
Note With supplementary video
Language English
Published at https://doi.org/10.18653/v1/2021.acl-long.436
Downloads
2021.acl-long.436 (Final published version)