No Labels? No Problem! Experiments with active learning strategies for multi-class classification in imbalanced low-resource settings

Open Access
Authors
Publication date 2023
Book title Nineteenth International Conference on Artificial Intelligence and Law
Book subtitle Proceedings of the Conference: Braga, Portugal, June 19-23, 2023, Universidade do Minho Law School
ISBN (electronic)
  • 9798400701979
Event 19th International Conference on Artificial Intelligence and Law, ICAIL 2023
Pages (from-to) 277-286
Number of pages 10
Publisher New York, New York: The Association for Computing Machinery
Organisations
  • Faculty of Law (FdR)
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
  • Faculty of Law (FdR) - Leibniz Center for Law (FdR)
Abstract

Labeling textual corpora in their entirety is infeasible in most practical situations, yet it is a very common need in public and private organizations today. In contexts with large unlabeled datasets, active learning methods may reduce the manual labeling effort by selecting the samples deemed most informative for the learning process. The paper presents a method for multi-class classification based on state-of-the-art NLP active learning techniques and reports experiments in low-resource, imbalanced settings. In particular, we use a dataset of Dutch legal documents constructed with two levels of imbalance; we study the performance of task-adapting a pre-trained Dutch language model, BERTje, and of using active learning to fine-tune the model to the task, testing several selection strategies. We find that, on the constructed datasets, an entropy-based strategy slightly improves the convergence rates of F1, precision, and recall, and that the improvements are most pronounced in the severely imbalanced dataset. These results show promise for active learning in low-resource imbalanced domains but also leave room for further improvement.
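To illustrate what an entropy-based selection strategy does, here is a minimal sketch of generic maximum-entropy uncertainty sampling. It is not the authors' implementation. It assumes that a classifier (for example, a fine-tuned BERTje model) has already produced a matrix of class probabilities for the unlabeled pool. The function names `entropy_scores` and `select_by_entropy`, as well as the toy probabilities, are hypothetical and chosen only for illustration.

```python
import numpy as np


def entropy_scores(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of each row of an (n_samples, n_classes) probability matrix."""
    eps = 1e-12  # small constant so that log(0) is never evaluated
    return -np.sum(probs * np.log(probs + eps), axis=1)


def select_by_entropy(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k pool samples with the highest predictive entropy.

    Hypothetical helper: higher entropy means the model is less certain,
    so these samples are assumed to be the most informative to label next.
    """
    scores = entropy_scores(probs)
    # argsort is ascending: take the last k indices, then reverse so the
    # most uncertain sample comes first
    return np.argsort(scores)[-k:][::-1]


if __name__ == "__main__":
    # Toy predicted probabilities for four unlabeled documents, three classes
    pool_probs = np.array([
        [0.98, 0.01, 0.01],  # confident prediction, low entropy
        [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
        [0.60, 0.30, 0.10],
        [0.50, 0.50, 0.00],
    ])
    print(select_by_entropy(pool_probs, k=2))  # prints [1 2]
```

In a pool-based active learning loop, the selected samples would be sent to annotators, added to the labeled set, and the model retrained before the next selection round. A random-sampling baseline would simply draw `k` indices uniformly from the pool instead.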

Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3594536.3595171
Other links https://www.scopus.com/pages/publications/85177818812