ChatGPT is not a good indigenous translator

Open Access
Authors
Publication date 2023
Host editors
  • M. Mager
  • A. Ebrahimi
  • A. Oncevay
  • E. Rice
  • S. Rijhwani
  • A. Palmer
  • K. Kann
Book title Third Workshop on Natural Language Processing for Indigenous Languages of the Americas
Book subtitle Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP): July 14, 2023
ISBN (electronic)
  • 9781959429913
Event 3rd Workshop on Natural Language Processing for Indigenous Languages of the Americas
Pages (from-to) 163–167
Number of pages 5
Publisher Stroudsburg, PA: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
This report investigates the persistent challenges of Machine Translation (MT) systems on indigenous and extremely low-resource language pairs. Despite the notable achievements of Large Language Models (LLMs) that excel in various tasks, their applicability to low-resource languages remains questionable. In this study, we leveraged the AmericasNLP competition to evaluate the translation performance of different systems from Spanish into 11 indigenous languages from South America. Our team, LTLAmsterdam, submitted a total of four systems including GPT-4, a bilingual model, fine-tuned M2M100, and a combination of fine-tuned M2M100 with kNN-MT. We found that even large language models like GPT-4 are not well-suited for extremely low-resource languages. Our results suggest that fine-tuning M2M100 models can offer significantly better performance for extremely low-resource translation.
Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/2023.americasnlp-1.17
Downloads
2023.americasnlp-1.17 (Final published version)