Attention to the Branches: A Comparative Analysis of FairMOT with Transformers on Fish Dataset

Open Access
Authors
Publication date 2025
Host editors
  • C. Sombattheera
  • P. Weng
  • J. Pang
Book title Multi-disciplinary Trends in Artificial Intelligence
Book subtitle 17th International Conference, MIWAI 2024, Pattaya, Thailand, November 11–15, 2024: Proceedings
ISBN
  • 9789819606917
ISBN (electronic)
  • 9789819606924
Series Lecture Notes in Computer Science
Volume | Issue number I
Pages (from-to) 64–76
Publisher Singapore: Springer
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The application of Transformers in computer vision has gained momentum, to the point that the Vision Transformer (ViT) proposes abandoning CNNs, or more precisely, replacing CNN backbones with Transformer-based ones. This research evaluates the efficiency of such backbones when incorporated into a re-ID-based model such as FairMOT, which is traditionally trained with a CNN backbone. We investigate how Transformer-based feature extraction affects tracking performance, particularly for small and occluded objects such as fish in video data. Our findings indicate that while ViT backbones offer promising features, they do not yet surpass CNN-based methods in tracking accuracy within the FairMOT framework. This study highlights the need for further optimization of Transformer architectures.
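The core idea the abstract describes, keeping FairMOT's joint detection and re-ID heads while swapping the feature-extraction backbone from a CNN to a Transformer, can be sketched as follows. This is a hypothetical, minimal illustration in PyTorch, not the paper's actual code: all class names, layer sizes, and the toy ViT-style encoder are assumptions for demonstration.

```python
# Hypothetical sketch of a FairMOT-style tracker with a swappable backbone
# (CNN vs. a minimal ViT-style encoder). Names and sizes are illustrative,
# not taken from the paper's implementation.
import torch
import torch.nn as nn


class CNNBackbone(nn.Module):
    """Small conv backbone producing a feature map of shape (B, C, H/4, W/4)."""
    def __init__(self, out_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_channels, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class ViTBackbone(nn.Module):
    """Minimal ViT-style encoder: patch embedding + Transformer layers,
    reshaped back into a spatial map so the same heads can be reused."""
    def __init__(self, out_channels=64, patch=4):
        super().__init__()
        self.embed = nn.Conv2d(3, out_channels, patch, stride=patch)
        layer = nn.TransformerEncoderLayer(
            d_model=out_channels, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        f = self.embed(x)                       # (B, C, H/p, W/p)
        B, C, H, W = f.shape
        tokens = f.flatten(2).transpose(1, 2)   # (B, H*W, C)
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(B, C, H, W)


class JointTracker(nn.Module):
    """FairMOT-style fairness: one detection (centre heatmap) head and one
    re-ID embedding head share the same backbone features."""
    def __init__(self, backbone, channels=64, reid_dim=128):
        super().__init__()
        self.backbone = backbone
        self.det_head = nn.Conv2d(channels, 1, 1)          # object-centre heatmap
        self.reid_head = nn.Conv2d(channels, reid_dim, 1)  # per-location embeddings

    def forward(self, x):
        f = self.backbone(x)
        return torch.sigmoid(self.det_head(f)), self.reid_head(f)


if __name__ == "__main__":
    x = torch.randn(2, 3, 64, 64)
    for bb in (CNNBackbone(), ViTBackbone()):
        heatmap, embeddings = JointTracker(bb)(x)
        print(type(bb).__name__, heatmap.shape, embeddings.shape)
```

Because both backbones emit the same spatial feature shape, the detection and re-ID heads are unchanged across the comparison; only the feature extractor varies, which is the kind of controlled backbone swap the study evaluates.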

Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-981-96-0692-4_6