SimPLR: A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation

Open Access
Publication date 02-2025
Journal Transactions on Machine Learning Research
Article number 3114
Volume | Issue number 2025
Number of pages 17
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The ability to detect objects in images at varying scales has played a pivotal role in the design of modern object detectors. Despite considerable progress in removing hand-crafted components and simplifying the architecture with transformers, multi-scale feature maps and pyramid designs remain a key factor in their empirical success. In this paper, we show that shifting the multi-scale inductive bias into the attention mechanism can work well, resulting in a plain detector, ‘SimPLR’, whose backbone and detection head are both non-hierarchical and operate on single-scale features. We find through our experiments that SimPLR with scale-aware attention is a plain and simple architecture, yet competitive with multi-scale vision transformer alternatives. Compared to the multi-scale and single-scale state-of-the-art, our model scales better with higher-capacity (self-supervised) models and more pre-training data, allowing us to report consistently better accuracy and faster runtime for object detection, instance segmentation, as well as panoptic segmentation.
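The central idea above, moving the multi-scale inductive bias out of the feature pyramid and into the attention mechanism, can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's implementation: it realizes "scale-aware attention" by letting each attention head attend over keys and values pooled from the same single-scale feature map at a head-specific stride, so the architecture stays plain while scale sensitivity lives inside attention. All function names and the stride assignment are hypothetical.

```python
import numpy as np

def pool2d(x, stride):
    # Average-pool an (H, W, C) feature map by `stride` (toy stand-in for
    # producing a coarser view of the same single-scale features).
    H, W, C = x.shape
    H2, W2 = H // stride, W // stride
    return x[:H2 * stride, :W2 * stride].reshape(H2, stride, W2, stride, C).mean(axis=(1, 3))

def scale_aware_attention(feat, num_heads=4, strides=(1, 2, 4, 8)):
    """Toy scale-aware attention: head h attends over the map pooled at strides[h].

    `feat` is a single-scale (H, W, C) map; no feature pyramid is built.
    """
    H, W, C = feat.shape
    assert num_heads == len(strides) and C % num_heads == 0
    d = C // num_heads
    q = feat.reshape(H * W, num_heads, d)          # queries, split across heads
    out = np.zeros_like(q)
    for h, s in enumerate(strides):
        # Keys/values for this head come from the coarser (pooled) view,
        # restricted to this head's channel slice.
        kv = pool2d(feat, s).reshape(-1, C)[:, h * d:(h + 1) * d]
        logits = q[:, h] @ kv.T / np.sqrt(d)
        w = np.exp(logits - logits.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)              # softmax over pooled positions
        out[:, h] = w @ kv
    return out.reshape(H, W, C)
```

Because every head reads from the same single-scale map, the backbone and head remain non-hierarchical; only the attention pattern differs per scale, which is the design trade-off the abstract describes.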
Document type Article
Language English
Published at https://doi.org/10.48550/arXiv.2310.05920
Published at https://openreview.net/forum?id=6LO1y8ZE0F
Other links https://github.com/kienduynguyen/SimPLR https://jmlr.org/tmlr/papers/index.html