Scaling Backwards: Minimal Synthetic Pre-training?

Open Access
Authors
  • I. Laina
  • C. Rupprecht
  • N. Inoue
  • R. Yokota
  • H. Kataoka
Publication date 2025
Host editors
  • A. Leonardis
  • E. Ricci
  • S. Roth
  • O. Russakovsky
  • T. Sattler
  • G. Varol
Book title Computer Vision – ECCV 2024
Book subtitle 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings
ISBN
  • 9783031726323
ISBN (electronic)
  • 9783031726330
Series Lecture Notes in Computer Science
Event The 18th European Conference on Computer Vision ECCV 2024
Volume XV
Pages (from-to) 153–171
Publisher Cham: Springer
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Pre-training and transfer learning are important building blocks of current computer vision systems. While pre-training is usually performed on large real-world image datasets, in this paper we ask whether this is truly necessary. To this end, we search for a minimal, purely synthetic pre-training dataset that allows us to achieve performance comparable to the 1 million images of ImageNet-1k. We construct such a dataset from a single fractal with perturbations. With this, we contribute three main findings. (i) We show that pre-training is effective even with minimal synthetic images, with performance on par with large-scale pre-training datasets like ImageNet-1k for full fine-tuning. (ii) We investigate the single parameter with which we construct artificial categories for our dataset. We find that while the shape differences can be indistinguishable to humans, they are crucial for obtaining strong performance. (iii) We investigate the minimal requirements for successful pre-training. Surprisingly, we find that a substantial reduction of synthetic images from 1k to 1 can even lead to an increase in pre-training performance, motivating further investigation of "scaling backwards". Finally, we extend our method from synthetic images to real images to see whether a single real image can show a similar pre-training effect through shape augmentation. We find that the use of grayscale images and affine transformations allows even real images to "scale backwards". The code is available at https://github.com/SUPER-TADORY/1p-frac.
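The "single fractal with perturbations" construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' 1p-frac code: the chaos-game renderer, the Sierpinski-like base system, and the perturbation scale `delta` are all assumptions chosen for clarity. A fractal is rendered from a set of affine maps, and each artificial "category" jitters those map parameters slightly.

```python
import numpy as np

def render_fractal(maps, n_points=20000, size=64, seed=0):
    """Render a binary fractal image via the chaos game over affine maps.

    maps: list of (A, b) pairs, where A is a 2x2 matrix and b a 2-vector.
    """
    rng = np.random.default_rng(seed)
    pts = np.zeros((n_points, 2))
    x = np.zeros(2)
    for i in range(n_points):
        A, b = maps[rng.integers(len(maps))]  # pick a random affine map
        x = A @ x + b                          # apply it to the current point
        pts[i] = x
    # Normalize the point cloud into pixel coordinates and rasterize.
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    ij = ((pts - lo) / np.maximum(hi - lo, 1e-8) * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.uint8)
    img[ij[:, 1], ij[:, 0]] = 1
    return img

def perturbed_category(base_maps, delta, rng):
    """One illustrative 'category': jitter every affine parameter by +/- delta."""
    return [(A + rng.uniform(-delta, delta, A.shape),
             b + rng.uniform(-delta, delta, b.shape)) for A, b in base_maps]

# A contractive Sierpinski-like base system (purely illustrative).
base = [(np.eye(2) * 0.5, np.array(t))
        for t in ([0.0, 0.0], [0.5, 0.0], [0.25, 0.5])]
rng = np.random.default_rng(42)
images = [render_fractal(perturbed_category(base, delta=0.02, rng=rng), seed=c)
          for c in range(4)]
```

With a small `delta`, the rendered shapes differ only subtly between categories, echoing the paper's finding that near-imperceptible shape differences still drive pre-training performance.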
Document type Conference contribution
Note With supplementary material
Language English
DOI https://doi.org/10.1007/978-3-031-72633-0_9
Other links https://github.com/SUPER-TADORY/1p-frac