Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations

M. Salehi; E. Gavves; C.G.M. Snoek; Y.M. Asano

doi:https://doi.org/10.1109/ICCV51070.2023.01516

Authors	M. Salehi; E. Gavves; C.G.M. Snoek; Y.M. Asano
Publication date	2023
Book title	2023 IEEE/CVF International Conference on Computer Vision
Book subtitle	ICCV 2023 : Paris, France, 2-6 October 2023 : proceedings
ISBN	9798350307191
ISBN (electronic)	9798350307184
Event	2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Pages (from-to)	16490-16501
Publisher	Los Alamitos, California: IEEE Computer Society
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Spatially dense self-supervised learning is a rapidly growing problem domain with promising applications for unsupervised segmentation and pretraining for dense downstream tasks. Despite the abundance of temporal data in the form of videos, this information-rich source has been largely overlooked. Our paper aims to address this gap by proposing a novel approach that incorporates temporal consistency in dense self-supervised learning. While methods designed solely for images struggle to reach even the same performance on videos, our method improves representation quality not only for videos but also for images. Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos. This effectively facilitates the transfer of high-level information from videos to image representations. Time-tuning improves on the state of the art by 8-10% for unsupervised semantic segmentation on videos and matches it for images. We believe this method paves the way for further self-supervised scaling by leveraging the abundant availability of videos. The implementation can be found at https://github.com/SMSD75/Timetuning
Document type	Conference contribution
Note	With supplemental file. - Longer version available on ArXiv.
Language	English
Published at	https://doi.org/10.1109/ICCV51070.2023.01516 (Final published version) https://doi.org/10.48550/arXiv.2308.11796 (Other version)
Published at	https://openaccess.thecvf.com/content/ICCV2023/html/Salehi_Time_Does_Tell_Self-Supervised_Time-Tuning_of_Dense_Image_Representations_ICCV_2023_paper.html (Accepted author manuscript)
Other links	https://github.com/SMSD75/Timetuning https://www.proceedings.com/72328.html
Downloads	Salehi_Time_Does_Tell_Self-Supervised_Time-Tuning_of_Dense_Image_Representations_ICCV_2023_paper (Accepted author manuscript) Time_Does_Tell_Self-Supervised_Time-Tuning_of_Dense_Image_Representations (Final published version) 2308.11796 (Other version)
Supplementary materials	Salehi_Time_Does_Tell_ICCV_2023_supplemental