Latent Space Editing in Transformer-Based Flow Matching

doi:10.1609/aaai.v38i3.27998

Authors	V.T. Hu W. Zhang M. Tang P. Mettes D. Zhao C. Snoek
Publication date	2024
Host editors	M. Wooldridge J. Dy S. Natarajan
Book title	Proceedings of the 38th AAAI Conference on Artificial Intelligence
Book subtitle	AAAI-2024
ISBN	9781577358879
Event	38th AAAI Conference on Artificial Intelligence, AAAI 2024
Volume | Issue number	3
Pages (from-to)	2247-2255
Publisher	Washington, DC: AAAI Press
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	This paper strives for image editing via generative models. Flow Matching is an emerging generative modeling technique that offers the advantage of simple and efficient training. Simultaneously, a new transformer-based U-ViT has recently been proposed to replace the commonly used UNet for better scalability and performance in generative modeling. Flow Matching with a transformer backbone therefore offers the potential for scalable and high-quality generative modeling, but its latent structure and editing ability are as yet unknown. We adopt this setting and explore how to edit images through latent space manipulation. We introduce an editing space, which we call u-space, that can be manipulated in a controllable, accumulative, and composable manner. Additionally, we propose a tailored sampling solution to enable sampling with the more efficient adaptive step-size ODE solvers. Lastly, we put forth a straightforward yet powerful method for achieving fine-grained and nuanced editing using text prompts. Our framework is simple, efficient, and highly effective at editing images while preserving the essence of the original content. Our code will be publicly available at https://taohu.me/lfm/
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1609/aaai.v38i3.27998
Other links	https://github.com/dongzhuoyao/uspace
Downloads	27998-Article Text-32052-1-2-20240324 (Final published version)

UvA-DARE

Digital Academic Repository
