RecA: A new finetuning method that doesn’t use image captions.

https://arxiv.org/abs/2509.07295

“We introduce Reconstruction Alignment (RecA), a resource-efficient post-training method that leverages visual understanding encoder embeddings as dense ‘text prompts,’ providing rich supervision without captions. Concretely, RecA conditions a UMM [unified multimodal model] on its own visual understanding embeddings and optimizes it to reconstruct the input image with a self-supervised reconstruction loss, thereby realigning understanding and generation.”
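To make the idea in the abstract a bit more concrete, here is a toy sketch of the training loop it describes: a frozen encoder turns each image into an embedding, and a trainable generator is optimized to rebuild the same image from that embedding alone, with no captions anywhere. This is not the RecA code. The linear "encoder" and "decoder", the random 4-pixel "images", and every name here (`run_toy`, `matvec`, and so on) are assumptions made up for illustration; the real method works on a full UMM such as BAGEL.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def mean_squared_error(predicted, target):
    """Per-image reconstruction loss."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def average_loss(decoder, embeddings, images):
    """Mean reconstruction loss over the whole toy dataset."""
    losses = [
        mean_squared_error(matvec(decoder, emb), img)
        for emb, img in zip(embeddings, images)
    ]
    return sum(losses) / len(losses)


def run_toy(num_images=8, dim=4, steps=500, lr=0.02, seed=0):
    """Train a linear 'generator' to reconstruct images from frozen embeddings.

    Returns (initial_loss, final_loss). Everything here is a hypothetical
    stand-in, not the architecture or hyperparameters used by RecA.
    """
    rng = random.Random(seed)

    # Toy "images": short vectors of pixel intensities in [0, 1).
    images = [[rng.random() for _ in range(dim)] for _ in range(num_images)]

    # Frozen stand-in for the visual understanding encoder (never updated).
    encoder = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    embeddings = [matvec(encoder, img) for img in images]

    # Trainable stand-in for the generation branch, conditioned only on embeddings.
    decoder = [[0.0] * dim for _ in range(dim)]
    initial_loss = average_loss(decoder, embeddings, images)

    for _ in range(steps):
        # Full-batch gradient of the mean squared reconstruction error.
        grad = [[0.0] * dim for _ in range(dim)]
        for emb, img in zip(embeddings, images):
            recon = matvec(decoder, emb)
            for i in range(dim):
                err = recon[i] - img[i]
                for k in range(dim):
                    grad[i][k] += 2.0 * err * emb[k] / (dim * num_images)
        # Plain gradient-descent update on the decoder only.
        for i in range(dim):
            for k in range(dim):
                decoder[i][k] -= lr * grad[i][k]

    final_loss = average_loss(decoder, embeddings, images)
    return initial_loss, final_loss


if __name__ == "__main__":
    before, after = run_toy()
    print(f"reconstruction loss before: {before:.4f}, after: {after:.4f}")
```

The only supervision signal is the input image itself, which is the point of the "without captions" claim; the embedding plays the role a text prompt would normally play.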

https://huggingface.co/sanaka87/BAGEL-RecA

submitted by /u/Total-Resort-3120