Information taken from the GitHub page: https://github.com/Stability-AI/stablediffusion/blob/main/doc/UNCLIP.MD
HuggingFace checkpoints and diffusers integration: https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip
Public web-demo: https://clipdrop.co/stable-diffusion-reimagine
unCLIP is the approach behind OpenAI’s DALL·E 2, trained to invert CLIP image embeddings. We finetuned SD 2.1 to accept a CLIP ViT-L/14 image embedding in addition to the text encodings. This means that the model can be used to produce image variations, but can also be combined with a text-to-image embedding prior to yield a full text-to-image model at 768×768 resolution.
If you would like to try a demo of this model on the web, please visit https://clipdrop.co/stable-diffusion-reimagine
This model essentially uses an input image as the ‘prompt’ rather than requiring a text prompt. It first converts the input image into a ‘CLIP embedding’, then feeds that embedding into a Stable Diffusion 2.1-768 model fine-tuned to produce an image from such CLIP embeddings, which lets users generate multiple variations of a single image. Note that this is distinct from how img2img works: the structure of the original image is generally not kept.
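For anyone who wants to try this locally, here is a minimal sketch of the image-variation flow using the diffusers integration linked above. It assumes the `StableUnCLIPImg2ImgPipeline` class from diffusers and the `stabilityai/stable-diffusion-2-1-unclip` checkpoint; the function name `generate_variations` and its parameters are made up for illustration, and the defaults are not tuned recommendations.

```python
def generate_variations(image_path, num_variations=4, noise_level=0, seed=0):
    """Return a list of PIL images that are variations of the input image.

    Hypothetical helper, not part of any library. `image_path` may be a
    local path or a URL.
    """
    if num_variations < 1:
        raise ValueError("num_variations must be at least 1")

    # Heavy imports are kept inside the function so the module itself
    # can be imported without torch/diffusers installed.
    import torch
    from diffusers import StableUnCLIPImg2ImgPipeline
    from diffusers.utils import load_image

    # Assumption: use half precision on GPU, full precision on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=dtype
    ).to(device)

    init_image = load_image(image_path)
    generator = torch.Generator(device=device).manual_seed(seed)

    # No text prompt is passed: the image itself acts as the 'prompt'.
    # noise_level adds noise to the image embedding; higher values
    # should give variations that drift further from the input.
    result = pipe(
        image=init_image,
        num_images_per_prompt=num_variations,
        noise_level=noise_level,
        generator=generator,
    )
    return result.images
```

Calling something like `generate_variations("input.png", num_variations=4)` and saving each returned image with `.save(...)` should give a set of variations. Since only the embedding is passed on, expect the outputs to share content and style with the input rather than its exact layout.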
Blog post: https://stability.ai/blog/stable-diffusion-reimagine
submitted by /u/hardmaru