Stability AI launches SDXL 0.9: A Leap Forward in AI Image Generation

Today, Stability AI announces SDXL 0.9, the most advanced development in the Stable Diffusion text-to-image suite of models. Following the successful release of Stable Diffusion XL beta in April, SDXL 0.9 produces massively improved image and composition detail over its predecessor. 

The model can be accessed via Clipdrop today, with API access coming shortly. Research weights are now available, and an open release is planned for mid-July as we move to 1.0.

Despite being able to run on a modern consumer GPU, SDXL 0.9 represents a leap forward in creative use cases for generative AI imagery. Its ability to generate hyper-realistic creations for films, television, music, and instructional videos, along with the advancements it offers for design and industrial use, places SDXL at the forefront of real-world applications for AI imagery.

Examples

Some examples of prompts tested on both SDXL beta (left) and SDXL 0.9 (right) show how far the model has come in just two months.

Prompt: ✨aesthetic✨ aliens walk among us in Las Vegas, scratchy found film photograph

(Left – SDXL Beta, Right – SDXL 0.9)

Prompt: A wolf in Yosemite National Park, chilly nature documentary film photography
Negative prompt: 3d render, smooth, plastic, blurry, grainy, low-resolution, anime, deep-fried, oversaturated

(Left – SDXL Beta, Right – SDXL 0.9)

Prompt: *~aesthetic~*~ manicured hand holding up a take-out coffee, pastel chilly dawn beach instagram film photography
Negative prompt: 3d render, smooth, plastic, blurry, grainy, low-resolution, anime

(Left – SDXL Beta, Right – SDXL 0.9)

The SDXL series also offers a range of functionalities that extend beyond basic text prompting. These include image-to-image prompting (inputting one image to get variations of that image), inpainting (reconstructing missing parts of an image), and outpainting (constructing a seamless extension of an existing image). 
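For researchers experimenting with the open weights, the sketch below shows one way image-to-image prompting could be driven from code. It assumes the open-source diffusers library, the gated Hugging Face model ID, and a hypothetical input file named photo.png; none of these are prescribed by this announcement, and Clipdrop and the API expose the same functionality without any code.

```python
# Hedged sketch: image-to-image with the diffusers library (assumed dependency).
# The model ID refers to the gated SDXL 0.9 research weights on Hugging Face.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9",  # requires approved research access
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("photo.png").convert("RGB")  # hypothetical input image

# strength controls how far the variation may drift from the input image
variation = pipe(
    prompt="a wolf in Yosemite National Park, nature documentary film photography",
    negative_prompt="3d render, smooth, plastic, blurry, grainy, low-resolution, anime",
    image=init_image,
    strength=0.6,
    guidance_scale=7.5,
).images[0]
variation.save("wolf_variation.png")
```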

What’s under the hood?

The key driver of this advancement in composition for SDXL 0.9 is its significant increase in parameter count (the total number of weights and biases in the neural network) over the beta version.

SDXL 0.9 has one of the largest parameter counts of any open-source image model, boasting a 3.5B-parameter base model and a 6.6B-parameter model ensemble pipeline (the final output is created by running two models in sequence and combining their results). The second-stage model of the pipeline is used to add finer details to the generated output of the first stage.

To compare, the beta version runs on 3.1B parameters and uses just a single model.
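As a rough illustration of the two-stage ensemble described above, the sketch below chains a base pipeline into a refiner pipeline using the diffusers library; the model IDs and the latent hand-off follow the diffusers SDXL workflow and are assumptions here rather than part of this announcement.

```python
# Hedged sketch of the base + refiner ensemble pipeline (diffusers assumed).
import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16
).to("cuda")

prompt = "beautiful scenery nature glass bottle landscape, purple galaxy bottle"

# Stage 1: the 3.5B-parameter base model produces latents instead of a decoded image.
latents = base(prompt=prompt, output_type="latent").images

# Stage 2: the refiner adds finer detail to the first stage's output.
image = refiner(prompt=prompt, image=latents).images[0]
image.save("galaxy_bottle_1024.png")
```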

SDXL 0.9 runs on two CLIP models, including one of the largest OpenCLIP models trained to date (OpenCLIP ViT-G/14), which boosts 0.9's processing power and its ability to create realistic imagery with greater depth at a higher resolution of 1024×1024.
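For those curious about the text encoders, the short sketch below inspects the two loaded by a diffusers SDXL pipeline; the attribute names (text_encoder, text_encoder_2) follow the diffusers layout and are an assumption on our part, not something defined by this post.

```python
# Hedged sketch: inspect the two CLIP text encoders loaded alongside SDXL 0.9's UNet.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16
)

# In the diffusers layout (assumed), text_encoder_2 is the OpenCLIP ViT-G/14 encoder.
for name in ("text_encoder", "text_encoder_2"):
    encoder = getattr(pipe, name)
    n_params = sum(p.numel() for p in encoder.parameters())
    print(f"{name}: {type(encoder).__name__}, ~{n_params / 1e6:.0f}M parameters")
```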

A research blog going into greater detail about the specifications and testing of this model will be released by the SDXL team shortly.

Prompt: beautiful scenery nature glass bottle landscape, purple galaxy bottle (SDXL 0.9 – 1024×1024)

System requirements 

Despite its powerful output and advanced model architecture, SDXL 0.9 can run on a modern consumer GPU. It requires Windows 10 or 11, or Linux, with 16GB of RAM and an Nvidia GeForce RTX 20-series graphics card (or equivalent or higher) with a minimum of 8GB of VRAM. Linux users can also use a compatible AMD card with 16GB of VRAM.
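For cards at the 8GB VRAM minimum, a few standard memory-saving options may help. The sketch below assumes the diffusers library (with accelerate installed for CPU offload); the exact settings that fit will depend on your hardware and drivers.

```python
# Hedged sketch: reduced-VRAM settings for an 8GB consumer GPU (diffusers assumed).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9",
    torch_dtype=torch.float16,   # half precision roughly halves memory use
)
pipe.enable_model_cpu_offload()  # keeps idle sub-models on the CPU (requires accelerate)
pipe.enable_vae_slicing()        # decodes the 1024x1024 output in slices to save VRAM

image = pipe("a wolf in Yosemite National Park, nature documentary film photography").images[0]
image.save("wolf.png")
```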

Beta launch statistics

Since SDXL’s beta launch on April 13, we’ve had a great response from our Discord community, which now numbers nearly 7,000 users. These users have generated more than 700,000 images, averaging more than 20,000 per day. More than 54,000 images have been entered into Discord community ‘Showdowns’, with 3,521 SDXL images nominated as winners.

Prompt: magical realism; manicured fingers holding a piece of white heart-shaped sea glass up against the setting sun realistic film photography (SDXL beta – 480×480)

Availability

SDXL 0.9 is now available on the Clipdrop by Stability AI platform. Stability AI API and DreamStudio customers will be able to access the model this Monday, 26th June, as will other leading image-generating tools such as NightCafe.

SDXL 0.9 will be provided for research purposes only during a limited period to collect feedback and fully refine the model before its general open release. The code to run it will be publicly available on GitHub.

Researchers who would like to access these models can apply via the following links: SDXL-0.9-Base model and SDXL-0.9-Refiner. Please log in to your HuggingFace account with your academic email to request access. Kindly remember that SDXL 0.9 is currently intended exclusively for research purposes.
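Once access has been granted, authenticating from code might look like the sketch below; it assumes the huggingface_hub package and a personal access token from the account used to request access.

```python
# Hedged sketch: authenticate with Hugging Face before loading the gated weights.
import torch
from huggingface_hub import login
from diffusers import DiffusionPipeline

login()  # prompts for the access token tied to your academic HuggingFace account

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9",  # gated; requires approved access
    torch_dtype=torch.float16,
)
```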

What’s next?

SDXL 0.9 will be followed by the full open release of SDXL 1.0 targeted for mid-July (timing TBC).

License

SDXL 0.9 is released under a non-commercial, research-only license and is subject to its terms of use.

Contact

For further information or to provide feedback on SDXL 0.9, we welcome you to contact us at research@stability.ai.
