Stability AI launches SDXL 0.9: A Leap Forward in AI Image Generation

Today, Stability AI announces SDXL 0.9, the most advanced development in the Stable Diffusion text-to-image suite of models. Following the successful release of Stable Diffusion XL beta in April, SDXL 0.9 produces massively improved image and composition detail over its predecessor.

The model can be accessed via Clipdrop today, with API access coming shortly. Research weights are now available, with an open release coming mid-July as we move to 1.0.

SDXL 0.9 can be run on a modern consumer GPU, yet it presents a leap in creative use cases for generative AI imagery. Its ability to generate hyper-realistic creations for films, television, music, and instructional videos, and to offer advancements for design and industrial use, places SDXL at the forefront of real-world applications for AI imagery.

Examples

Some examples of the prompts tested on SDXL beta (left) and 0.9 (right) show how far this model has come in just two months.

***Prompt:*** *✨aesthetic✨ aliens walk among us in Las Vegas, scratchy found film photograph.*

*(Left – SDXL Beta, Right – SDXL 0.9)*

***Prompt:*** *A wolf in Yosemite National Park, chilly nature documentary film photography.*
***Negative prompt:*** *3d render, smooth, plastic, blurry, grainy, low-resolution, anime, deep-fried, oversaturated.*

*(Left – SDXL Beta, Right – SDXL 0.9)*

***Prompt:*** *~aesthetic~\*~ manicured hand holding up a take-out coffee, pastel chilly dawn beach Instagram film photography.*
***Negative prompt:*** *3d render, smooth, plastic, blurry, grainy, low-resolution, anime.*

*(Left – SDXL Beta, Right – SDXL 0.9)*

The SDXL series also offers various functionalities extending beyond basic text prompting. These include image-to-image prompting (inputting one image to get variations of that image), inpainting (reconstructing missing parts of an image), and outpainting (constructing a seamless extension of an existing image).

What’s under the hood?

The key driver of this advancement in composition for SDXL 0.9 is its significant increase in parameter count (the total number of weights and biases in the neural network) over the beta version.
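To make the idea of a parameter count concrete, here is a minimal sketch that counts the weights and biases of a small fully connected network. The function name and the toy layer sizes are assumptions chosen for illustration only; they do not describe SDXL's actual architecture.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of each layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix connecting the two layers
        total += n_out         # one bias per output unit
    return total


# Hypothetical toy network: 4 inputs, 8 hidden units, 2 outputs.
toy_params = count_parameters([4, 8, 2])
print(toy_params)  # 4*8 + 8 + 8*2 + 2 = 58
```

A model described as having 3.5B parameters has, by the same counting, roughly 3.5 billion such learned values.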

SDXL 0.9 has one of the largest parameter counts of any open-source image model, boasting a 3.5B parameter base model and a 6.6B parameter model ensemble pipeline (the final output is created by running two models and aggregating the results). The second-stage model of the pipeline is used to add finer details to the generated output of the first stage.

To compare, the beta version runs on 3.1B parameters and uses just a single model.
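The two-stage flow described above, where a second model adds detail to the first model's output, can be sketched roughly as follows. This is only an illustration of the data flow: `run_two_stage`, `toy_base`, and `toy_refiner` are hypothetical names, and the stand-in stages operate on short lists of numbers rather than real images.

```python
from typing import Callable, List

Image = List[float]  # stand-in for real image or latent data


def run_two_stage(
    prompt: str,
    base: Callable[[str], Image],
    refiner: Callable[[str, Image], Image],
) -> Image:
    """Run the base stage, then pass its output to the refiner stage."""
    draft = base(prompt)
    return refiner(prompt, draft)


# Hypothetical stand-in stages, used only to show the data flow.
def toy_base(prompt: str) -> Image:
    return [float(len(word)) for word in prompt.split()]


def toy_refiner(prompt: str, draft: Image) -> Image:
    return [value + 0.5 for value in draft]


result = run_two_stage("a wolf in Yosemite", toy_base, toy_refiner)
print(result)  # [1.5, 4.5, 2.5, 8.5]
```

A single-model setup, like the beta, would correspond to calling only the base stage.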

SDXL 0.9 uses two CLIP models, including one of the largest OpenCLIP models trained to date (OpenCLIP ViT-G/14), which enhances 0.9’s processing power and ability to create realistic imagery with greater depth and a higher resolution of 1024×1024.

A research blog going into greater detail about the specifications and testing of this model will be released by the SDXL team shortly.

***Prompt:*** *beautiful scenery nature glass bottle landscape, purple galaxy bottle (SDXL 0.9 – 1024×1024).*

System requirements

Despite its powerful output and advanced model architecture, SDXL 0.9 can be run on a modern consumer GPU. It needs only a Windows 10, Windows 11, or Linux operating system, 16GB of RAM, and an Nvidia GeForce RTX 20 graphics card (or an equivalent or higher standard) equipped with a minimum of 8GB of VRAM. Linux users can also use a compatible AMD card with 16GB of VRAM.
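As a rough illustration, the minimums listed above could be expressed as a simple check. The `Machine` class, its field names, and `meets_minimum` are hypothetical, and the sketch deliberately does not try to verify the GPU generation (RTX 20 or equivalent), only the operating system, RAM, vendor, and VRAM figures stated here.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    os: str          # assumed labels: "windows10", "windows11", or "linux"
    ram_gb: int
    gpu_vendor: str  # assumed labels: "nvidia" or "amd"
    vram_gb: int


def meets_minimum(m: Machine) -> bool:
    """Check a machine against the stated SDXL 0.9 minimums."""
    if m.os not in {"windows10", "windows11", "linux"}:
        return False
    if m.ram_gb < 16:
        return False
    if m.gpu_vendor == "nvidia":
        return m.vram_gb >= 8
    if m.gpu_vendor == "amd":
        # AMD cards are listed for Linux only, with 16GB of VRAM.
        return m.os == "linux" and m.vram_gb >= 16
    return False


print(meets_minimum(Machine("windows11", 16, "nvidia", 8)))  # True
print(meets_minimum(Machine("windows11", 32, "amd", 16)))    # False
```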

Beta launch statistics

Since SDXL’s beta launch on April 13, we’ve had great responses from our Discord community of users, numbering nearly 7,000. These users have generated over 700,000 images, averaging more than 20,000 per day. More than 54,000 images have been entered into Discord community ‘Showdowns’ with 3,521 SDXL images nominated as winners.

***Prompt:*** *magical realism; manicured fingers holding a piece of white heart-shaped sea glass up against the setting sun realistic film photography. (SDXL beta – 480×480)*

Availability

SDXL 0.9 is now available on the Clipdrop by Stability AI platform. Stability AI API and DreamStudio customers will be able to access the model this Monday, 26th June, as will users of other leading image-generation tools such as NightCafe.

SDXL 0.9 will be provided for research purposes only during a limited period to collect feedback and fully refine the model before its general open release. The code to run it will be publicly available on GitHub.

Researchers who would like to access these models can apply using the following links: SDXL-0.9-Base model and SDXL-0.9-Refiner. Please log in to your HuggingFace account with your academic email to request access. Kindly remember that SDXL 0.9 is currently intended exclusively for research purposes.