I was asked to make a top-level post of my comment in a recent thread about samplers, so here it goes. I had been meaning to write up an up-to-date explanation of the sampler names because you really have to dig to learn all of this, as I’ve found out. Any corrections or clarifications welcome!
It is easy. You just chip away the noise that doesn’t look like a waifu.
– Attributed to Michelangelo, but almost certainly apocryphal, paraphrased
Perfection is achieved, not when there is no more noise to add, but when there is no noise left to take away.
– Antoine de Saint-Exupéry, paraphrased
So first a very short note on how the UNet part of SD works (let’s ignore CLIP and VAEs and embeddings and all that for now). It is a large artificial neural network trained by showing it images with successively more and more noise applied, until it got good at predicting the noise that had been added to any given image. Not very useful in itself, but critically you can also run the process “in reverse”: give it an image of pure noise and it will “denoise” it, step by step, and give you back an image that would, if noise were again applied to it, yield a similar noise pattern.
All the samplers are different algorithms for numerically approximating solutions to differential equations (DEs). In SD’s case this is a high-dimensional differential equation that determines how the initial noise must be diffused (spread around the image) to produce a result image that minimizes a loss function (essentially the distance to a hypothetical “perfect” match to the initial noise, but with additional “push” applied by the prompt). This incredibly complex differential equation is basically what’s encoded in the billion+ floating-point numbers that make up a Stable Diffusion model.
A sampler essentially works by taking the given number of steps, and on each step, well, sampling the latent space to compute the local gradient (“slope”), to figure out which direction the next step should be taken in. Like a ball rolling down a hill, the sampler tries to get as “low” as possible in terms of minimizing the loss function. But what locally looks like the fastest route may not actually net you an optimal solution – you may get stuck in a local optimum (a “valley”) and sometimes you have to first go up to find a better route down! (Also, rather than a simple 2D terrain, you have a space of literally thousands of dimensions to work with, so the problem is “slightly” more difficult!)
Euler
The OG method for solving DEs, discovered by Leonhard Euler in the 1700s. Very simple and fast to compute, but accrues error quickly unless a large number of steps (= small step size) is used. Nevertheless, and sort of surprisingly, it works well with SD, where the objective is not to approximate an actual existing solution but to find something that’s locally optimal.
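To make the “accrues error quickly unless many steps are used” part concrete, here is a minimal sketch of Euler’s method on a toy one-dimensional equation, dy/dt = -y, whose exact solution is known. The function names and the toy equation are my own illustration and have nothing to do with SD’s actual code.

```python
import math

def euler_solve(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 using n_steps Euler steps."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + h * f(t, y)  # follow the local slope for one step of size h
        t = t + h
    return y

def decay(t, y):
    # Toy ODE (an assumption for illustration): dy/dt = -y, so y(t) = y0 * exp(-t)
    return -y

exact = math.exp(-1.0)
err_10 = abs(euler_solve(decay, 1.0, 0.0, 1.0, 10) - exact)
err_100 = abs(euler_solve(decay, 1.0, 0.0, 1.0, 100) - exact)
print(f"error with 10 steps:  {err_10:.5f}")
print(f"error with 100 steps: {err_100:.5f}")
```

Ten times more steps gives roughly ten times less error, which is what “first-order” means in practice.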
Heun
An improvement over Euler’s method, named after Karl Heun, that uses a correction step to reduce error and is thus an example of a predictor–corrector algorithm. Roughly twice as slow as Euler, since it evaluates the model twice per step; not really worth using IME.
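A minimal sketch of the predictor–corrector idea, again on the toy equation dy/dt = -y (my own example, not SD code). To keep the comparison fair, Euler gets twice as many steps as Heun, so both make the same number of slope evaluations.

```python
import math

def euler_step(f, t, y, h):
    return y + h * f(t, y)

def heun_step(f, t, y, h):
    k1 = f(t, y)                    # slope at the start of the step
    y_pred = y + h * k1             # predictor: a plain Euler guess
    k2 = f(t + h, y_pred)           # slope at the predicted end point
    return y + h * 0.5 * (k1 + k2)  # corrector: step along the average slope

def solve(step, f, y0, t0, t1, n_steps):
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y

def decay(t, y):
    # Toy ODE (an assumption for illustration): exact solution is exp(-t)
    return -y

exact = math.exp(-1.0)
heun_err = abs(solve(heun_step, decay, 1.0, 0.0, 1.0, 10) - exact)   # 20 evaluations
euler_err = abs(solve(euler_step, decay, 1.0, 0.0, 1.0, 20) - exact)  # 20 evaluations
print(f"Heun, 10 steps:  {heun_err:.5f}")
print(f"Euler, 20 steps: {euler_err:.5f}")
```

On this smooth toy problem Heun wins clearly at equal cost. Whether that carries over to image quality in SD is a separate question.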
LMS
A Linear Multi-Step method. An improvement over Euler’s method that uses several prior steps, not just one, to predict the next sample.
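As an illustration of “uses several prior steps”, here is the textbook two-step Adams–Bashforth method, probably the simplest linear multi-step scheme. This is a sketch under my own assumptions (toy equation, fixed step size); the LMS sampler in SD front ends may use a higher order and handles uneven step sizes.

```python
import math

def adams_bashforth2(f, y0, t0, t1, n_steps):
    """Two-step linear multi-step method with a fixed step size."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    f_prev = f(t, y)
    y = y + h * f_prev            # bootstrap: the first step is plain Euler
    t += h
    for _ in range(n_steps - 1):
        f_curr = f(t, y)          # only one new slope evaluation per step
        y = y + h * (1.5 * f_curr - 0.5 * f_prev)  # reuse the previous slope
        f_prev = f_curr
        t += h
    return y

def euler(f, y0, t0, t1, n_steps):
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + h * f(t, y)
        t += h
    return y

def decay(t, y):
    # Toy ODE (an assumption for illustration): exact solution is exp(-t)
    return -y

exact = math.exp(-1.0)
ab2_err = abs(adams_bashforth2(decay, 1.0, 0.0, 1.0, 10) - exact)
euler_err = abs(euler(decay, 1.0, 0.0, 1.0, 10) - exact)
print(f"Adams-Bashforth 2, 10 steps: {ab2_err:.5f}")
print(f"Euler, 10 steps:             {euler_err:.5f}")
```

The point is that the extra accuracy comes from remembering the previous slope, not from evaluating the function more often.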
PLMS
Apparently a “Pseudo-Numerical methods for Diffusion Models” (PNDM) version of LMS.
DDIM
Denoising Diffusion Implicit Models. One of the “original” samplers that came with Stable Diffusion. Requires a large number of steps compared to more recent samplers.
DPM
Diffusion Probabilistic Model solver. An algorithm specifically designed for solving diffusion differential equations, published in Jun 2022 by Cheng Lu et al.
DPM++
An improved version of DPM, by the same authors, that improves results at high guidance (CFG) values if I understand correctly.
DPM++ 2M and 2S
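Second-order variants of DPM++ (the “2” is the order of the solver, which loosely corresponds to how many derivative terms it accounts for). More accurate per step than a first-order method. S means single-step, M means multi-step: 2S evaluates the model twice per step and is therefore slower, while 2M reuses the result of the previous step and costs about the same per step as Euler. DPM++ 2M (Karras) is probably one of the best samplers at the moment when it comes to speed and quality.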
DPM++ 3M
A third-order variant of DPM++. Multi-step, so it reuses results from earlier steps rather than evaluating the model more often per step. Presumably even more accurate.
UniPC
Unified Predictor–Corrector Framework by Wenliang Zhao et al. Quick to converge, seems to yield good results. Apparently the “corrector” (UniC) part could be used with any other sampler type as well. Not sure if anyone has tried to implement that yet.
Restart
A novel sampler algorithm by Yilun Xu et al. Apparently works by making several “restarts” by periodically adding noise between the normal noise reduction steps. Claimed by the authors to combine the advantages of both deterministic and stochastic samplers, namely speed and not getting stuck at local optima, respectively.
Any sampler with “Karras” in the name
A variant that uses a different noise schedule, proposed by Tero Karras et al. A noise schedule is essentially a curve that determines how large each diffusion step is, i.e. how exactly to divide the continuous “time” variable into discrete steps. In general it works well to take large steps at first and small steps at the end. The Karras schedule is a slight modification to the standard schedule that empirically seems to work better.
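For the curious, the schedule from the Karras et al. paper interpolates linearly between the largest and smallest noise levels after raising them to the power 1/rho, then raises the result back to the power rho (the paper suggests rho = 7). Below is a minimal sketch; the function name and the sigma_min/sigma_max defaults are illustrative assumptions, roughly in the range used with SD 1.x models, not values taken from any particular implementation.

```python
def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Return n noise levels going from sigma_max down to sigma_min (n >= 2)."""
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    return [
        (max_inv_rho + i / (n - 1) * (min_inv_rho - max_inv_rho)) ** rho
        for i in range(n)
    ]

sigmas = karras_sigmas(10)
step_sizes = [a - b for a, b in zip(sigmas, sigmas[1:])]

for sigma, step in zip(sigmas, step_sizes):
    print(f"sigma = {sigma:8.4f}   next step size = {step:8.4f}")
```

Printing the step sizes shows the “large steps first, small steps at the end” pattern directly: each step is smaller than the one before it.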
Any sampler with “Exponential” in the name
Presumably uses a schedule based on the paper Fast Sampling of Diffusion Models with Exponential Integrator by Zhang and Chen.
Any sampler with “a” in the name
An “ancestral” variant of the solver. My understanding here is really weak, but apparently these use probability distributions and “chains” of conditional probabilities, where, for example, given P(a), P(b|a), and P(c|b), then a and b are “ancestors” of c. These are inherently stochastic (i.e. random) and don’t converge to a single solution as the number of steps grows. The results are also usually quite different from the non-ancestral counterpart, often regarded as more “creative”.
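In practice, as far as I can tell, an ancestral step denoises slightly further than the target noise level and then adds fresh random noise to get back up to it. Here is a toy one-dimensional sketch of that idea. Everything in it is an assumption for illustration: the “images” are single numbers drawn from a unit Gaussian (so the ideal denoiser has a closed form), the noise levels are made up, and the split formula follows my reading of how Euler a is commonly implemented in open-source code.

```python
import math
import random

def ancestral_split(sigma_from, sigma_to, eta=1.0):
    """Split a step into a deterministic target (sigma_down) and fresh noise (sigma_up)."""
    if sigma_to == 0.0:
        return 0.0, 0.0
    sigma_up = min(sigma_to, eta * math.sqrt(
        sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2))
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up

def toy_denoiser(x, sigma, data_std=1.0):
    # Ideal denoiser when the data is 1-D Gaussian with std data_std (toy assumption)
    return x * data_std**2 / (data_std**2 + sigma**2)

def sample(x, sigmas, rng=None):
    """Plain Euler if rng is None, Euler with ancestral noise re-injection otherwise."""
    for s_from, s_to in zip(sigmas, sigmas[1:]):
        d = (x - toy_denoiser(x, s_from)) / s_from  # local slope dx/dsigma
        if rng is None:
            x = x + d * (s_to - s_from)
        else:
            s_down, s_up = ancestral_split(s_from, s_to)
            x = x + d * (s_down - s_from)       # denoise a bit too far...
            x = x + rng.gauss(0.0, 1.0) * s_up  # ...then add new noise back
    return x

sigmas = [10.0, 5.0, 2.0, 1.0, 0.5, 0.1, 0.0]  # made-up noise levels
start = 3.0                                    # the same initial "noise" every time
print("Euler:  ", sample(start, sigmas), sample(start, sigmas))
print("Euler a:", sample(start, sigmas, random.Random(1)),
      sample(start, sigmas, random.Random(2)))
```

The plain version gives the same result every time from the same starting noise, while the ancestral version depends on the extra random numbers drawn along the way, which is why it never settles on a single image as the step count grows.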
Any sampler with SDE in the name
A variant that uses a Stochastic Differential Equation, a DE where at least one term is a stochastic process. In short, introduces some random “drift” to the process on each step to possibly find a route to a better solution than a fully deterministic solver. Like the ancestral samplers, doesn’t necessarily converge on a single solution as the number of steps grows.
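To show what “a DE where at least one term is a stochastic process” looks like numerically, here is the Euler–Maruyama method, the stochastic cousin of Euler’s method, on a generic toy SDE. This is a textbook illustration under my own assumptions (the equation, the names and the noise strength are made up); it is not the algorithm used by DPM++ SDE or any other SD sampler.

```python
import math
import random

def euler_maruyama(x0, drift, diffusion, t1, n_steps, rng):
    """Integrate dx = drift(x) dt + diffusion(x) dW from t = 0 to t = t1."""
    h = t1 / n_steps
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))  # random "kick", variance equal to h
        x = x + drift(x) * h + diffusion(x) * dw
    return x

def pull_to_zero(x):
    return -x      # deterministic part: decay toward zero

def constant_noise(x):
    return 0.5     # stochastic part: constant noise strength (toy value)

def no_noise(x):
    return 0.0

run_a = euler_maruyama(1.0, pull_to_zero, constant_noise, 1.0, 100, random.Random(1))
run_b = euler_maruyama(1.0, pull_to_zero, constant_noise, 1.0, 100, random.Random(2))
plain = euler_maruyama(1.0, pull_to_zero, no_noise, 1.0, 100, random.Random(1))
print("two stochastic runs:", run_a, run_b)
print("noise switched off: ", plain)
```

With the noise term switched off this is exactly Euler’s method; with it on, every run lands somewhere different no matter how many steps you take.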
Stable Diffusion Samplers: A Comprehensive Guide (stable-diffusion-art.com)
Choosing a sampler for Stable Diffusion (mccormickml.com)
What are all the different samplers (github.com)