We investigate the theoretical foundations of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects of diffusion, it remains on shaky theoretical footing. In this paper, we disprove common misconceptions, by showing that CFG interacts differently with DDPM (Ho et al., 2020) and DDIM (Song et al., 2021), and neither sampler with CFG generates the gamma-powered distribution p(x|c)^γp(x)^{1−γ}. Then, we clarify the behavior of CFG by showing that it is a kind of predictor-corrector method (Song et al., 2020)…
This paper was accepted at the Mathematics of Modern Machine Learning (M3L) Workshop at NeurIPS 2024. We investigate the unreasonable effectiveness of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects of diffusion, it remains on shaky theoretical footing. In…
I was asked to make a top-level post of my comment in a recent thread about samplers, so here it goes. I had been meaning to write up an up-to-date explanation of the sampler names because you really have to dig to learn all of this, as I've found out.…
Here's a comparison of Nano Banana and various versions of QWEN Image Edit 2509. You may be asking why Nano Banana is missing in some of these comparisons. Well, the answer is BLOCKED CONTENT, BLOCKED CONTENT, and BLOCKED CONTENT. I still feel this is a valid comparison as it really…