This paper was accepted at the Mathematics of Modern Machine Learning (M3L) Workshop at NeurIPS 2024. We investigate the unreasonable effectiveness of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects of diffusion, it remains on shaky theoretical footing. In this paper, we disprove common misconceptions by showing that CFG interacts differently with DDPM and DDIM and that neither sampler with CFG generates the gamma-powered distribution. Then, we clarify the behavior of CFG by showing that it is a kind…
We investigate the theoretical foundations of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects of diffusion, it remains on shaky theoretical footing. In this paper, we disprove common misconceptions by showing that CFG interacts differently with DDPM (Ho et…
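As a minimal sketch (not the paper's code), the usual CFG rule combines an unconditional prediction and a conditional prediction with a guidance scale w: guided = uncond + w * (cond - uncond). Applied to scores at a single noise level, this gives exactly the score of the gamma-powered density p(x|c)^γ p(x)^(1−γ) with γ = w, which is presumably why that distribution is so often assumed to be what CFG samples; per the abstract, neither DDPM nor DDIM with CFG actually generates it. The toy check below assumes 1-D Gaussians with equal variance, and the names `cfg_combine` and `gaussian_score` are hypothetical, chosen only for illustration.

```python
import numpy as np

def cfg_combine(pred_uncond, pred_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by `guidance_scale` (w)."""
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

def gaussian_score(x, mean, var):
    """Score (gradient of the log-density) of a 1-D Gaussian N(mean, var)."""
    return -(x - mean) / var

# Hypothetical toy setup: p(x|c) = N(mu_c, var) and p(x) = N(mu_u, var).
# With equal variances, p(x|c)^gamma * p(x)^(1 - gamma) is again Gaussian,
# with mean gamma * mu_c + (1 - gamma) * mu_u and the same variance.
mu_c, mu_u, var, gamma = 2.0, -1.0, 0.5, 3.0
x = np.linspace(-3.0, 3.0, 7)

# CFG applied to the two scores at this one "noise level".
guided = cfg_combine(gaussian_score(x, mu_u, var), gaussian_score(x, mu_c, var), gamma)
# Score of the gamma-powered (tilted) density.
tilted = gaussian_score(x, gamma * mu_c + (1 - gamma) * mu_u, var)

print(np.allclose(guided, tilted))  # → True
```

The identity holds only per noise level; it says nothing about whether a sampler that chains these guided steps ends at the gamma-powered distribution, which is the point the abstract disputes. Note also that a guidance scale of 1 returns the conditional prediction unchanged, so at CFG 1 the unconditional model evaluation is unnecessary.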
I put together some WAN 2.2 merges with the following goals:
- WAN 2.2 features (including the "high" and "low" models)
- 1 model
- Simplicity, by including the VAE and CLIP
- Accelerators to allow 4-step sampling at CFG 1
- WAN 2.1 LoRA compatibility

... and I think I got something working kinda nicely. Basically, the…