Using depth maps and weight noising to get better character LoRAs

A few weeks ago I introduced a new method for training style LoRAs which has been quite successful. A bunch of folks asked if this would also help with character training. The short answer is yes, but it needed a separate technique on top of the depth stuff. I’ve got something dialed in well enough to share, though it’s still experimental and I want feedback to help find the optimal settings.

The new mechanism is weight noising: a small Gaussian perturbation injected directly into the LoRA weights at each training step. A simple way to think of it is that it helps the model “forget” mistakes during training and keep only what is consistent across the data. More technically, it biases training toward flatter loss minima and spreads learning across more singular directions of the LoRA factorization (I measured a 20% higher stable rank than the same config without weight noise). The practical effect is that it resists the memorization that usually overcooks character runs, and likeness comes out substantially better at the same step count.
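For anyone who wants to see the idea in code, here is a minimal toy sketch of weight noising in plain Python. It is not the trainer's implementation: the quadratic loss, the learning rate, the function names, and the choice to add the noise right after the optimizer update are all assumptions made for illustration. Only the sigma value (0.00125) and the step count (750) come from this post.

```python
import random


def add_weight_noise(weights, sigma, rng):
    """Return a copy of `weights` with i.i.d. N(0, sigma^2) noise added to each entry."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def train(sigma, steps=750, lr=0.05, seed=0):
    """Toy loop: minimise sum(w_i^2) with plain SGD, noising the weights after each update.

    `lr` is a toy value chosen for this quadratic, not the 5e-5 used for the real runs.
    """
    rng = random.Random(seed)
    weights = [1.0, -2.0, 0.5]  # stand-in for a flattened LoRA weight tensor
    for _ in range(steps):
        grads = [2.0 * w for w in weights]  # gradient of sum(w^2)
        weights = [w - lr * g for w, g in zip(weights, grads)]  # optimizer step
        if sigma > 0.0:
            # Assumed placement: perturb the weights once per step, after the update.
            weights = add_weight_noise(weights, sigma, rng)
    return weights


plain = train(sigma=0.0)       # settles essentially exactly on the minimum
noised = train(sigma=0.00125)  # keeps jittering in a small band around it
```

In the toy run the noised weights never fully settle, which is the intuition behind "forgetting": anything the gradient does not keep reinforcing gets washed out. In a real PyTorch trainer the same step would presumably be an in-place add of `torch.randn_like(p) * sigma` to each LoRA parameter under `torch.no_grad()`, but that is my guess at the shape of it, not a quote from the repo.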

The post image shows an example trained on actress Clare Bowen, who has very recognizable features but is not known to Flux. Both runs use the same 8-image training set, the same training step count (750), and the same base model. The standard run is in the middle, the new method is on the right.

The settings are identical for both runs, except that the new-method run adds weight noise and depth anchoring and uses a different number of repeats for each bucket size:

  • Batch 4, LR 5e-5
  • Image size buckets of 512, 768, 1024
  • LoKr factor 8
  • AdamW8bit, 1200 steps total (but best checkpoint at 750)
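To put those numbers in perspective, here is a quick back-of-the-envelope calculation of how often each image gets seen. It assumes no gradient accumulation and that images are sampled evenly, neither of which is stated above; the function name is just for illustration.

```python
def exposures_per_image(steps, batch_size, num_images):
    """Average number of times each training image is sampled over a run."""
    return steps * batch_size / num_images


# Values from this post: batch 4, 8 training images.
at_best_checkpoint = exposures_per_image(steps=750, batch_size=4, num_images=8)
at_end_of_run = exposures_per_image(steps=1200, batch_size=4, num_images=8)
```

Under those assumptions each image has been seen 375 times by step 750 and 600 times by step 1200, which is why memorization is the main thing to fight on a dataset this small.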

Varying the number of repeats per bucket is actually a good training trick on its own, and I updated my trainer to make it easier: you can now specify how many repeats of each image go into each bucket.
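As a rough illustration of what per-bucket repeats mean for the data the model sees, here is a small sketch that expands a dataset into one epoch of (image, bucket) samples. The function name, the file names, and the repeat counts are hypothetical and are not the trainer's actual config keys or my tuned values; only the bucket sizes and the 8-image dataset size come from this post.

```python
from collections import Counter


def build_epoch(image_ids, repeats_per_bucket):
    """Expand a dataset into (image_id, bucket_size) samples for one epoch."""
    samples = []
    for bucket, repeats in sorted(repeats_per_bucket.items()):
        for image_id in image_ids:
            # Each image appears `repeats` times at this bucket resolution.
            samples.extend([(image_id, bucket)] * repeats)
    return samples


images = [f"img_{i}.png" for i in range(8)]   # 8 images, as in the example run
repeats = {512: 3, 768: 2, 1024: 1}           # hypothetical repeat counts
epoch = build_epoch(images, repeats)
per_bucket = Counter(bucket for _, bucket in epoch)
```

With these made-up counts an epoch holds 48 samples: 24 at 512, 16 at 768, and 8 at 1024, so the low-resolution bucket gets three times the weight of the high-resolution one. A real loader would also shuffle and batch within each bucket.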

Things I’m still working out and would love feedback on:

  1. Optimal sigma across dataset sizes: 0.00125 has given the best results so far, and I’m pretty sure the right value scales with dataset size and batch size, but I haven’t fully mapped it.
  2. Whether weight noising compounds well with other character LoRA tricks people are using.

I’ve also added Docker support so you can more easily run this on Runpod.

Repo: https://github.com/BuffaloBuffaloBuffaloBuffalo/ai-toolkit-perceptual

Finally, the new-job page now has a “Quickstart Template” dropdown at the top that loads the best character config end-to-end. It defaults to the HuggingFace Flux 2 Klein 9B checkpoint but you can also use your own checkpoint. Still plenty of UI cleanup to do on my end, so pardon the mess!

Happy to answer questions and help troubleshoot here or in DMs.

EDIT: One important thing to know about captioning. You will likely get the best results if you use the built-in subject masking feature, which masks out the background. If you use this, it is important that your captions ONLY describe the character, NOT the setting. You may also use just a trigger phrase with subject masking, but your results will be less promptable. I have added quickstart configs for both masked and unmasked.

submitted by /u/QuantumBogoSort