Categories: Image

Using depth maps and weight noising to get better character LoRAs

A few weeks ago I introduced a new method for training style LoRAs, which has been quite successful. A bunch of folks asked whether it would also help with character training. The short answer is yes, but it needed a separate technique on top of the depth-map approach. I’ve got something dialed in well enough to share, though it’s still experimental and I’d like feedback to help find the optimal settings.

The new mechanism is weight noising: a small Gaussian perturbation injected directly into the LoRA weights at each training step. A simple way to think of it is that it helps the model “forget” mistakes during training and keep only what is consistent across the data. More technically, it biases training toward flatter loss minima and spreads learning across more singular directions of the LoRA factorization (I measured about 20% higher stable rank than the same config without it). The practical effect is that it resists the memorization that usually overcooks character runs, and likeness comes out substantially better at the same step count.
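A minimal sketch of the two ideas in the paragraph above, in plain Python so it runs without a GPU stack. `add_weight_noise` adds independent N(0, sigma^2) noise to every entry of a weight matrix in place, intended to be called once per training step; keeping the noise in the weights (rather than removing it after the forward pass) is an assumption, and all names here are hypothetical. `stable_rank` computes ||W||_F^2 / ||W||_2^2, estimating the largest singular value by power iteration. For a real adapter you would measure the full weight delta (B·A for LoRA, a Kronecker product for LoKr) rather than a single factor.

```python
import math
import random

Matrix = list[list[float]]


def add_weight_noise(weights: Matrix, sigma: float, rng: random.Random) -> None:
    """Add i.i.d. Gaussian noise N(0, sigma^2) to every entry of `weights`, in place."""
    for row in weights:
        for j in range(len(row)):
            row[j] += rng.gauss(0.0, sigma)


def stable_rank(w: Matrix, iters: int = 100, seed: int = 0) -> float:
    """Return ||W||_F^2 / ||W||_2^2, estimating ||W||_2 by power iteration on W^T W."""
    rows, cols = len(w), len(w[0])
    frob_sq = sum(x * x for row in w for x in row)
    if frob_sq == 0.0:
        return 0.0
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]  # random start vector
    for _ in range(iters):
        wv = [sum(a * b for a, b in zip(row, v)) for row in w]  # W v
        wtwv = [sum(w[i][j] * wv[i] for i in range(rows)) for j in range(cols)]  # W^T W v
        norm = math.sqrt(sum(x * x for x in wtwv))
        v = [x / norm for x in wtwv]
    wv = [sum(a * b for a, b in zip(row, v)) for row in w]
    top_sv_sq = sum(x * x for x in wv)  # ||W v||^2 with unit v approximates sigma_max^2
    return frob_sq / top_sv_sq


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical 16x16 weight matrix, noised for 10 "steps" at the post's sigma.
    w = [[0.0] * 16 for _ in range(16)]
    for _ in range(10):
        add_weight_noise(w, 0.00125, rng)
    print(f"stable rank: {stable_rank(w):.2f}")
```

A higher stable rank means the update's energy is spread over more singular directions instead of being concentrated in one. In PyTorch the noise step would typically be `p.add_(torch.randn_like(p) * sigma)` for each adapter parameter inside `torch.no_grad()`.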

The post image shows an example of training on actress Clare Bowen, who has uniquely recognizable features but is not known to Flux. Both runs use the same training set of 8 images, the same step count (750), and the same model. The standard run is in the middle and the new method is on the right.

The settings are identical for both runs, except that the new run adds weight noise and depth anchoring and uses a different number of repeats for each bucket size:

  • Batch 4, LR 5e-5
  • Image size buckets of 512, 768, 1024
  • LoKr factor 8
  • AdamW8bit, 1200 steps total (but best checkpoint at 750)
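The listed settings can be captured in a small record to make the step arithmetic explicit: at batch size 4, the 750-step checkpoint has processed 3,000 image samples, or 375 per source image summed over all buckets. The class and field names are hypothetical and are not the trainer's config schema; the per-image figure assumes every batch is full.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterRunSettings:
    """Hypothetical record of the settings listed above (not the trainer's config schema)."""

    batch_size: int = 4
    learning_rate: float = 5e-5
    buckets: tuple[int, ...] = (512, 768, 1024)
    lokr_factor: int = 8
    optimizer: str = "AdamW8bit"
    total_steps: int = 1200
    best_checkpoint_step: int = 750
    num_images: int = 8

    def samples_seen(self, step: int) -> int:
        """Image samples processed by `step`, assuming every batch is full."""
        return step * self.batch_size

    def samples_per_image(self, step: int) -> float:
        """Average number of times each source image has been seen, summed over buckets."""
        return self.samples_seen(step) / self.num_images


if __name__ == "__main__":
    s = CharacterRunSettings()
    print(s.samples_seen(s.best_checkpoint_step))       # 3000
    print(s.samples_per_image(s.best_checkpoint_step))  # 375.0
```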

Varying the number of repeats per bucket is actually a good training trick on its own, and I updated my trainer to make this easier: you can now specify how many times each image is repeated in each bucket.
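A minimal sketch of how per-bucket repeats change what the model sees, assuming each image is resized into every bucket, a batch never mixes resolutions, and ragged tails are dropped. None of this is taken from the repo, and the repeat counts in the usage example are hypothetical; the post does not say which values the new run used.

```python
import random
from collections import Counter

Sample = tuple[str, int]  # (image id, bucket resolution)


def build_epoch(
    images: list[str], repeats_per_bucket: dict[int, int], batch_size: int, rng: random.Random
) -> list[list[Sample]]:
    """One epoch of shuffled batches; every batch holds a single bucket resolution."""
    batches: list[list[Sample]] = []
    for bucket, repeats in repeats_per_bucket.items():
        samples = [(img, bucket) for img in images for _ in range(repeats)]
        rng.shuffle(samples)
        # Drop the ragged tail so every batch is full (an assumption, not the repo's rule).
        for i in range(0, len(samples) - batch_size + 1, batch_size):
            batches.append(samples[i : i + batch_size])
    rng.shuffle(batches)  # interleave buckets across the epoch
    return batches


def bucket_exposures(
    images: list[str], repeats_per_bucket: dict[int, int], batch_size: int, steps: int, seed: int = 0
) -> Counter:
    """Count how many samples each bucket contributes over `steps` optimizer steps."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    step = 0
    while step < steps:
        epoch = build_epoch(images, repeats_per_bucket, batch_size, rng)
        if not epoch:
            raise ValueError("not enough samples to form a single full batch")
        for batch in epoch:
            if step == steps:
                break
            counts.update(bucket for _, bucket in batch)
            step += 1
    return counts


if __name__ == "__main__":
    images = [f"img_{i}" for i in range(8)]
    # Hypothetical repeat counts that weight the high-resolution bucket more heavily.
    print(bucket_exposures(images, {512: 1, 768: 2, 1024: 3}, batch_size=4, steps=750))
```

With these hypothetical counts the 1024 bucket supplies roughly half of all samples, so the same 750 steps spend more of the budget on fine facial detail than equal repeats would.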

Things I’m still working out and would love feedback on:

  1. Optimal sigma across dataset sizes: 0.00125 has given the best results so far, and I’m fairly sure the right value scales with dataset size and batch size, but I haven’t fully mapped it.
  2. Whether weight noising compounds well with other character LoRA tricks people are using.
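One way to collect comparable data points for the open sigma question is a small grid over dataset size, batch size, and multiples of the current best value. This is a hypothetical sweep helper, not something the repo provides, and it deliberately assumes no scaling rule; it only enumerates the runs to compare.

```python
from itertools import product

BEST_SIGMA = 0.00125  # best value reported in the post so far


def sigma_sweep(
    dataset_sizes: list[int],
    batch_sizes: list[int],
    multipliers: tuple[float, ...] = (0.25, 0.5, 1.0, 2.0, 4.0),
) -> list[dict]:
    """Enumerate (dataset size, batch size, sigma) runs, with sigma spaced by factors of 2."""
    return [
        {"num_images": n, "batch_size": b, "sigma": BEST_SIGMA * m}
        for n, b, m in product(dataset_sizes, batch_sizes, multipliers)
    ]


if __name__ == "__main__":
    for run in sigma_sweep([8, 20, 50], [1, 4]):
        print(run)
```

Spacing sigma by powers of two keeps the grid small while still spanning a 16x range around the current best value.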

I’ve also added Docker support so you can more easily run this on Runpod.

Repo: https://github.com/BuffaloBuffaloBuffaloBuffalo/ai-toolkit-perceptual

Finally, the new-job page now has a “Quickstart Template” dropdown at the top that loads the best character config end-to-end. It defaults to the Flux 2 Klein 9B checkpoint on Hugging Face, but you can also use your own checkpoint. There’s still plenty of UI cleanup to do on my end, so pardon the mess!

Happy to answer questions and help troubleshoot here or in DMs.

EDIT: One important thing to know about captioning. You will likely get the best results with the built-in subject masking feature, which masks out the background. If you use it, your captions should describe ONLY the character, NOT the setting. You can also use just a trigger phrase with subject masking, but the resulting LoRA will be less promptable. I have added quickstart configs for both masked and unmasked training.
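A minimal sketch of subject masking, assuming the mask is applied to the training loss (background pixels get zero weight) rather than to the input image; the post does not say which. Under that assumption, background content contributes nothing to the gradient, which is one plausible reason captions should describe only the character. In a diffusion trainer the same weighting would typically apply to the noise-prediction loss in latent space, with the mask resized to the latent resolution. The function name is hypothetical.

```python
def masked_mse(pred: list[float], target: list[float], mask: list[float]) -> float:
    """Mean squared error over subject pixels only (mask 1 = subject, 0 = background)."""
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have the same length")
    weighted_sq_err = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    mask_total = sum(mask)
    if mask_total == 0:
        raise ValueError("mask selects no pixels")
    return weighted_sq_err / mask_total


if __name__ == "__main__":
    # Flattened 2x2 "image": the last two pixels are background.
    pred = [1.0, 2.0, 3.0, 4.0]
    target = [1.0, 0.0, 3.0, 0.0]
    mask = [1.0, 1.0, 0.0, 0.0]
    print(masked_mse(pred, target, mask))  # 2.0
```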

submitted by /u/QuantumBogoSort

AI Generated Robotic Content

Published by
AI Generated Robotic Content
Tags: ai images
