I wanted to share my process, tips and tricks, and encourage you to do the same so you can develop new ideas and share them with the community as well!
I’ve never been an artistic person, so this technology has been a delight, and unlocked a new ability to create engaging stories I never thought I’d be able to have the pleasure of producing and sharing.
Here’s a sampler gallery of consistent images of the same character: https://imgur.com/a/SpfFJAq
Note: I will not post the full story here as it is a steamy romance story and therefore not appropriate for this sub. I will keep this guide SFW only. Please do the same in your comments and questions, and respect the rules of this subreddit.
- Automatic1111 and baseline comfort with generating images in Stable Diffusion (beginner/advanced beginner)
- Photoshop. No previous experience required! I didn’t have any before starting so you’ll get my total beginner perspective here.
- That’s it! No other fancy tools.
This guide includes full workflows for creating a character, generating images, manipulating images, and getting a final result. It also includes a lot of tips and tricks! Nothing in the guide is particularly over-the-top in terms of effort – I focus on getting a lot of images generated over getting a few perfect images.
First, I’ll share tips for faces, clothing, and environments. Then, I’ll share my general tips, as well as the checkpoints I like to use.
How to generate consistent faces
Tip one: use a TI or LORA.
To create a consistent character, the two primary methods are creating a LORA or a Textual Inversion. I will not go into detail for this process, but instead focus on what you can do to get the most out of an existing Textual Inversion, which is the method I use. This will also be applicable to LORAs. For a guide on creating a Textual Inversion, I recommend BelieveDiffusion’s guide for a straightforward, step-by-step process for generating a new “person” from scratch. See it on Github.
Tip two: Don’t sweat the first generation – fix faces with inpainting.
Very frequently you will generate faces that look totally busted – particularly at “distant” zooms. For example: https://imgur.com/a/B4DRJNP – I like the composition and outfit of this image a lot, but that poor face 🙁
Here’s how you solve that – simply take the image, send it to inpainting, and critically, select “Inpaint Only Masked”. Then, use your TI and a moderately high denoise (~.6) to fix.
Here it is fixed! https://imgur.com/a/eA7fsOZ Looks great! Could use some touch up, but not bad for a two step process.
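If you'd rather script this fix than click through the UI, here is a rough sketch of the same settings as a request to the web UI's built-in API. It assumes Automatic1111 was started with the --api flag and is listening on the default local address. The field names are the ones I understand the /sdapi/v1/img2img endpoint to accept, so check them against the /docs page of your own install. The padding value and the function names are my own placeholders.

```python
import base64
import json
from urllib import request


def build_face_fix_payload(image_bytes, mask_bytes, prompt, denoise=0.6):
    """Build a request body for an "Inpaint Only Masked" face fix."""

    def encode(raw):
        # The API expects images as base64-encoded strings.
        return base64.b64encode(raw).decode("ascii")

    return {
        "init_images": [encode(image_bytes)],
        "mask": encode(mask_bytes),        # white pixels mark the face to repaint
        "prompt": prompt,                  # include your TI trigger word here
        "denoising_strength": denoise,     # moderately high, ~0.6
        "inpaint_full_res": True,          # "Only masked" inpaint area
        "inpaint_full_res_padding": 32,    # assumed value: context kept around the mask
        "inpainting_fill": 1,              # 1 = "original" masked content
    }


def send_to_webui(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload to a locally running web UI (assumes --api is enabled)."""
    req = request.Request(
        base_url + "/sdapi/v1/img2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # The response carries the generated images as base64 strings.
        return json.loads(resp.read())["images"]
```

Only send_to_webui touches the network, so you can build and inspect the payload without the web UI running.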
Tip three: Tune faces in photoshop.
Photoshop gives you a set of tools under “Neural Filters” that make small tweaks easier and faster than reloading into Stable Diffusion. These only work for very small adjustments, but I find they fit into my toolkit nicely. https://imgur.com/a/PIH8s8s
Tip four: add skin texture in photoshop.
A small trick here, but it is easy to do and can really sell some images, especially close-ups of faces. I highly recommend following this quick guide to add skin texture to images that feel too smooth and plastic.
How to generate consistent clothing
Clothing is much more difficult because it is a big investment to create a TI or LORA for a single outfit, unless you have a very specific reason. Therefore, this section will focus a lot more on various hacks I have uncovered to get good results.
Tip five: Use a standard “mood” set of terms in your prompt.
Preload every prompt you use with a “standard” set of terms that work for your target output. For photorealistic images, I like to use
highly detailed, photography, RAW, instagram, (imperfect skin, goosebumps:1.1)
This set tends to work well with the mood, style, and checkpoints I use. For clothing, this biases the generation space, pushing all of your generations a little closer together, which helps with consistency.
Tip six: use long, detailed descriptions.
If you provide a long list of prompt terms for the clothing you are going for, and are consistent with it, you’ll get MUCH more consistent results. I also recommend building this list slowly, one term at a time, to ensure that the model understands each term and actually incorporates it into your generations. For example, instead of using
green dress, use
dark green, (((fashionable))), ((formal dress)), low neckline, thin straps, ((summer dress)), ((satin)), (((Surplice))), sleeveless
Here’s a non-cherry picked look at what that generates. https://imgur.com/a/QpEuEci Already pretty consistent!
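To keep the mood terms and outfit description identical across every prompt, it can help to store them once and assemble prompts from the pieces. This is only a sketch of one way to do it: the helper names are made up, and "my_character" is a placeholder for your own TI trigger word. The second helper builds the outfit list one term at a time, so you can generate a batch per step and see whether the newest term actually shows up.

```python
MOOD = "highly detailed, photography, RAW, instagram, (imperfect skin, goosebumps:1.1)"

OUTFIT_TERMS = [
    "dark green",
    "(((fashionable)))",
    "((formal dress))",
    "low neckline",
    "thin straps",
    "((summer dress))",
    "((satin))",
    "(((Surplice)))",
    "sleeveless",
]


def build_prompt(subject, outfit_terms, mood=MOOD):
    """Join the standard mood terms, the subject, and the outfit terms."""
    parts = [mood, subject, *outfit_terms]
    return ", ".join(part for part in parts if part)


def incremental_prompts(subject, outfit_terms, mood=MOOD):
    """Yield one prompt per added outfit term, shortest first."""
    for count in range(1, len(outfit_terms) + 1):
        yield build_prompt(subject, outfit_terms[:count], mood)


if __name__ == "__main__":
    # "my_character" is a hypothetical Textual Inversion trigger word.
    for prompt in incremental_prompts("my_character", OUTFIT_TERMS):
        print(prompt)
```

If a term makes no visible difference when it is added, that is a hint the checkpoint does not understand it and it can be dropped or reworded.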
Tip seven: Bulk generate and get an idea what your checkpoint is biased towards.
If you are somewhat agnostic as to what outfit you want to generate, a good place to start is to generate hundreds of images in your chosen scenario and see what the model likes to generate. You’ll get a diverse set of clothes, but you might spot a repeating outfit that you like. Take note of that outfit, and craft your prompts to match it. Because the model is already biased naturally towards that direction, it will be easy to extract that look, especially after applying tip six.
Tip eight: Crappily photoshop the outfit to look more like your target, then inpaint/img2img to clean up your photoshop hatchet job.
I suck at photoshop, but Stable Diffusion is there to pick up the slack. Here’s a quick tutorial on changing colors and using the clone stamp, with the SD workflow afterwards.
Let’s turn https://imgur.com/a/GZ3DObg into a spaghetti strap dress to be more consistent with our target. All I’ll do is take 30 seconds with the clone stamp tool and clone skin over some, but not all, of the strap. Here’s the result. https://imgur.com/a/2tJ7Qqg Real hatchet job, right?
Well let’s have SD fix it for us, and not spend a minute more blending, comping, or learning how to use photoshop well.
Denoise is the key parameter here. We want to use the image we created as the baseline, then keep denoise moderate so it doesn’t eliminate the information we’ve provided. Again, .6 is a good starting point. https://imgur.com/a/z4reQ36 (note the inpainting). Also make sure you use “original” for masked content! Here’s the result! https://imgur.com/a/QsISUt2 First try. This took about 60 seconds total, work and generation, and you could do a couple more iterations to really polish it.
This is a very flexible technique! You can add more fabric, remove it, add details, pleats, etc. In the white dress images in my example, I got the relatively consistent flowers by simply crappily photoshopping them onto the dress, then following this process.
This is a pattern you can employ for other purposes: do a busted photoshop job, then leverage SD with “original” on inpaint to fill in the gap. Let’s change the color of the dress:
- Quickselect the dress, no need to even roto it out. https://imgur.com/a/im6SaPO
- Ctrl+J for a new layer
- Hue adjust https://imgur.com/a/FpI5SCP
- Right click the new layer, click “Create clipping mask”
- Go crazy with the sliders https://imgur.com/a/Q0QfTOc
- Let stable diffusion clean up our mess! Same rules as strap removal above. https://imgur.com/a/Z0DWepU
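For anyone curious why a sloppy hue adjustment still looks plausible, here is a small sketch of what a hue slider does conceptually. This is not what Photoshop runs internally. It is a stand-in using Python's standard colorsys module, with pixels as plain (r, g, b) tuples and a list of booleans playing the role of the quick selection. The hue rotates while saturation and brightness stay put, so folds and shading in the fabric survive the color change.

```python
import colorsys


def shift_hue(pixel, degrees):
    """Rotate the hue of one (r, g, b) pixel with 0-255 channels."""
    r, g, b = (channel / 255.0 for channel in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + degrees / 360.0) % 1.0  # hue wraps around the color wheel
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


def recolor(pixels, mask, degrees):
    """Shift hue only where the mask is True; other pixels are untouched."""
    return [shift_hue(p, degrees) if m else p for p, m in zip(pixels, mask)]


if __name__ == "__main__":
    # A bright and a shaded red pixel inside the selection, one pixel outside it.
    dress = [(255, 0, 0), (128, 0, 0), (200, 180, 160)]
    selection = [True, True, False]
    print(recolor(dress, selection, 120))
```

The bright and shaded reds become bright and shaded greens, which is why the rough selection edge is the only mess left for the inpainting pass to clean up.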
Use this to add sleeves, increase/decrease length, add fringes, pleats, or more. Get creative! And see tip seventeen: squint.
How to generate consistent environments
Tip nine: See tip five above.
Standard mood really helps!
Tip ten: See tip six above.
A detailed prompt really helps!
Tip eleven: See tip seven above.
The model will be biased in one direction or another. Exploit this!
By now you should realize a problem – this is a lot of stuff to cram in one prompt. Here’s the simple solution: generate a whole composition that blocks out your elements and gets them looking mostly right if you squint, then inpaint each thing – outfit, background, face.
Tip twelve: Make a set of background “plates”.
Create some scenes and backgrounds without characters in them, then inpaint in your characters in different poses and positions. You can even use img2img and very targeted inpainting to make slight changes to the background plate with very little effort on your part to give a good look.
Tip thirteen: People won’t mind the small inconsistencies.
Don’t sweat the little stuff! People will likely be focused on your subjects. If your lighting, mood, color palette, and overall photography style are consistent, it is very natural to ignore all the little things. For the sake of time, I allow myself the luxury of many small inconsistencies, and no readers have complained yet! I think they’d rather I focus on releasing more content. However, if you do really want to get things perfect, apply selective inpainting, photobashing, and color shifts followed by img2img in a similar manner as tip eight, and you can really dial in anything to be nearly perfect.
Must-know fundamentals and general tricks:
Tip fourteen: Understand the relationship between denoising and inpainting types.
My favorite baseline parameters for an underlying image that I am inpainting are .6 denoise, “Only masked” for the inpaint area, and “original” for masked content. I highly, highly recommend experimenting with these three settings and learning intuitively how changing them will create different outputs.
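One way to build that intuition is to run the same image and mask through every combination of the three settings and compare the results side by side. The sketch below only builds the list of combinations. The denoise values are arbitrary picks, and the keys are the field names I believe the web UI's img2img API uses for these settings, so treat them as assumptions to verify on your own install.

```python
from itertools import product

DENOISE_VALUES = [0.4, 0.6, 0.8]  # arbitrary spread around the .6 baseline
INPAINT_AREAS = {"Whole picture": False, "Only masked": True}
MASKED_CONTENT = {"fill": 0, "original": 1, "latent noise": 2, "latent nothing": 3}


def settings_grid():
    """Return one labelled override dict per combination of the three settings."""
    grid = []
    for denoise, (area, full_res), (fill_name, fill_id) in product(
        DENOISE_VALUES, INPAINT_AREAS.items(), MASKED_CONTENT.items()
    ):
        grid.append({
            "label": f"denoise={denoise} | {area} | {fill_name}",
            "overrides": {
                "denoising_strength": denoise,
                "inpaint_full_res": full_res,   # True corresponds to "Only masked"
                "inpainting_fill": fill_id,     # index of the masked content option
            },
        })
    return grid


if __name__ == "__main__":
    for entry in settings_grid():
        print(entry["label"])
```

That is 24 runs per image, which is small enough to eyeball in one sitting. The labels are meant to be used as file names or captions so you can tell the outputs apart.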
Tip fifteen: leverage photo collages/photo bashes
Want to add something to an image, or have something that’s a sticking point, like a hand or a foot? Go on google images, find something that is very close to what you want, and crappily photoshop it onto your image. Then, use the inpainting tricks we’ve discussed to bring it all together into a cohesive image. It’s amazing how well this can work!
Tip sixteen: Experiment with controlnet.
I don’t want to do a full controlnet guide, but canny edge maps and depth maps can be very, very helpful when you have an underlying image you want to keep the structure of, but change the style. Check out Aitrepreneur’s many videos on the topic, but know this might take some time to learn properly!
Tip seventeen: SQUINT!
When inpainting or img2img-ing with moderate denoise and original image values, you can apply your own noise layer by squinting at the image and seeing what it looks like. Does squinting and looking at your photo bash produce an image that looks like your target, but blurry? Awesome, you’re on the right track.
Tip eighteen: generate, generate, generate.
Create hundreds, even thousands, of images, and cherry pick. Simple as that. Use the “extra large” thumbnail mode in file explorer and scroll through your hundreds of images. Take time to learn and understand the bulk generation tools (Prompt S/R, prompts from file or textbox, etc.) to create variations and dynamic changes.
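If the bulk tools feel opaque, it may help to see the idea behind them in plain code. The sketch below imitates search-and-replace as I understand it: the first term is searched for in the prompt and swapped for each alternative, with the unmodified prompt kept as the first variant. The second helper writes one prompt per line, which is the simplest input the prompts-from-file script accepts. The function names, example prompt, and file name are all placeholders of mine.

```python
def search_replace_variants(base_prompt, search, replacements):
    """Return the base prompt plus one variant per replacement term."""
    if search not in base_prompt:
        raise ValueError(f"{search!r} does not appear in the prompt")
    return [base_prompt] + [base_prompt.replace(search, r) for r in replacements]


def write_prompt_file(prompts, path):
    """Write one prompt per line for pasting or loading into a bulk script."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(prompts) + "\n")


if __name__ == "__main__":
    # "my_character" is a hypothetical Textual Inversion trigger word.
    variants = search_replace_variants(
        "photo of my_character, dark green dress, garden",
        "garden",
        ["beach", "cafe", "city street at night"],
    )
    write_prompt_file(variants, "prompts.txt")
```

Swapping only the scene term while the character and outfit terms stay fixed is a cheap way to get a whole set of images that still read as the same person.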
Tip nineteen: Recommended checkpoints.
I like the way Deliberate V2 renders faces and lights portraits. I like the way Cyberrealistic V20 renders interesting and unique positions and scenes. You can find them both on Civitai. What are your favorites? I’m always looking for more.
That’s most of what I’ve learned so far! Feel free to ask any questions in the comments, and make some long form illustrated content yourself and send it to me. I want to see it!