The Kitten Effect

One thing I’ve noticed with image-generating algorithms is that the more copies of something they have to put in an image, the worse each one turns out.

I first noticed this with the kitten-generating variant of StyleGAN, which often does okay on one cat:


but is terrible at a crowd of kittens.


A few years later, DALL-E 2 is a LOT more coherent despite having a larger job to do. But it’s still susceptible to the kitten effect.

One kitten:

“A cute kitten in a basket”

Two kittens:

“Two cute kittens in a basket”

Ten kittens:

“Ten cute kittens in a basket”. Note: the kittens are all happy and doing great because when you are a virtual kitten it doesn’t matter if you have extra earholes in the middle of your face.

A vast herd of cute kittens thundering across the plain:

“A vast herd of cute kittens thundering across the plain”. These virtual kittens are also all healthy and happy. The one with extra eyes in its stomach is great at pouncing.

Similarly, there’s a huge difference in quality between a single dog and a herd of them.

“A dog running across a field, matte painting”

“Vast herd of millions of dogs thundering across the plain, matte painting”. They may look a bit odd, but they’re all very good dogs.

And it’s not just animals, of course. Here’s the kitten effect played out in Lucky Charms marshmallows:

A single Lucky Charms marshmallow piece on a plate:


A single Lucky Charms marshmallow piece on a plate, labeled with the name of the shape:


A labeled list of Lucky Charms marshmallows:


It was not obvious to me that this should be so. If it can make one kitten, why can’t it make ten of equal quality?

My theory is that this may actually be not a numbers thing but a size thing. When I asked DALL-E 2 to generate a kitten that takes up less of the image, the kitten got way, way worse.

“A kitten sitting at the far end of a large room”.

It hasn’t run out of pixels to make the detail with: note the sharpness of the wood grain. But at this scale it has run out of something nonetheless.
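To put rough numbers on the size theory, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the 1024-pixel image side, the guess that the kittens together cover half the frame, and the helper names `pixels_per_subject` and `side_of_square`. It says nothing about how DALL-E 2 works internally; it only counts how many pixels each kitten would get if they split the space evenly.

```python
import math


def pixels_per_subject(image_side: int, count: int, coverage: float = 0.5) -> int:
    """Rough pixel budget per subject.

    Assumes (hypothetically) that `count` subjects evenly share a
    `coverage` fraction of a square image `image_side` pixels wide.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    total_pixels = image_side * image_side
    return int(total_pixels * coverage / count)


def side_of_square(pixels: int) -> int:
    """Side length of the largest square that fits in `pixels` pixels."""
    return math.isqrt(pixels)


if __name__ == "__main__":
    # Assumed 1024x1024 output; the kitten counts loosely mirror the prompts above.
    for count in (1, 2, 10, 100):
        budget = pixels_per_subject(1024, count)
        side = side_of_square(budget)
        print(f"{count:>3} kittens: ~{budget} px each (about {side}x{side})")
```

Under these assumptions, even one kitten in a hundred still gets a patch roughly 72 pixels on a side, which fits the observation that the pixels themselves are not the thing running out.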

It does make DALL-E 2’s eye test charts particularly mean.


Bonus post! In which I get DALL-E 2 to look increasingly closely at a giraffe.