Using illustrations to train an image-free computer vision system to recognize real photos

You’ve likely heard that a picture is worth a thousand words, but can a large language model (LLM) get the picture if it’s never seen images before?