How Smart is Dall-E 2?


Prompt: “Polymer clay dragons eating pizza in a boat”
Computer-generated image (Dall-e 2 by OpenAI) 

For several years now, computers have been able to generate images based on a natural-language prompt.

But the resulting images have often suffered from problems of logic and global coherence.

For example, here's what you get if you give the computer the prompt “A rabbit detective sitting on a park bench and reading a newspaper in a Victorian setting.” (Latent Diffusion LAION-400M via @loretoparisi)

Where are his legs? His hands? Are those books or newspapers? Is that a coffee table in front of his bench? 

The image doesn't make sense, and we might conclude that the problem comes from the computer not having any experience of living in a body or dealing with the real world. No matter how big the data sets, or how many layers of processing you bring to the task, you can't get past that limitation. 

Or can you? 

OpenAI is one of the pioneers of generating realistic images and art from descriptions in natural language. It recently unveiled new software called Dall-E 2, which has pushed the boundaries of what's possible with this technology.

Here's what Dall-E 2 does with the same prompt: “A rabbit detective sitting on a park bench and reading a newspaper in a Victorian setting.” 


The overall logic is much better. Now he has legs and is really sitting on that bench, even casting a shadow. But the image is still not perfect. What's the black loop in his left hand? And why doesn't he seem to be holding the newspaper with his right hand? 

Here's one more example of how the technology is improving, using the prompt “teddy bears working on new AI research on the moon in the 1980s.”


The first version, made with the older tech (Latent Diffusion LAION-400M), looks like a paste-up of unrelated elements.


Here's what Dall-E 2 came up with: a pretty believable image with consistent lighting.

OpenAI released this YouTube video to introduce the software.

This technology scares some working artists and illustrators. @VividVoid says: "DALL-E is breaking my heart. AI art is about to lay utter waste to traditional visual art forms. This will be so much more destructive than what the Internet did to music. It will be a technological conquest of one of the great human avenues of spiritual transformation."

AI skeptic Gary Marcus doubts whether the technology will ever replace artists because it is just crunching big data sets. It's not learning from embodied experience, nor does it understand symbolic or semantic concepts the way a human does. Marcus says: "This whole thread is weaponized cherry-picked PR; the antithesis of science."

Read more
Dall-e 2 at OpenAI
Podcast: Gary Marcus: Toward a Hybrid of Deep Learning and Symbolic AI
