DALL-E 2: A Look Into How Creative It Can Be

DALL-E 2 is an image generation model that needs only a textual prompt to produce an image that accurately captures the semantic meaning behind that text. And, naturally, the internet has gone crazy with the model, generating images ranging from the creative to the funny, and to the even more creative and funnier.

Content

Intro

DALL-E 2: A Quick Introduction

How does it work?

Possible Use Cases

Implementations

Limitations

Do It Yourself

Conclusion

Intro

DALL-E 2 was announced a couple of months back, in April, and since then it has taken the internet by storm. And not always for the right reasons: much of the virality the transformer-based generative model has attracted comes from the memes trending on the internet.

Fun Fact: Ever wondered what Voldemort would look like in the toilet? Well, now you can find out, thanks to the latest developments in the generative AI field. Scroll to the end to know more.

And, as is always the case with these SOTA AI models, DALL-E 2 has once again surfaced the biases baked into such systems, along with other problems such as misuse of the technology through deepfakes and the generation of content that is highly objectionable or outright illegal.

In this blog, we will look at what DALL-E 2 is and how it works, at some of the memes and artistic renderings generated with it, and finally at the consequences of such technology when it is open-sourced and made available to everyone.

Please don’t consider this an explanatory blog. We have tried to explain DALL-E 2 a little, but plenty of other resources have done a far better job of providing explanations than we ever could. Our intention is simply to show the capabilities and limitations that come with DALL-E 2 (and possibly share some memes in the process too). Enjoy!

DALL-E 2: A Quick Introduction

The first version of DALL-E was a 12-billion-parameter image generation model derived from GPT-3, a transformer-based text generation model; both were developed by the same organization, OpenAI. DALL-E was trained to generate images from nothing but an input text.

When “an armchair in the shape of an avocado” was passed to DALL-E, the image shown below was generated:

“An armchair in the shape of an avocado” one of the famous images generated using DALL-E. Credit: OpenAI

Similar to GPT-3, DALL-E was a transformer-based language model. During training, however, it received both text and image as a single stream of data containing up to 1280 tokens, and it was trained using maximum likelihood to generate all of the tokens, one after another.
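
A toy sketch of that objective may help (our own illustration, not OpenAI's code): assuming a `transformer` module that maps a token sequence to next-token logits, the loss is plain next-token cross-entropy over the concatenated text-plus-image stream.

```python
# Toy sketch of DALL-E 1's training objective (illustrative only).
# One stream of up to 1280 tokens: 256 text tokens followed by
# 1024 image tokens (a 32x32 grid from a discrete VAE).
import torch
import torch.nn.functional as F

def dalle1_loss(transformer, text_tokens, image_tokens):
    """Maximum-likelihood (next-token) loss over the joint stream."""
    stream = torch.cat([text_tokens, image_tokens], dim=1)  # (B, <=1280)
    logits = transformer(stream[:, :-1])  # predict each next token
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (B*T, vocab_size)
        stream[:, 1:].reshape(-1),            # targets shifted by one
    )
```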

DALL-E 2 takes the performance even further, with higher resolution, greater comprehension, and new capabilities like inpainting and text diffs. Surprisingly, this second version uses far fewer parameters (roughly 3.5 billion) than the first version did. Still, it produces some of the best computer-generated images ever produced by an algorithm.

How does it work?

DALL-E 2 basically has three important parts (a sketch of the full pipeline follows this list):

1. CLIP learns a representational embedding space capturing the correspondence between an image and its caption text. These embeddings are learned by training the CLIP model on millions of image-caption pairs, after which the model is frozen (training stops).

2. A diffusion prior takes the CLIP text encoding of the input text and maps it to a corresponding CLIP image encoding, which can then be used as a guide to generate images.

3. A GLIDE-based decoder takes that image encoding and generates an entirely new set of possible images from the representation space (using reverse diffusion). Each generated image tries to convey the semantic information behind the input text given to DALL-E 2.
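
Putting the three parts together, the whole pipeline can be summarized in a few lines of pseudocode (a minimal sketch; the function names are ours, not an actual API):

```python
# Illustrative pseudo-pipeline for DALL-E 2 (unCLIP). The arguments
# are stand-ins for the frozen CLIP text encoder, the diffusion
# prior, and the GLIDE-style diffusion decoder described above.

def dalle2(prompt, clip_text_encoder, prior, decoder):
    # 1. The frozen CLIP encoder maps the caption into the shared
    #    text-image embedding space.
    text_embedding = clip_text_encoder(prompt)

    # 2. The diffusion prior translates that text embedding into a
    #    plausible CLIP *image* embedding for the same caption.
    image_embedding = prior(text_embedding)

    # 3. The decoder inverts the image embedding back into pixels via
    #    reverse diffusion (followed by upsamplers for resolution).
    return decoder(image_embedding)
```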

There are many other details to the DALL-E 2 model that are covered beautifully over here.
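
As a concrete taste of step 1, OpenAI's open-source CLIP package can score how well an image matches candidate captions. The snippet below is adapted from the CLIP repository's README; the image file name is our placeholder:

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("avocado_chair.png")).unsqueeze(0).to(device)
texts = clip.tokenize(["an armchair in the shape of an avocado",
                       "a photo of a dog"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each candidate caption.
    logits_per_image, _ = model(image, texts)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # the avocado-armchair caption should dominate
```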

Possible Use Cases

Models such as DALL-E 2, and also Google’s Imagen, have produced images that a human mind finds genuinely hard to identify as computer generated or not. Just typing in a description of the required photo gets you a near-perfect image with the required subject, lighting, time of day, angle, etc., within minutes if not seconds.

Hence, people have raised concerns over the possible use of such models in the stock photography market and how tremendously it would affect those companies and their networks of millions of photographers, since there would remain practically no need for them.

Beyond that, such models can also improve the quality of low-resolution images. They can likewise generate things that never existed, like an iPhone in the 19th century or a Tesla in the sixties. It has been said for centuries that human imagination has no limits, but models such as DALL-E 2 have started to seriously challenge that saying. If you don’t believe us, let’s go through some of the recent feats DALL-E 2 has achieved.
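
For the lucky few with access, producing such an image is a single call. Here is a minimal sketch against the Images API that OpenAI rolled out for DALL-E 2 (the API key and prompt are placeholders of ours):

```python
import openai  # pip install openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Image.create(
    prompt="a daguerreotype photograph of an iPhone, 19th century",
    n=1,               # number of images to generate
    size="1024x1024",  # 256x256, 512x512, or 1024x1024
)
print(response["data"][0]["url"])  # URL of the generated image
```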

Implementations

Artistic renderings

Let’s start with the artistic renderings.

Mural created with DALL-E 2 using the inpainting technique. Credit: David Schnurr

The above image uses a technique called inpainting: we delete a certain section of an image and prompt the model to fill in that space with some other object. Using this method, the creator was able to generate the mural above.
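
Programmatically, inpainting boils down to sending the original image plus a mask whose transparent pixels mark the region the model should fill. A minimal sketch against OpenAI's Images API, assuming API access (the file names are placeholders):

```python
import openai  # pip install openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Image.create_edit(
    image=open("room.png", "rb"),      # square PNG to edit
    mask=open("room_mask.png", "rb"),  # transparent where the model should paint
    prompt="a surreal mural covering the entire wall",
    n=1,
    size="1024x1024",
)
print(response["data"][0]["url"])
```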

“Middle Earth”. Credit: Tython

The above concept art was generated by giving the model just the text “middle earth” (or something close to it). The model produced these sceneries purely from its internal notion of what something called Middle Earth might look like.

Futuristic NYC. Credit: Dalle Pictures

Similar to the last one, this image shows NYC in the future.

Creative (Humorous)

We have included some of the creative ones in this section.

Starting with how Minions might steal the moon.

“Minions trying their best to steal the moon”. Credit: Dalle2Pics

If you love the Minions (who doesn’t?), there’s an entire Twitter thread of Minions x DALL-E 2 available here:


This is how a royal-family frog from the 1700s would look.

“A photorealistic 1700’s portrait of a frog wearing a pearl necklace and pearl earrings with a black robe and a platinum plated crown with diamonds in a old wooden picture frame hanging above a fireplace, high quality”, Credit: Nodexdev

A stylish crocodile owning the look:

“A crocodile wearing a plaid flannel shirt and sunglasses, standing in front of an old BMW”. Credit: https://twitter.com/dalle_ideas

Ever wondered how ballerina astronauts would dance on the moon? Wait no longer:

“Multiple astronaut’s wearing a tutu dancing like a ballerina on the moon, digital art”. Credit: Nodexdev

Also, how would Super Mario look in real life? Like this:

“Super Mario getting his citizenship at Ellis Island”. Credit: theshamanshadow

Now, the below one is just too much. Let’s end this!

“Medical illustration of a burrito”. Credit: Mark Rich

How are these even possible?

Let’s look at some of the most mind-blowing ones we found, below:

The one below shows a marble statue of a Supreme Leader Octopus! I mean, first of all it’s an octopus, who is a supreme leader, who also has a marble statue built for him? Wow! Truly amazing! DALL-E 2 managed this feat, and quite efficiently too.

“A marble statue of Supreme Leader octopus”. Credit: Hippocampus-garden

The supreme leader above might need a kingdom of his own, and that’s why DALL-E 2 created the image below. In a way, it understands imagination. The photo below is not just DALL-E taking some octopi and placing them randomly to create an image; you can see and feel the story within it, which is absolutely mind-blowing.

“A kingdom of octopi under the sea, digital art”. Credit: Hippocampus-garden

This one was also quite interesting: DALL-E 2 brilliantly manages to combine the Renaissance period and the modern geek era with such authenticity that Da Vinci himself would wonder when he drew this.

“Early drawings of R2D2 by Leonardo Da Vinci”. Credit: Hippocampus-garden

The one below should be checked out by anyone who loves psychedelic visuals. It is a zoom-out video of how Michelangelo’s most famous paintings might look when extended. This one is truly spectacular!

(Note: DALL-E 2 doesn’t generate videos.)

Limitations

Although the ability DALL-E 2 provides is unimaginable (quite literally), it also has certain limitations that present themselves once the fun is over. You may have already witnessed one of them above: the model struggles whenever it tries to generate legible text within its images.

It also still lacks an understanding of positioning and the relations between different objects. The example below shows how merely swapping the order of the words (“fire” and “truck”) changes the generated images drastically. It is harmless here, but there are cases where the context of the words matters a lot.

Credit: Evan Morikawa

Along with that, the biases that have always plagued ML models are present in DALL-E 2 as well. Here, when the prompt was “CEO”, the images on the left were generated, whereas for “Nurse” the images on the right appeared, clearly showing gender bias within the data set.

CEOs and Nurses, showing gender bias. Credit: OpenAI GitHub

In the images below, a western bias can also be noticed: when “dining” and “wedding” were passed to the model, these images were generated.

Dining and Weddings showing western bias. Credit: OpenAI GitHub
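
One crude way to quantify such bias yourself (our own sketch, not OpenAI's methodology) is to generate a batch of images per occupation prompt and let the open-source CLIP model zero-shot-classify the perceived gender in each:

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
labels = clip.tokenize(["a photo of a man", "a photo of a woman"]).to(device)

def gender_tally(image_paths):
    """Count which label CLIP prefers for each generated image."""
    counts = {"man": 0, "woman": 0}
    for path in image_paths:
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        with torch.no_grad():
            logits, _ = model(image, labels)  # similarity to each label
        counts["man" if logits.argmax().item() == 0 else "woman"] += 1
    return counts

# e.g. compare gender_tally() over images generated for the prompt
# "a CEO" against those generated for "a nurse".
```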

OpenAI has taken various steps to limit and eradicate such biases. Access to DALL-E 2 is highly restricted, open only to some people on an invitation basis. OpenAI also restricted the use of “certain” images during pre-training that could lead to inappropriate generations.

In a very recent update, OpenAI has gone a long way toward removing racial and gender biases: the model now generates images of people belonging to multiple ethnicities and genders, with its embeddings having learned that nurses can be male and CEOs can be female too.

Before and after mitigation, images for software engineers. Source: OpenAI

Read more about it here: Reducing Bias and Improving Safety in DALL·E 2 (openai.com)

But despite these problems, the advances that we as humans have made can’t be ignored. The machine learning field has certainly evolved a lot, yet it still has plenty of “opportunities” to grow even further.

Do It Yourself

For us mortals, DALL-E is available in a smaller package, with faster speeds but lower generated image quality: DALL-E mini (now known as Craiyon), an open-source model built by the community and inspired by DALL-E. Try it on any one of the below sites:

Some of the things we tried generating with DALL-E mini:

Okay, we went a little too far here, but you now get the idea of what DALL-E 2 is capable of.
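
If you would rather script it than click around, here is a hedged sketch that posts a prompt to the unofficial backend the DALL-E mini web demo used at the time of writing. The endpoint is undocumented and may change or disappear, and the returned image encoding is an assumption on our part:

```python
import base64
import requests

# Unofficial, undocumented endpoint used by the web demo (may break).
URL = "https://bf.dallemini.ai/generate"

resp = requests.post(URL, json={"prompt": "Voldemort in a toilet"}, timeout=180)
resp.raise_for_status()

# We assume the response carries base64-encoded images (format may vary).
for i, b64 in enumerate(resp.json()["images"]):
    with open(f"dalle_mini_{i}.jpg", "wb") as f:
        f.write(base64.b64decode(b64))
```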

Conclusion

We looked at what DALL-E 2 is and how it works to generate images from the text prompted to it. We also looked at some of the images generated by it, and explored its limitations to get a complete understanding of its capabilities.

We saw how human ingenuity and creativity have allowed us to make computers dream and imagine. If you think about it, we humans are no longer the only ones who can draw, and so highly imaginatively at that.

Models such as DALL-E 2 arguably already rival most of the human population in creativity. What will happen in the future remains unknown, and the sky is definitely not the limit for this one. It’s just the beginning.

Who are We?

We are ANAI, an end-to-end AI platform ecosystem with the purpose of “Democratizing AI”. Some of the key features we provide: 100+ data connectors, automated feature engineering, 300+ unique ML models, end-to-end MLOps, and 25+ RAI/XAI models. We are also a fast-growing open-source community.

Follow us for more such insights and discussions on the latest trends in the AI world. Share this within your network if you found it informative and fun to read.

Do clap to show us your appreciation!

Written by Revca - Helping Companies achieve AI completeness

A US-based AI startup empowering businesses to become AI-first faster. Check out our products: ANAI (anai.io), Apture (apture.ai), AquaML (aquaml.io)