05-VQGAN+CLIP – Interactivity and Computation • 60-212

bumble_b-VQGAN+CLIP

I feel really stupid for how long I spent trying to get Colab to work. If I weren't making this assignment up a few weeks late, I would have just gone to office hours, but since I'm already so late, I used Pixray instead.

My prompt was: “friendly elves frolicking in a meadow, strawberry picking with baskets, with a giant rainbow in the sky”

It was fun watching the image slowly develop over the iterations. Here are some of the first ones:

They look so different from the end product! Anyway, I really love the picture, especially how cute the elves are! I just kind of wish the rainbow had shown up!

hunan-VQGAN+CLIP

Since I don't have Colab Pro, I spent a lot of time trying to get the notebook working on a local runtime so I could run the bigger models. But library issues and differences between Windows and Linux made this very frustrating, so I resorted to using a smaller model on Colab.

Lemongrass under the pink sky rendered in blender

duq-VQGAN+CLIP

I spent quite a while trying to get the VQGAN+CLIP notebook to work, but I was completely unsuccessful, so I used the Pixray readymade instead. I was surprised by how unrelated the first couple of images were to the prompt. Watching the model take that image and slowly push it closer and closer to the prompt felt like a window into how ML works.

Koke_Cacao-VQGAN+CLIP


Using: wikiart_16384 + ViT-B/32 + default parameters
Prompt: A student suffering from his coding homework

The image quality is not as good as that of synthesizers driven by image inputs (for example, style-transfer models), possibly because the natural language processing pipeline restricts the latent space, or because the models are not trained end-to-end. The style of the generated images becomes clichéd very quickly, and the tool doesn't give artists much control over the output.

Wormtilda-VQGAN+CLIP

The prompts I gave for these two images, respectively, are “worms in Parliament” and “worm lawyer in worm court.”

kong-VQGAN+CLIP

Textual Prompt: joyful beans

Textual Prompt: alice in the wonderland

It was interesting to see how both of my trials produced animal-like features (cat feet, a fox?) even though my prompts had nothing directly to do with animals. The resulting images were definitely different from what I expected; getting closer to my expectations would probably require an in-depth understanding of the software. I like how the images have brushstroke-like textures, though that may be because they were not fully processed.

qazxsw-VQGAN+CLIP

Prompt: “A Pingpong learning to ski”

The background looks like a mix of a ski trail and a ping-pong table. I guess the creature on the right is the skiing ping-pong ball, but I'm not sure why it looks furry.

SPINGBING-VQGAN+CLIP

I used the Pixray Readymade tool because I was using Colab for another task and it would not let me run both at the same time. I typed in "nuclear mutated flower field sunny day" and this is what happened:

It is a little too saturated and heavy on primary colors for my personal taste, but I am impressed by its interpretation of the prompt.

Solar-VQGAN+CLIP

(dreamscape of dystopia in apocalypse)

(the Song Dynasty in the style of an impressionist painting)

It was very intriguing to watch the program generate these images step by step. However, by the 400th iteration, the images were becoming more and more alike, with little variation. So I experimented with multiple text prompts to explore the many surreal and imaginative images it could create.

CrispySalmon-VQGAN+CLIP

Prompt: Rick and Mort messing in the white house.

I actually like the image from around the 50th iteration the most; its colors and feel resemble a Rick and Morty scene most closely. I was mostly interested in whether it could successfully recreate the style of Rick and Morty. In that sense it failed: the image became more and more realistic, losing the show's colors and flatness.