Machine Learning II

Entangled II by Scott Eaton.

Performance II: Skin on Repeat, NFT by Chrystal Y. Ding.
Ding is an artist/writer “interested in fluctuations in identity and embodiment”. She “uses machine learning to explore the impact of trauma & future technology on identity.”

StyleGAN2 trained on comics, study by Zach Whelan.


ML Image Synthesis
A Sampler of Diverse uses of ML in Art
  • Human-Robot Interaction
  • Neural Nets in Games
  • Poetry
  • Music
  • Film

Convolution: The Foundation of Convolutional Neural Nets

Convolution is a fancy-sounding name for a simple image-processing operation (such as blurring, sharpening, or edge detection). It underlies all of the recent work in Deep Convolutional Neural Nets and is worth taking a moment to understand. In the simplest terms, a convolution is an operation in which:

  • each given pixel (1×1 gray)
  • in a destination image (cyan)
  • is computed by taking a weighted average of
  • a neighborhood of corresponding pixels (3×3 gray)
  • from a source image (blue):
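
The operation described in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `convolve`, the 5×5 test image, and the box-blur kernel are all assumptions chosen for the example. Border pixels are simply skipped, so the destination image is slightly smaller than the source. Strictly speaking, the loop below does not flip the kernel (it is a "cross-correlation"), which is the convention CNN libraries generally use; for symmetric kernels like this blur, the result is the same.

```python
def convolve(source, kernel):
    """Compute each destination pixel as a weighted sum of the
    neighborhood of corresponding pixels in the source image."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(source) - k_rows + 1   # borders are skipped
    out_cols = len(source[0]) - k_cols + 1
    dest = [[0.0] * out_cols for _ in range(out_rows)]
    for y in range(out_rows):
        for x in range(out_cols):
            total = 0.0
            for j in range(k_rows):
                for i in range(k_cols):
                    total += source[y + j][x + i] * kernel[j][i]
            dest[y][x] = total
    return dest


# A 3x3 box blur: every neighbor gets the same weight, and the
# weights sum to 1, so the result is a plain average.
box_blur = [[1 / 9] * 3 for _ in range(3)]

# A 5x5 grayscale image that is black except for one bright pixel.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 9.0

blurred = convolve(image, box_blur)
for row in blurred:
    print([round(value, 2) for value in row])
```

Swapping in a different kernel changes the effect: weights that sum to 1 and are all positive blur the image, while kernels that mix positive and negative weights tend to sharpen it or pick out edges.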

Let’s watch this nice explanation of “Convolution” from 1:12–2:50, and 4:31–6:13:

Here’s an interactive explanation that should make this even more clear:


Convolutional Neural Networks (CNNs) use convolutions at their lowest levels to detect features like points and edges. At the next-higher levels, the results of these detections (“activations”) are grouped in ever-more-complex combinations (textures→patterns→parts→objects). This CNN Explainer does a good job of opening the hood:
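
To make the "lowest level" of that description concrete, here is a toy sketch of a single CNN-style stage: a convolution that responds to vertical edges, followed by a ReLU (which keeps only positive responses, the "activations") and a max-pooling step (which summarizes each small region by its strongest response). The kernel here is a hand-picked Sobel filter, which is an assumption made for illustration; a real CNN learns its kernel weights from training data. The function names and the 6×6 test image are likewise hypothetical.

```python
def convolve(image, kernel):
    """Slide the kernel over the image, taking a weighted sum at each spot."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[float(sum(image[y + j][x + i] * kernel[j][i]
                       for j in range(k) for i in range(k)))
             for x in range(cols)]
            for y in range(rows)]


def relu(feature_map):
    """Keep positive responses; clamp negative ones to zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Summarize each size-by-size block by its strongest activation."""
    rows, cols = len(feature_map) // size, len(feature_map[0]) // size
    return [[max(feature_map[r * size + j][c * size + i]
                 for j in range(size) for i in range(size))
             for c in range(cols)]
            for r in range(rows)]


# Hand-picked Sobel kernel: responds where brightness rises left to right.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# A 6x6 image: dark on the left half, bright on the right half.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]

activations = relu(convolve(image, sobel_x))
pooled = max_pool(activations)

for row in activations:
    print(row)   # strong responses only in the columns around the edge
print(pooled)
```

In a full network, many such kernels run in parallel, and the pooled activations from one stage become the input image for the next, which is how simple edge responses get combined into textures, patterns, parts, and objects.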

Image-to-Text: Bleeding Edge Techniques

For some time, AI researchers have worked on automatic captioning systems, computationally deriving text from images. Here are the results of one system called Neural Storyteller, developed in 2015. Note that the language model used to generate the text for these was trained on romance novels.

But what about the opposite? Is it possible to go from text to images? The Wordseye system allows you to do this (in the browser). It does not use machine learning to generate the images; rather, it combines a large library of readymade 3D models according to the nouns and prepositions in the provided caption.

In 2018, the AttnGAN network was developed. From text descriptions, it could synthesize images of things similar to ones it had been trained on (generally, objects in the ImageNet database).

Things have advanced significantly. Last year, OpenAI released a new model called “Dall-E” that could construct novel forms in unexpected ways. Here it is synthesizing images of “an armchair in the shape of an avocado”:

Let’s take a moment to experiment with three extremely recent code repositories that build on this work. These have all been developed in the last month or so, and will give us the opportunity to run a project in a Google Colab Notebook, which is how many ML algorithms are developed:

  • DeepDaze (Github; use “Simplified [Colab] Notebook”)
  • BigSleep (Github; use “Simplified [Colab] Notebook” or Colab)
  • Aleph2Image (Colab)

Here is a comparison of how each of these algorithms interpreted the sentence “cute frogs eating pizza” that I provided to them. How would you describe the difference in their results?

DeepDaze (“cute frogs eating pizza”):

BigSleep (“cute frogs eating pizza”):

Aleph2Image (“cute frogs eating pizza”):

Here’s a comparative study Simon Colton made of their output:


Artist Trevor Paglen collaborated with the Kronos Quartet on Sight Machine (2018), a performance in which the musicians were analyzed by (projected) computer vision systems. “Sight Machine’s premise is simple and strange. The Kronos Quartet performs on stage, and Paglen runs a series of computer vision algorithms on the live video of this performance; behind the musicians, he then projects a video that these algorithms generate. Paglen is an artist best known for his photography of spy satellites, military drones, and secret intelligence facilities, and in the past couple of years, he has begun exploring the social ramifications of artificial intelligence.”

Critical Appery

White Collar Crime Risk Zones by Sam Lavigne et al. uses machine learning to predict where financial crimes are most likely to occur across the US. The project is realized as a website, smartphone app and gallery installation.

Absurd machine vision

Pareidolia, Maria Verstappen & Erwin Driessens, 2019. [VIDEO]
In the artwork Pareidolia, facial detection is applied to grains of sand. A fully automated robot search engine examines the grains of sand in situ. When the machine finds a face in one of the grains, the portrait is photographed and displayed on a large screen.

Text Synthesis and Poetry

Theo Lutz is credited as the author of the first text classifiable as a work of electronic literature. Here is his computer-generated “Stochastische Texte,” 1959:

Theo Lutz, “Stochastische Texte”, 1959, computer-generated poem; computer: Zuse Z22; output: teleprinter, approx. 80 × 20 cm

Computer poetry (~1959) preceded computer graphics and visual computer art (~1964) by a few years. The first computers could only manipulate symbols (like letters and numbers). Here’s more early computer poetry by Brion Gysin (1960):

Janelle Shane is particularly well-known for her humorous 2017 project in which she trained a neural network to generate the names and colors of new paints:

The world’s foremost computational poet is probably Allison Parrish. Here is an astoundingly good video from 2015; let’s watch a few minutes, from 1:30 to 4:30.


The Infinite Drum Machine, Manny Tan & Kyle McDonald (2017). “This experiment uses machine learning to organize thousands of everyday sounds. The computer wasn’t given any descriptions or tags – only the audio. Using a technique called t-SNE, the computer placed similar sounds closer together. You can use the map to explore neighborhoods of similar sounds and even make beats using the drum sequencer.”


Ross Goodwin: Sunspring, A Sci-Fi Short Film