I found this project and thought it was a really cool use of ML with a lot of potential. In the demo, the model generates sound based on the street view, giving the user a full audio-visual experience from purely visual street map data. I thought it really played to DL's strength: finding patterns in massive datasets. With only a reasonably sized training set, it can generalize to the massive street view data we have. The underlying model (if I understood/guessed correctly) is a frozen CNN that takes in the image to generate an embedding, plus a second CNN, trained but discarded at inference, that takes in the sound file and learns to generate an embedding as close as possible to the one the image CNN produces for the paired image. The embedding for each sound file is then saved and used as a key to retrieve that sound file at inference time. With the more recent development of massive language models (transformers), we've seen evidence (Jukebox, vision transformers, etc.) that they are more or less task-agnostic. Since the author mentioned that the model sometimes lacks a full semantic understanding of the context and simply matches sound files to the objects seen in the image, these multi-modal models are a promising way to further improve this model to account for more complex contexts. They might also open up possibilities of remixing the sound files, instead of simply retrieving existing sound data, to further improve the user experience. This could also see some exciting uses in the gaming/simulation industry: Microsoft's new Flight Simulator is already taking advantage of DL to generate a 3D model (mainly buildings) of the entire earth from satellite imagery, so it's only reasonable to assume that some day we'll need audio generation to go along with the 3D asset generation for the technology to be used in more games.
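The retrieval step I'm guessing at above could be sketched roughly like this. This is purely illustrative, not the author's actual code: the function names, the use of cosine similarity, and the toy embeddings are all my own assumptions about how a saved bank of sound embeddings might be matched against an image embedding at inference time.

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def retrieve_sound(image_embedding, sound_embeddings, sound_files):
    """Return the sound file whose saved embedding is closest
    (by cosine similarity) to the image CNN's embedding."""
    sims = normalize(sound_embeddings) @ normalize(image_embedding)
    return sound_files[int(np.argmax(sims))]

# Toy usage: two saved sound embeddings acting as retrieval keys.
sound_bank = np.array([[1.0, 0.0],   # hypothetical "traffic" embedding
                       [0.0, 1.0]])  # hypothetical "birdsong" embedding
files = ["traffic.wav", "birds.wav"]
query = np.array([0.1, 0.9])         # embedding from the (frozen) image CNN
print(retrieve_sound(query, sound_bank, files))
```

The key point is that the sound CNN only matters at training time, when it learns to place each sound near its paired image in embedding space; afterwards the stored sound embeddings alone are enough to serve as lookup keys.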


Last year, I chose the Simpsons vs. Family Guy project (which I wrote about here) that I still maintain is the coolest freaking thing ever! But in the interest of variety and learning, of course, I found another project that I really enjoyed, which is Max Braun’s StyleGAN trained on eBoy’s database of pixel art.

Basically, he fed a bunch of images to a machine, then told the machine to make its own versions of them! Since I'm a sucker for all things pixel art, this really stuck out to me artistically. You can read his documentation here.

I really enjoy old Flash games, and this reminded me of them a lot! I just love the old/2000s pixel art style. If I ever get the chance to make a video game, I want it to look just like this!


The project that caught my attention the most was a real-time SketchRNN project called Scrying Pen by Andy Matuschak on the Chrome Experiments website. The experiment predicts the user's future strokes as they draw. I have seen a few stroke-based predictive machine learning experiments before, but this is the most interesting application of the idea that I've seen so far. As I drew, it started to feel less like the algorithm was predicting what my next stroke would be and more like it was making judgments and suggesting what that stroke should be, and I found myself actually recreating the predicted strokes, which was a strange experience.


starry – LookingOutwards03

I can remember – Guillaume Slizewicz

This project used machine learning to generate text descriptions of a set of images of a certain location (such as the one featured in the video), which were then used to create poems that the machine would write. I liked this project because I was drawn to its presentation aesthetically: a lot of the visual imagery in the video (rural/nature-focused) was imagery I could relate to. I also thought it was incredibly interesting conceptually because of its investigation of personal memory through machines. The poems the machine generated came from the memories of the machine itself, but they also connected back to the artist who selected the photos for the dataset. Watching a machine generate a poem from your own memories makes me wonder whether such machines could be classified as extensions of the artist, as capable of empathy, or as something separate.


Link: Shape Edition

Draw to Art: Shape Edition


Draw to Art is an experiment by Google that matches people's doodles to paintings, drawings, and sculptures from museums around the world. I really like it because it is a simple but useful project, in that it brings people closer to artworks that were created years ago. This interactive experiment also brings more attention to pieces that are sometimes neglected because they are less famous.


AR Copy-Paste by Cyril Diagne


I actually remember seeing this project last year, and I think it's so genius. Especially because I'm coming from the perspective of a designer, I'm interested in the types of interactions that are made possible with Deep Learning. After doing a little digging on Cyril Diagne, I realized they are also the person behind other work with the same kind of magic. It's such a simple interaction from the user's perspective, and so intuitively what we'd want to do with technology: drag something we see through our camera in the real world directly into the screen world. To me, a project like this seems almost obvious after the fact, but it involves noticing the moments of frustration many people (designers, in this case) have with technology but accept and put up with, and taking a step back to question whether there are actually ways around them.


Link to Runway Palette

I found this project interesting because it gathers pieces of information across thousands of digital images and generates a cohesive piece. I also like how it is interactive in that the original images appear when a user clicks on a certain part of the image. I personally would also like to explore ways to receive user input and derive information from such input to generate an interactive work.