Audio Decomposition (Final Documentation)

This project was a great learning experience for me, even though it didn't turn out how I expected. I made two master Max patches: one handling all of the steps for receiving information from and sending it to the Arduino's serial port, and one for transforming the received information into a legible signal built from various oscillators. The idea remained the same as in the initial write-up: identify key parts of the voice and decompose them into various signals that can be sent out to a synthesizer.

The initial idea: be able to talk to or sing with the synth to control its output. I wanted to do this using CV, or control voltage. By sending the Arp 2600 (the synthesizer) a signal between zero and five volts, the synth can respond almost immediately to whatever input I route that signal to.

The first step is translating the sound from my microphone into digital information that can be processed further. I went the Teachable Machine route, training a model on my voice so I would have both interactivity and a classifier to work from. However, Teachable Machine cannot be loaded into Max because of package compatibility issues. So, with Golan's help, I loaded my Teachable Machine model into a p5 sketch, which sends the information (the classifier label and the confidence level of the sound) over OSC to a node.js bridge; Max/MSP then receives all of this information and lets me convert it however I see fit.
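
As a point of reference, here is a minimal sketch of what that relay step does conceptually. My actual bridge is a node.js script; this version uses Python with python-osc purely for illustration, and the ports and OSC addresses are placeholders rather than the ones in my patch:

```python
# Illustrative OSC relay: forward Teachable Machine results from the p5 sketch
# to Max/MSP. My real bridge is node.js; ports and addresses here are made up.
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer
from pythonosc.udp_client import SimpleUDPClient

to_max = SimpleUDPClient("127.0.0.1", 7400)           # Max: [udpreceive 7400]

def on_classification(address, label, confidence):
    # Pass the classifier label and its confidence straight through to Max.
    to_max.send_message("/voice", [label, float(confidence)])

dispatcher = Dispatcher()
dispatcher.map("/classification", on_classification)  # messages from the p5 sketch

BlockingOSCUDPServer(("127.0.0.1", 12000), dispatcher).serve_forever()
```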

In my first patch, the idea was to send out a series of notes, similar to a sequencer, once a sound was identified. But because the Arduino had issues receiving data, I only got as far as sending out a different sound depending on the confidence of the classification.

But first the signals have to be converted to a format the synth can read. The synth can receive digital signals that aren't CV data, but those can't control the Arp in the same way. I wanted to use a combination of both so there would be some consistency in how they worked together: something like one oscillator carrying part of the voice signal, with the rest of the quality information used to control the pitch or envelope of the sound. littleBits has a CV bit that lets me simply send the littleBits Arduino a value between 0 and 255 and then convert those numbers to CV data.
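
The idea of that last step fits in a few lines: send the littleBits Arduino one byte between 0 and 255 and let the CV bit turn it into a control voltage. My patch does this from Max, but here is the same idea in Python with pyserial, purely for illustration; the port name, baud rate, and pacing are placeholders.

```python
# Illustrative only: stream 0-255 values to the littleBits Arduino over serial.
# On the Arduino side, a sketch would read each byte and analogWrite() it to
# the pin feeding the CV bit. Port name and baud rate are placeholders.
import time
import serial

arduino = serial.Serial("/dev/tty.usbmodem1101", 9600)

def send_cv(value):
    # Clamp to one byte; the CV bit maps 0-255 roughly onto 0-5 V.
    arduino.write(bytes([max(0, min(255, int(value)))]))

# Example: sweep the control voltage upward, one step per second.
for v in range(0, 256, 32):
    send_cv(v)
    time.sleep(1)
```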

The edited patch, instead of going to the Arduino, routes its output to various preset oscillators and changes the sound depending on the classifier it is given.

Both patches together are shown below.

Below I have some video documentation of how things work:

https://vimeo.com/1040094594

https://vimeo.com/1040096752

https://vimeo.com/1040484008

Final Project — Fragmented Moments

Concept

At the beginning of the semester, I was more interested in the capture system itself, or in the aesthetic of the final output. In these final projects, I grew more interested in the subject of what I was capturing.

During the Person-in-Time project, I wanted to capture the “moments” when I am with my college friends and represent them in a form that would be memorable for me to reflect back on later.

My first iteration of this project was called “Cooking in Concert,” in which I recorded us cooking food together. Having lived with each other for two years and cooked countless meals together, my roommates and I naturally developed cooking stations and drills. Through this project, I wanted to capture our working patterns and interactions within such a small, closed space in an abstract representation.

After this project, I realized I wanted to integrate more spatial elements into the representation. Although I personally preferred the abstract visualization, because I wanted to focus on our movement patterns and interactions, I received feedback that it felt too distant from my intention of capturing a wholesome moment with friends.

Therefore, in my second iteration, I wanted to create a capture that closely replicates my memory of the moment. After our critique, I thought more about the question of what brings people together, and whether that reason shapes how we interact with each other.

Outcome

(I was absent during the actual event, so there is no other documentation.)

I really liked this point-cloud visual because it only has points on the surfaces facing the Kinect camera. From the back, the figures look like empty shells, which goes in line with how the light from the TV only lights up the front side of our bodies. I also liked how fragmented and shattered it looks, because it represents how my memories are also bound to come apart and have missing pieces here and there. In this iteration my focus was not necessarily on movement but on capturing the essence of the moment: observing us while we are all observing something else, and taking note of details or interactions I do not remember, because my memory is only one puzzle piece of the whole picture. This project got me thinking about how people who share a moment have different perspectives of that same moment.

Capture “system”

With the Azure Kinect camera, I spatially captured my interactions with my friends at our regular Saturday 7pm meetup at our friend's apartment, where we watch K-dramas together. The setting is very particular: we watch in the living room on the TV we hooked the computer up to, and all four of us sit in a row on the futon while snacking.

I also set up a supplementary GoPro camera to time-lapse our screening, in case the Kinect recording failed. Thankfully, it did not. Lastly, I recorded our conversations during the screening on my computer. Then, as I explain later on, I used RunwayML's transcription on the audio recording and its motion tracking to integrate the dialogue component into the capture.

Challenges + Updates from last post

I was able to build the transformation_example application with command-line tools in the terminal, but even after running it, it did not properly output anything …

Therefore, I tried to use TouchDesigner instead. TouchDesigner worked great with live data, as my demo shows.

However, it cannot read the depth data in the .mkv files. From searching for cases similar to mine, it seems only the Kinect Viewer can interpret the depth and RGB data directly, so I would need to export the recording as a .ply before bringing it into TouchDesigner or other programs.
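
For reference, the export step I was aiming for looks roughly like this, sketched with Open3D's Azure Kinect reader. I have not verified it against my recording, and the filenames and depth range are placeholders.

```python
# Rough sketch of the .mkv -> .ply export, using Open3D's Azure Kinect reader
# (requires an Open3D build with Azure Kinect support). Untested against my
# actual recording; filenames and depth range are placeholders.
import open3d as o3d

reader = o3d.io.AzureKinectMKVReader()
reader.open("saturday_screening.mkv")
if not reader.is_opened():
    raise RuntimeError("could not open the recording")

# Dump the recording's intrinsics, then reload them in pinhole-camera form
# so each frame can be re-projected into 3D.
o3d.io.write_azure_kinect_mkv_metadata("intrinsic.json", reader.get_metadata())
intrinsic = o3d.io.read_pinhole_camera_intrinsic("intrinsic.json")

frame = 0
while not reader.is_eof():
    raw = reader.next_frame()
    if raw is None:            # the reader can return None between valid frames
        continue
    # Kinect depth is in millimetres; rebuild the RGBD pair with an explicit scale.
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        raw.color, raw.depth, depth_scale=1000.0, depth_trunc=4.0,
        convert_rgb_to_intensity=False)
    cloud = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)
    o3d.io.write_point_cloud(f"frame_{frame:05d}.ply", cloud)
    frame += 1
reader.close()
```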

I tried other suggested methods, like MKVToolNix, but they did not seem to work well either.

Therefore, I decided to focus more on integrating the dialogue into the scenes. I used RunwayML to transcribe the audio and get a rough timeline.

Then I edited the parts that were transcribed incorrectly. Lastly, I used RunwayML's motion tracking to attach the speech to each person.

 

 

Again, Final Product

Final Delivery | Walk on the earth

Concept

I seek to pixelate a flat, two-dimensional image in TouchDesigner and imbue it with three-dimensional depth. My inquiry begins with a simple question: how can I breathe spatial life into a static photograph?

The answer lies in crafting a depth map—a blueprint of the image’s spatial structure. By assigning each pixel a Z-axis offset proportional to its distance from the viewer, I can orchestrate a visual symphony where pixels farther from the camera drift deeper into the frame, creating a dynamic and evocative illusion of dimensionality.

Outcome

Capture System

To align with my concept, I decided to capture a bird's-eye view. This top-down perspective works well because it lets pixel movement be restricted to a downward push based on each pixel's distance from the camera. To achieve this, I used a 360° camera mounted on a selfie stick. On a sunny afternoon, I walked around my campus holding the camera aloft. While the process drew some attention, it yielded the ideal footage for my project.

Challenges

Generating depth maps from 360° panoramic images proved to be a significant challenge. My initial plan was to use a stereo camera to capture left- and right-channel images, then apply OpenCV's stereo-matching algorithms to extract depth information from the pair. However, when I fed the 360° panoramic images into OpenCV, the heavy distortion at the edges caused the computation to break down.
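
For context, this is roughly the kind of OpenCV stereo matching I was attempting, sketched here with the semi-global block matcher; the filenames and parameters are placeholders, not a tuned setup.

```python
# The stereo-matching approach I tried first (and later abandoned).
# Filenames and matcher parameters are placeholders.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disparity = matcher.compute(left, right).astype("float32") / 16.0  # fixed-point output

# Larger disparity means closer to the camera; normalize for viewing as a depth-like map.
view = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", view)
```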

Moreover, using OpenCV to extract depth maps posed another inherent issue: the generated depth maps did not align perfectly with either the left or right channel color images, potentially causing inaccuracies in subsequent color-depth mapping in TouchDesigner.

Fortunately, I discovered a pre-trained AI model online, Image Depth Map, that can convert photos directly into depth maps and provides a JavaScript API. Since my source material was a video file, I developed the following workflow:

  1. Extract frames from the video at 24 frames per second (fps).
  2. Batch-process the 3,000 extracted frames through the depth AI model to generate corresponding depth maps.
  3. Reassemble the depth map sequence into a depth video at 24 fps.

This workflow enabled me to produce a depth video precisely aligned with the original color video; a rough sketch of the two ffmpeg steps follows below.
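
The sketch below wraps the extract and reassemble halves of that workflow around ffmpeg in Python. The depth-model step itself went through the model's JavaScript API, so it appears only as a placeholder, and the filenames are made up.

```python
# Sketch of the extract/reassemble halves of the workflow (ffmpeg via subprocess).
# Step 2, the depth model itself, ran through its JavaScript API and is only a
# placeholder here. Filenames are made up; frames/ and depth/ must already exist.
import subprocess

SRC = "campus_walk_360.mp4"

# 1. Extract frames from the color video at 24 fps.
subprocess.run(["ffmpeg", "-i", SRC, "-vf", "fps=24", "frames/%05d.png"], check=True)

# 2. (Run every frame in frames/ through the depth model, writing matching
#    filenames into depth/.)

# 3. Reassemble the depth maps into a 24 fps depth video.
subprocess.run(["ffmpeg", "-framerate", "24", "-i", "depth/%05d.png",
                "-c:v", "libx264", "-pix_fmt", "yuv420p", "depth_walk.mp4"],
               check=True)
```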

Design

The next step was to integrate the depth video with the color video in TouchDesigner and enhance the sense of spatial motion along the Z-axis. I scaled both the original video and depth video to a resolution of 300×300. Using the depth map, I extracted the color channel values of each pixel, which represented the distance of each point from the camera. These values were mapped to the corresponding pixels in the color video, enabling them to move along the Z-axis. Pixels closer to the camera moved less, while those farther away moved more.
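
The arithmetic behind that mapping is simple. Here it is as plain NumPy rather than the actual TouchDesigner network; the filenames, the brighter-means-closer convention, and the offset scale are assumptions:

```python
# The per-pixel Z mapping, shown outside TouchDesigner for clarity.
# Assumes brighter depth pixels are closer; filenames and scale are placeholders.
import numpy as np
from PIL import Image

SIZE = 300
depth = np.asarray(Image.open("depth_frame.png").convert("L").resize((SIZE, SIZE)),
                   dtype=np.float32) / 255.0              # 1.0 = near, 0.0 = far
color = np.asarray(Image.open("color_frame.png").convert("RGB").resize((SIZE, SIZE)))

MAX_OFFSET = 2.0                                          # arbitrary world units
z_offset = (1.0 - depth) * MAX_OFFSET                     # farther pixels sink deeper

# One point per pixel: x, y, z plus its color, ready for a particle/point system.
ys, xs = np.mgrid[0:SIZE, 0:SIZE]
points = np.column_stack([xs.ravel(), ys.ravel(), -z_offset.ravel(),
                          color.reshape(-1, 3)])
print(points.shape)                                       # (90000, 6)
```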

The interaction between particles and music is controlled in real time

Observing how the 360° camera captured the Earth's curvature, I had an idea: could I make it so viewers could "touch" the Earth depicted in the video? To realize this, I integrated MediaPipe's hand-tracking feature. In the final TouchDesigner setup, the inputs are the audio stream, the video stream, the depth-map stream, and the real-time hand capture. The final result is an interactive "Earth" that moves to the rhythm of music. The interaction between particles and music is controlled in real time by the user's beats.
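
One way to wire up that last input is a small helper script that tracks a hand with MediaPipe and streams a fingertip position into TouchDesigner over OSC; my actual setup may differ in the details, and the camera index, port, and OSC address below are placeholders.

```python
# Illustrative hand-capture helper: MediaPipe Hands -> OSC -> TouchDesigner.
# Camera index, OSC port, and address are placeholders.
import cv2
import mediapipe as mp
from pythonosc.udp_client import SimpleUDPClient

osc = SimpleUDPClient("127.0.0.1", 9000)      # TouchDesigner: OSC In listening on 9000
hands = mp.solutions.hands.Hands(max_num_hands=1)
cam = cv2.VideoCapture(0)

while True:
    ok, frame = cam.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        tip = results.multi_hand_landmarks[0].landmark[8]    # index fingertip
        osc.send_message("/hand", [tip.x, tip.y])            # normalized 0-1 coords

cam.release()
```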

Critical Thinking

  1. Depth-map generation was a key step in the entire project, thanks to the pre-trained AI model, which overcame the limitations of traditional computer-vision methods.
  2. I find the videos shot with the 360° camera interesting in themselves, especially the selfie stick, which appears in the frame as a support always close to the lens and is reflected accurately in the depth map.
  3. Although I considered using a drone to shoot a bird’s-eye view, the 360° camera allowed me to realize the interactive ideas in my design. Overall, the combination of tools and creativity provided inspiration for further artistic exploration.