My second project is centered on the Kinect, with the addition of an audio-reactive element.
Based on dpkinect2 and the kinect demo patch, I created a point cloud from the Kinect's data. Gesture detection was implemented without machine learning: when the person claps their hands (i.e., the distance between the hands falls below a certain threshold), the point cloud changes its representation (draw_mode), e.g. from points to lines to polygons. Simply setting a distance threshold is not enough to define this gesture, because one clap should trigger exactly one change in representation rather than switching multiple times while the hands stay below the threshold. Therefore, I incorporated a timer, as shown below, which measures the time between the initial clap and the separation of the hands, so that one clap triggers only one bang.
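The one-bang-per-clap logic can be sketched in Python. Instead of a timer, this sketch uses a latch (a state flag) that achieves the same debouncing; the threshold value and all names here are illustrative, not taken from the actual patch:

```python
# Hypothetical sketch of the clap-detection logic: a representation change
# fires only when the hands *first* come together, not on every frame in
# which the distance happens to be below the threshold.

CLAP_THRESHOLD = 0.15  # hand distance threshold (illustrative value)

class ClapDetector:
    def __init__(self, threshold=CLAP_THRESHOLD):
        self.threshold = threshold
        self.hands_together = False  # latch: are we currently inside a clap?

    def update(self, hand_distance):
        """Return True (one 'bang') only on the frame the hands first close."""
        if hand_distance < self.threshold and not self.hands_together:
            self.hands_together = True
            return True  # single trigger per clap
        if hand_distance >= self.threshold:
            self.hands_together = False  # hands separated: re-arm
        return False

detector = ClapDetector()
frames = [0.5, 0.4, 0.1, 0.05, 0.1, 0.4, 0.5, 0.08, 0.3]
bangs = [detector.update(d) for d in frames]
# Only two bangs fire (one per clap), even though five frames sit below
# the threshold.
```

The timer in the actual patch serves the same purpose: it distinguishes the initial closing of the hands from the frames that follow it.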
In addition, the x and y coordinates of the left and right hands change the color of the point cloud. Initially, I also tried adjusting the camera position to get multiple angles of the point cloud. However, I found that there was always a “point of origin” that looked fine when draw_mode was points but appeared deformed when draw_mode was lines or polygons as the point cloud drifted away. Unfortunately, after experimenting for a couple of days, I still could not find a way around it, so I decided to keep the point cloud centered.
As for the background, I created “treadmill-ish” shapes using poly~. They start from the center of the screen and move with increasing z values, which makes them look like they are coming out of the screen. This way, the point cloud of the person appears to be moving forward. The poly~ object consists of 20 individual shapes, each staggered by a small amount, with z values scaled to run from -100 to 10 and wrap around so the stream of shapes looks continuous.
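The staggered wrap-around scheme can be sketched as a small bit of arithmetic (the speed parameter and function names are made up for illustration; only the range -100 to 10 and the count of 20 come from the patch):

```python
# Sketch of the treadmill z-offset scheme: 20 shapes evenly staggered
# along z, advancing each frame and wrapping from 10 back to -100 so the
# stream appears continuous.

Z_MIN, Z_MAX = -100.0, 10.0
N_SHAPES = 20
SPAN = Z_MAX - Z_MIN       # 110 units of travel per cycle
STAGGER = SPAN / N_SHAPES  # even spacing between consecutive shapes

def shape_z(shape_index, t, speed=1.0):
    """z position of one shape at frame t, wrapped into [Z_MIN, Z_MAX)."""
    raw = shape_index * STAGGER + t * speed
    return Z_MIN + (raw % SPAN)
```

Because every shape uses the same modulo, a shape that reaches z = 10 reappears at z = -100 on the next frame, so the gap between neighbors never changes.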
The audio-reactive element is beat (or bass) detection. The audiobang patch sends a bang when it detects low-frequency energy, and that bang triggers an individual shape which, like the background shapes, starts from the center, picks a random x direction, and comes out of the screen.
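A minimal version of this kind of bass detection can be sketched as follows. This is not the audiobang patch's actual algorithm; it is an assumption-laden toy (cutoff, threshold, and frame size are all made-up values) showing the general idea of banging when low-frequency energy crosses a threshold:

```python
# Toy bass detector: sum the DFT magnitudes of the bins below a cutoff
# frequency and "bang" (return True) when that energy exceeds a threshold.

import cmath
import math

def low_band_energy(frame, sample_rate, cutoff_hz):
    """Sum of DFT magnitudes for bins below cutoff_hz, normalized by length."""
    n = len(frame)
    n_bins = int(cutoff_hz * n / sample_rate) + 1
    total = 0.0
    for k in range(n_bins):  # only the low bins are needed
        total += abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                         for i, x in enumerate(frame)))
    return total / n

def bass_bang(frame, sample_rate=44100, cutoff_hz=150.0, threshold=0.1):
    """Return True when the frame carries significant energy below cutoff_hz."""
    return low_band_energy(frame, sample_rate, cutoff_hz) > threshold
```

A loud 60 Hz sine would trigger a bang, while silence would not; in the patch, that bang is what launches a new shape.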
Here is a short demo of my roommate dancing to Sleep Deprivation by Simian Mobile Disco, enjoy 🙂
For this project, I wanted to try something interactive beyond mouse control, so I decided to feed data from a Leap Motion sensor into a vocoder.
The super handy patch shown above was made by Jules Françoise. The Leap Motion sensor can report data including the coordinates of each finger, the coordinates of the palm, the rotation of the palm, and more.
The Leap Motion data-processing patch is as follows:
I took the y coordinates (heights) of both index fingers directly, and calculated the 3-D distance between the index and middle fingers to define a gesture: when only your index finger is pointed out, it encodes just the pitch, whereas if you give a high-five, the left hand creates a delay and the right hand encodes feedback intensity.
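The distance-based gesture test can be sketched like this. I am assuming the index and middle fingertips sit close together in an open "high-five" hand and far apart when only the index finger is extended; the threshold and all coordinates are hypothetical, not values from the actual patch:

```python
# Illustrative sketch of the two-hand gesture test: classify a hand as
# "high-five" when the index and middle fingertips are close together.

import math

HIGH_FIVE_MAX_DIST = 40.0  # hypothetical threshold in sensor units

def distance_3d(a, b):
    """Euclidean distance between two (x, y, z) fingertip positions."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def is_high_five(index_tip, middle_tip):
    """True when both fingers are extended side by side (open hand)."""
    return distance_3d(index_tip, middle_tip) < HIGH_FIVE_MAX_DIST

assert is_high_five((0, 0, 0), (20, 10, 5))       # fingertips together
assert not is_high_five((0, 0, 0), (60, 40, 30))  # middle finger curled away
```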
Here is the main audio-processing patch. I combined the vocoder patch from Delicious Max Tutorial, which allows two tones to be defined by two fingers, with the delay-and-feedback patch we made in class, and had them receive the scaled-down data from the Leap Motion patch.
The pfft~ patch is pretty straightforward:
Here are two demo videos with different types of source audio, enjoy!
For this assignment I decided to experiment with the mousestate object. Based on a meshy patch we made in class and the freeze-frame patch from Jean-Francois Charles’ tutorial, I embedded two pfft~ subpatches.
The first one, called “hw4-pfft”, is in charge of creating the input matrix for the mesh object and a single float value from an amplitude bin for color shifting. As shown below, the value is scaled to 0.0 – 1.0 before entering jit.gl.gridshape, and the exact input range can be changed depending on the amplitudes of different audio inputs. Note that the two other float values (saturation and grayscale) in the message box could also be driven by changing values like the first one, but since rapidly changing saturation or grayscale doesn’t create very drastic effects, I decided to hardcode them after experimenting with various values.
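The amplitude-to-color scaling amounts to a linear map clamped to 0.0 – 1.0. In this sketch the input range is an assumption that, as noted above, would be tuned per audio source:

```python
# Linear scaling of an amplitude-bin value into the 0.0-1.0 range expected
# for a color component, clamping anything outside the assumed input range.

def scale_amplitude(value, in_min=0.0, in_max=0.5):
    """Map an amplitude value into 0.0-1.0, clamping out-of-range input."""
    scaled = (value - in_min) / (in_max - in_min)
    return max(0.0, min(1.0, scaled))
```

Quiet sources would need a smaller `in_max` so the colors still sweep the full range.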
I also incorporated the mousestate object to track the speed of mouse movement, which determines how often the audio “freezes”. The last two outlets of mousestate report the mouse’s horizontal and vertical movement during each 50 ms interval. The faster the mouse moves, the more dramatic the freezing effect.
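The mapping from the two mousestate deltas to a single freeze intensity can be sketched as follows; combining the deltas into one speed and the 0.0 – 1.0 output range are my assumptions, not the patch's exact scaling:

```python
# Sketch: combine the horizontal and vertical mouse deltas from one 50 ms
# poll into a single speed, then normalize it into a freeze intensity.

import math

def freeze_amount(dx, dy, max_speed=100.0):
    """Map mouse deltas per 50 ms poll to a 0.0-1.0 freeze intensity."""
    speed = math.hypot(dx, dy)       # magnitude of the movement vector
    return min(speed / max_speed, 1.0)  # clamp fast flicks to full effect
```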
This is the first audio input, Sapokanikan by Joanna Newsom:
Output: (Mouse movements are easier to see in full-screen mode)
The second audio input is a woman whispering in Swedish, saying “never tell this secret to anyone else, not your best friend, not your parents”, which I found on freesound.org:
Since the amplitude is relatively constant, the color of the mesh object does not change much, but the freezing effect is especially interesting with (creepy) speeches:
For this assignment, I combined motion detection with audio time shift and feedback, so that the faster I move (i.e., the greater the change in luminance between frames), the stronger the feedback.
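The motion-to-feedback mapping can be sketched like this. The frame-differencing measure and the scaling constants are illustrative assumptions, not the patch's actual values; only the idea (more luminance change, more feedback) comes from the project:

```python
# Sketch: mean absolute luminance change between consecutive frames drives
# the audio feedback gain. Frames are flat lists of 0.0-1.0 luminance values.

def motion_amount(prev_frame, curr_frame):
    """Mean absolute per-pixel luminance difference between two frames."""
    n = len(curr_frame)
    return sum(abs(c - p) for p, c in zip(prev_frame, curr_frame)) / n

def feedback_gain(motion, max_gain=0.9):
    """Scale motion into a feedback gain, capped below 1.0 to avoid runaway."""
    return min(motion * max_gain / 0.5, max_gain)  # full gain at motion >= 0.5
```

Capping the gain below 1.0 keeps the feedback loop from growing without bound no matter how fast the motion is.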
Here is a screen recording of me waving my hand quickly at the camera while the feedback gets weirder and weirder with my movements. I tried many screen-recording programs, but unfortunately none of them could capture the sound accurately enough :(. Hopefully this video can at least show the general idea of the project:
My roommate and I bought an elliptical trainer about a year ago, and as its parts have aged, it now makes obnoxious noises every time we use it. Here is an audio recording of my roommate using the elliptical trainer in the background while I watch an Oreo-tasting video on YouTube:
For this assignment, I thought it would be interesting to see how well the noise could be reduced using Audacity, the audio editor I am most familiar with. By performing noise reduction over and over again, I got to explore how correcting for irrelevant “background” signals can eventually turn the foreground itself into noise.
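Why repetition degrades the voice can be illustrated with a toy model. This is not Audacity's actual algorithm (Audacity uses a spectral gate with more machinery); it is simple spectral subtraction, which shows the same qualitative effect: each pass removes a noise-profile's worth of magnitude from every bin, so even strong voice bins erode over many passes. All the numbers here are invented:

```python
# Toy spectral subtraction: subtract the noise-profile magnitude from each
# frequency bin, flooring at zero. Repeated passes erode the voice bins too.

def spectral_subtract(signal_mags, noise_mags):
    """One noise-reduction pass over per-bin spectral magnitudes."""
    return [max(s - n, 0.0) for s, n in zip(signal_mags, noise_mags)]

noise_profile = [1.0, 1.0, 1.0, 1.0]   # magnitudes sampled from noise alone
frame = [0.8, 3.0, 2.1, 10.0]          # bins 1-3 carry the voice
out = frame
for _ in range(5):                     # five "iterations" of noise reduction
    out = spectral_subtract(out, noise_profile)
# After five passes only the strongest bin survives: [0.0, 0.0, 0.0, 5.0]
```

The first pass wipes out the noise-only bin, but by the fifth pass two of the three voice bins are gone as well, which mirrors what the repeated Audacity passes below do to the whispering.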
First, I had to sample the noise just by itself, as shown in the following snippet:
Then, I performed noise reduction on the entire recording using the default parameters in Audacity (Sensitivity = 6.00, Frequency Smoothing (bands) = 3):
Here is what it sounds like after 1 noise reduction:
After 5 iterations:
After 10 iterations:
After 25 iterations:
After 40 iterations:
After 50 iterations:
Over the 50 iterations, I could see from the waveform that the amplitudes were reduced, and the few seconds of “pure noise” at the beginning were almost completely erased. After the 50th iteration, the woman’s words are basically incoherent.
I then tried the same process with another set of parameters (Sensitivity = 8.00, Frequency Smoothing (bands) = 5).
After 10 iterations:
After 25 iterations:
After 40 iterations:
This time, the audio was destroyed even more quickly.