For this project, Alec and I decided to combine our work on audiovisualization and pitch correction. We realized that the pitch correction system Alec created could pinpoint a user's pitch, and that we could use that value to manipulate variables within the visualization, so we decided to make a Rock Band-style game where the player sings along to none other than Twinkle Twinkle Little Star.
My contribution to the project was primarily on the visual end. I took the variables that Alec's patcher gave me and represented them using jit.gl.sketch and jit.gl.text within a js object. In addition to the point cloud that expands whenever the player sings, I modified the particle system to change the hue of the particles to match the note sung by the player. At the bottom of the screen, I added a player cursor, whose y-position is determined by the sung note and which trails a fixed-length tail showing the recently sung notes, along with a scrolling bar of the song's upcoming notes. I then added a score counter and a method of state-switching between the gameplay and game-over screens.
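As a sketch of the kind of mapping involved, here is how a sung note could drive the particle hue and the cursor's y-position (the note range and coordinate bounds here are hypothetical values; the actual patch does this inside the js object):

```python
def note_to_hue(midi_note, low=48, high=84):
    """Map a sung MIDI note onto a hue in [0, 1) for the particle color."""
    span = high - low
    return ((midi_note - low) % span) / span

def note_to_y(midi_note, low=48, high=84, y_min=-1.0, y_max=1.0):
    """Map a sung MIDI note onto the cursor's y-position in GL coordinates."""
    t = (midi_note - low) / (high - low)
    t = min(max(t, 0.0), 1.0)          # clamp notes outside the expected range
    return y_min + t * (y_max - y_min)
```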
This Drive folder has my contributions, including all of the javascript class files,
and this Drive folder holds all the files for our project as a whole.
Here’s a gist for the visualization patcher, although it won’t be of much use without the js files:
For project 2 I wanted to take advantage of the Media Lab's 8-channel sound system to create an immersive experience for a listener. Using the HOA Library, I generate Lissajous patterns in sonic space and also allow a user to control the exact placement of one of the three sounds.
In order to emphasize the movement of the sounds, I also have uplighting for the loudspeakers, where each sound corresponds to either red, green, or blue and the moving average amplitude of a signal coming out of a speaker dictates the color values for the lights.
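The amplitude-to-light mapping can be sketched like this (the window length and DMX scaling are assumptions, not the patch's actual values):

```python
from collections import deque

class MovingAverage:
    """Running mean over the last n amplitude samples for one speaker channel."""
    def __init__(self, n=16):
        self.buf = deque(maxlen=n)

    def update(self, amplitude):
        self.buf.append(abs(amplitude))
        return sum(self.buf) / len(self.buf)

def to_dmx(avg_amplitude, max_amp=1.0):
    """Scale a 0..max_amp average amplitude to a 0..255 DMX channel value."""
    return min(255, int(255 * avg_amplitude / max_amp))
```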
The sounds that are played include sounds made from granular synthesis using parameters based on an accelerometer sent through a Raspberry Pi as well as other effects applied to audio files controlled by a Seaboard (done by Ramin Akhavijou and Rob Keller).
This Google Drive folder includes all of the Max Patches for our project.
https://drive.google.com/open?id=1WZH1nr-ARBmZOF9gPrks3_Oh1Q5mJTS8
The top-level patch that incorporates everything, with a (somewhat organized) presentation view, is main.maxpat. Most of my work (aside from putting together main.maxpat) is in ambisonics.maxpat, which in turn has several subpatches and sub-subpatches and so on. ambisonics.maxpat receives sounds and sound position coordinates, and outputs data to speakers and lights. poly voicecontrol is where the positioning of an individual sound is handled. placement.maxpat calculates the position for a sound (using coord.maxpat) and spatialize.maxpat contains calls to the HOA Library to calculate the signals that should come out of each speaker channel. These are sent to poly light to calculate the light channel value and write it into the appropriate cell of a global matrix. The global matrix is then read in ambisonics.maxpat to drive the lights.
In this project, I used an accelerometer to get triple-axis data to control different parameters in the music and light. To receive the data from the accelerometer, I connected it to a Raspberry Pi (the Python code for the Pi and accelerometer is below). After getting data from the sensor, I sent it to my computer over Wi-Fi; to do that, I added some Python code that connects the Pi and the computer to the same network and port. Next, I converted the received data into a format readable by Max using itoa, then used the fromsymbol object to convert the symbol to numeric data. Using unpack, I was able to get the x, y, z data from the accelerometer on my computer. I also helped with parts of the music patch to get an acceptable sound that interacts with the light as well.
Here is the Python code for the accelerometer:
import time
import board
import busio
import adafruit_mma8451

# Initialize I2C bus.
i2c = busio.I2C(board.SCL, board.SDA)

# Initialize MMA8451 module.
sensor = adafruit_mma8451.MMA8451(i2c)
# Optionally change the address if it's not the default:
# sensor = adafruit_mma8451.MMA8451(i2c, address=0x1c)

# Main loop to print the acceleration and orientation every second.
while True:
    x, y, z = sensor.acceleration
    print('Acceleration: x={0:0.3f}m/s^2 y={1:0.3f}m/s^2 z={2:0.3f}m/s^2'.format(x, y, z))
    orientation = sensor.orientation
    print('Orientation: {0}'.format(orientation))
    time.sleep(1.0)
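The Wi-Fi link itself can be as simple as a UDP socket. Here is a minimal sketch of the Pi-side sender (the host, port, and message format are assumptions; on the Max side this would pair with [udpreceive], [fromsymbol], and [unpack f f f]):

```python
import socket

MAX_HOST = "192.168.1.100"   # hypothetical: your computer's IP on the shared network
MAX_PORT = 7400              # hypothetical: must match the udpreceive port in Max

def format_reading(x, y, z):
    """Space-separated string Max can split with [fromsymbol] and [unpack f f f]."""
    return "{:.3f} {:.3f} {:.3f}".format(x, y, z)

def send_reading(sock, x, y, z):
    """Fire one accelerometer reading at the Max patch over UDP."""
    sock.sendto(format_reading(x, y, z).encode("ascii"), (MAX_HOST, MAX_PORT))

# Usage inside the sensor loop:
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# send_reading(sock, *sensor.acceleration)
```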
For Project 2 I wanted to use the Kinect in some way and expand upon my previous project of controlling sound and visuals. My original goal was to have motion generate sound, with a basic human outline on the screen, and to include lighting from the overhead lights in the classroom. However, I ended up changing my idea and dropped the lighting component, since I wanted to learn more about creating cool visuals in Max, inspired by some really aesthetic visualizers on YouTube.
For the project, I used the dp.kinect2 object, along with the starter patch shared in class, to help with getting the Kinect data. I wanted to learn more about particle systems, since visuals created with them always look super cool, so I added a hand-controlled particle system as a visual, with some help from a YouTube tutorial. At this point everything visual was very fluid, so I wanted something stationary in the image as well; I used jit.gl.mesh and jit.gl.gridshape to create a stationary shape that makes it seem like you're in a spiral.
For the sounds, I wanted them to be simple, to contrast with the visuals, but also controllable. I ended up having both hands control the frequencies of two different sounds, each going to a different channel. I mapped the coordinates of the hands to reasonable frequencies and fiddled with the ranges until controlling the pitch with each hand felt natural. I played around with using the head to create a tremolo effect, but I didn't like the sound it created, so I scrapped it.
Having done this, I wanted to add more to the visuals, so I had the colors of the particle system and the color of the shape change with the sound. I had different components of the sound controlling the RGB values of the particle system, and had the same components plus the position of the head control the color of the shape.
Hello! For our final project, we created an installation using the Media Lab's 8 speakers, a Raspberry Pi, an accelerometer, DMX floor lights, and a ROLI Seaboard. The idea of our project was to play audio around the room on the speakers and use lights to cue the listener on where the audio is actually coming from. We used 3 distinct "voices," which could be any audio file. These voices rotate around the room in a Lissajous pattern. The position of the voices and additional audio synthesis can be controlled with the ROLI Seaboard and an accelerometer hooked up to a Raspberry Pi. As a group member, I assisted with lighting and helped get the Raspberry Pi operational. My main role was creating a Max patch that incorporated the accelerometer data into our project's audio synthesis and provided control signals to the lights and the speakers. The patch implements granular synthesis, altered playback speed, downsampling, spectral delay, and band filtering to create a unique slew of sounds that are ultimately played on our 8 speakers. This project was an excellent exercise in debugging, control flow, and sound synthesis. Below is a demonstration of our system in action. You should be able to hear 3 unique "voices," one of which is controlled by the Raspberry Pi:
Below is a video of me detailing my Max patch extensively and showing how to use it (save for the subpatches, which can be viewed in the attached zip file at the bottom; audio is included in the video):
Below is a gist of the main patch:
Finally, I’ve attached a zip file with all the files you’ll need to use the patch:
My second project is centered around the Kinect, with the addition of an audio-reactive element.
Based on dp.kinect2 and the Kinect demo patch, I created a point cloud from Kinect data. Gesture detection was implemented without machine learning: when the person claps their hands (the distance between the hands falls below a certain threshold), the point cloud changes its representation (draw_mode), e.g. from points to lines to polygons. Simply setting a threshold on the distance is not enough to define this gesture, because the representation should change only once per clap rather than switching multiple times while the hands stay together. Therefore, I incorporated a timer, as follows, which measures the time between the initial clap and the separation of the hands, so that one clap triggers only one bang.
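In Python terms, the one-bang-per-clap behavior amounts to a state latch: fire when the distance first drops below the threshold, and re-arm only once the hands separate again. The patch does this with a timer object; this sketch uses an explicit flag instead, and the threshold value is hypothetical:

```python
class ClapDetector:
    """Fire once when the hands come together; re-arm only after they separate."""
    def __init__(self, threshold=0.15):
        self.threshold = threshold
        self.hands_together = False

    def update(self, hand_distance):
        if hand_distance < self.threshold and not self.hands_together:
            self.hands_together = True
            return True                        # one "bang" per clap
        if hand_distance >= self.threshold:
            self.hands_together = False        # hands separated: re-arm
        return False
```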
In addition, the x, y coordinates of the left and right hands change the color of the point cloud. Initially, I also tried to adjust the camera position to get multiple angles of the point cloud. However, I found that there would always be a "point of origin" which looked fine when draw_mode is points but became deformed when draw_mode is lines or polygons as the point cloud drifted away. Unfortunately, after experimenting for a couple of days, I still could not find a way around it and decided to center the point cloud.
As for the background, I created "treadmill-ish" shapes using poly~. They start from the center of the screen and move with increasing z values, which makes the shapes look like they are coming out of the screen. This way, I could make the point cloud of the person look like it's moving forward. This poly~ object consists of 20 individual shapes, each staggered by a small amount, with z values scaled to run from -100 to 10 and wrap around to appear continuous.
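The per-frame z update is just an advance-and-wrap over the -100..10 range; a quick sketch (the step size per frame is a hypothetical value):

```python
Z_MIN, Z_MAX = -100.0, 10.0

def step_z(z, dz=0.5):
    """Advance one shape toward the viewer; wrap past z=10 back to z=-100."""
    z += dz
    if z > Z_MAX:
        z = Z_MIN + (z - Z_MAX)
    return z

# 20 shapes staggered evenly along the tunnel, as in the poly~ voices:
shapes = [Z_MIN + i * (Z_MAX - Z_MIN) / 20 for i in range(20)]
shapes = [step_z(z) for z in shapes]   # one animation frame
```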
The audio-reactive element is beat (or bass) detection. The audiobang patch sends a bang when it detects a low frequency, and the bang then triggers an individual shape which, like the background shapes, starts from the center, picks a random x direction, and comes out of the screen.
Here is a short demo of my roommate dancing to Sleep Deprivation by Simian Mobile Disco, enjoy 🙂
For my second project, I created a game heavily inspired by Super Hexagon. This game, which I call "Max Hexagon", bases all of its randomization on aspects of a given sound file.
In the game, the player is a cursor on one side of a spinning hexagon. As the board spins, so does the player. The player can move left and right around the hexagon, and must dodge the incoming formations for as long as possible. By default, the entire board spins at 33.3 RPM, the angular speed of a 12″ record. As the player moves, the song plays faster or slower, based entirely on the player's angular speed in proportion to the speed of a record.
The stage itself is generated in a number of ways. Aspects of the song's FFT are used to create pseudo-random shapes and rotations, while the note rate of the song is used to determine game speed. In addition, visual effects are created from the music. The maximum value of the FFT is used to create the color, the integral of a frame of the song is used to determine how large the center hexagon is, and the beat of the song is used to change the background pattern. Beat detection is both inaccurate and computationally intense, which is why it does not play a larger role in the game.
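A sketch of the per-frame analysis described above, using a naive DFT so it stays dependency-free (the frame size and feature choices are simplified stand-ins for what the actual patch computes):

```python
import cmath

def dft_magnitudes(frame):
    """Naive DFT magnitude spectrum; fine for a short analysis frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def frame_features(frame, sr=44100):
    """Peak FFT frequency (drives the color) and the frame's 'integral'
    (drives the size of the center hexagon)."""
    mags = dft_magnitudes(frame)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    peak_freq = peak_bin * sr / len(frame)
    energy = sum(abs(s) for s in frame)
    return peak_freq, energy
```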
The game itself was created using Python and Tkinter. The script that runs the game is multi-threaded, allowing Tkinter and an OSC server to run in parallel. The OSC server changes specific variables to let Python and Max communicate. The general form is that either Python sends a message to Max, which acts on it immediately, or Python requests new data from Max, which is promptly sent back over OSC.
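The handoff between the OSC server thread and the Tkinter game loop can be sketched as a lock-guarded shared-state object (the variable names and OSC address in the comments are hypothetical, not the project's actual ones):

```python
import threading

class SharedState:
    """Variables the OSC server thread writes and the Tkinter loop reads."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {"game_speed": 1.0, "hex_size": 0.0, "color": (255, 255, 255)}

    def set(self, key, value):       # called from the OSC handler thread
        with self._lock:
            self._data[key] = value

    def get(self, key):              # called from the game/render loop
        with self._lock:
            return self._data[key]

# An OSC handler would then do something like:
#   dispatcher.map("/maxhexagon/speed", lambda addr, v: state.set("game_speed", v))
# while the Tkinter frame callback reads state.get("game_speed") each tick.
```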
The game itself is extremely computationally intense and must be run at 1920×1080 resolution. It is, unfortunately, difficult for the game to keep up with Max while other tasks run on the same hardware. If the game crashes due to insufficient hardware, the framerate can be changed in constants.py and the tickrate (the frame period in milliseconds) can be changed in Max.
Beat~ is a soft requirement; it can be removed if necessary and most of the game will still function, barring one visual effect. Beat~ requires 32-bit Max and will thus not run in Max 8.
My project can be downloaded here: https://drive.google.com/open?id=1gsfdUkBEIh-JGZjIu0__g8CKnZvktxMx
Python spews a number of errors on closing the program. This is normal behavior, due to the lack of a way to properly shut down the OSC server along with the rest of the game.
I began project 2 wanting to look at the connection between visuals and sound, specifically in terms of themes and colors. My first concept was to use an API to extract keywords from an image and play a corresponding audio file altered based on the specifics of the image. My coding knowledge and experience made this extremely difficult, so I took a path that was more in scope for me. The project I ended up with is the ColorSynth.

The inspiration for the ColorSynth came from Sean Scully's "Landline," which essentially takes images, boils them down to fewer than 20 pixels tall and one pixel wide, and paints the resulting color bands with acrylic on aluminum. I took this simple idea of reducing a picture (whether static or motion) to a few stripes and playing it. There are many directions this concept could have gone, and this is one of them.

In this iteration of the ColorSynth, there are 3 modes: Swatch (or Color Picker), Stripes, and Camera. The simplest, Swatch, allows you to select a color; the synthesizer then mixes between three sources: red, green, and blue. The sounds associated with these colors are meant to "feel" similar to their color. There is also a delay effect unit that can be manually controlled in Swatch mode. When switched to Stripes mode, the camera appears on the screen, but reduced to the aforementioned stripes. Depending on the speed, the synth scrolls through each individual stripe, with some slide affecting the amplitude of each color and the effect section. If "Force Manual" is on, the effects unit ignores incoming information and behaves just like Swatch mode. Finally, there is Camera mode, which is similar to Stripes except that we now see the entire camera image and the synth information scrolls horizontally and vertically based on the speed. If too much gain comes from the synth, the output will clip and be lowered automatically; if it is lowered too much, reset the gain with the button. You can also manually change the camera dimensions.
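The stripe reduction at the heart of the synth can be sketched as row-averaging. This sketch represents the image as nested RGB tuples, which is an assumption; in the actual patch this reduction happens on Jitter matrices:

```python
def to_stripes(pixels, n_stripes=16):
    """Average the rows of an RGB image (list of rows of (r, g, b) tuples)
    into n_stripes horizontal color bands, 'Landline'-style."""
    rows_per = max(1, len(pixels) // n_stripes)
    stripes = []
    for s in range(n_stripes):
        band = pixels[s * rows_per:(s + 1) * rows_per]
        flat = [px for row in band for px in row]
        if not flat:
            break                      # image had fewer rows than stripes
        stripes.append(tuple(sum(c[i] for c in flat) // len(flat)
                             for i in range(3)))
    return stripes
```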
The reference works I found prior to this project were quite complicated and difficult to understand, as many of them used techniques that would require more than a semester to learn.
Taking advantage of the nature of self-directed projects, I set one of the main goals of this project as discovering and implementing the effects I personally find captivating. While project 1 (improved documentation, please take a look!) was successful in that I learned the basic usage of "jit.gen," it had a somewhat limited range of visuals that were certainly different from, but still similar to, my reference work. Therefore, for the final project, I felt the need and desire to experiment with various techniques I had not used before and to take a more personal approach.
Several of the reference works I found appealing were created by Paul Fennell, so I decided to reach out to him. He was kind enough to send me patches that would benefit my project more than the old ones I found on his YouTube channel, and he even introduced me to some of his recent works created using Fragment:Flow, an audio-visual performance and video processing environment he developed! It was very exciting to hear back from an artist who creates works I find beautiful.
Since the focus of my piece is almost entirely on the visual elements, unlike his audio-visual works, I studied the ways in which I could make visual manipulations using some of his techniques. Paul noted that "small, sub decimal changes in value/ranges are best," and indeed, performance decreased as soon as I entered two-digit negative values. However, I wanted to use specific values for dramatic contrasts in order to amplify the strangeness of the piece. Since my main goal was to create obscure visual manipulations, I chose the values I wanted, regardless of the decrease in performance. The speeds and durations were edited using Adobe Premiere Pro CC so that the lag would not remain an issue. I also used high values for the "flow" to create abrupt waves of distortion and transitions.
However, I did not edit any of the visual elements such as colors, contrast, and brightness using the video editing application. For the effects that highlight the visual forms, I experimented with various objects such as jit.sobel, jit.robcross, and jit.brcosa. It was exciting to see that "brcosa" was also used by Paul in his patch:
In short, the jit.sobel object offers two gradient edge detection implementations, Sobel and Prewitt. These edge detectors use 3×3 convolution kernels to compute a 2-dimensional spatial gradient of an incoming matrix, brightening features with “high spatial frequency” — a large amount of change from cell to cell — and darkening features with less change.
However, I personally preferred the jit.robcross object, which implements the Roberts Cross method of edge detection, as it allowed me to easily eliminate details I found distracting and unnecessary, such as the outlines of my irises. This edge detector has similar components to Sobel and Prewitt but uses a 2×2 convolution kernel, which means the kernel is not perfectly centered on the target pixel, unlike kernels with two odd dimensions such as those of Sobel and Prewitt. More information on kernel parameters can be found here.
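For reference, the Roberts Cross edge magnitude over a small grayscale array looks like this; a dependency-free sketch (jit.robcross itself operates on Jitter matrices):

```python
def roberts_cross(img):
    """Roberts Cross edge magnitude for a 2-D grayscale image (list of rows).
    Uses the 2x2 kernels [[1, 0], [0, -1]] and [[0, 1], [-1, 0]]."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x] - img[y + 1][x + 1]      # diagonal gradient
            gy = img[y][x + 1] - img[y + 1][x]      # anti-diagonal gradient
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```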
As I usually do for works that focus and rely heavily on subtle movement changes, I eliminated colors. In this case, jit.rgb2luma was used to convert the 4-plane char ARGB matrix into a 1-plane char monochrome matrix. I played around with ARGB scaling attributes composed of a-scale, r-scale, g-scale, and b-scale, to find what seemed fitting visually. My preferred setting was a negative b-scale value and a g-scale value much greater than a and r, which stayed below the value of 1. The textures of the background were subtle yet maintained, while highlighting the outlines of my body, inevitably directing the focus to the small movements that would have been overlooked otherwise.
I have been told that many of my works are not so "friendly," with no one definite answer or interpretation. As a result, I have naturally become interested in the autonomy of my art. For this particular project, I wanted to create a work that could not only function in, but also benefit from, my absence. It aims to represent the common nonverbal communication of eye contact, while shaping the lengthy reciprocal gaze into an experience that is slightly uncomfortable and odd through unusual speeds, unexpected visual manipulations, and, most importantly, the removal of eyes.
For my and Dan’s project, we wanted to do something with the Kinect. In particular, we wanted to be able to play a video game with sensor data from the Kinect. However, when we ran into issues with this, we decided we would create an instrument that used Kinect sensor data.
My contribution was the sound synthesis element.
Our setup was as follows: a Kinect connected to a Windows machine with a license for dp.kinect2 sends sensor data over OSC. We use multiple ports for efficiency and simplicity. On my personal laptop, I read this sensor data and perform the sound synthesis.
The first step in my sound synthesis was list parsing. For each body part I read (head, left hand, right hand, left foot, right foot), I am given a list representing X, Y, Z, and certainty. I wanted to use the distances between the body parts to create my instrument, so I made a subpatch to calculate the Euclidean distance.
Once I did that, we had to do some manual testing to see the range of distance values that were possible (i.e. standing in front of the Kinect and doing a starfish pose, to get the widest distance between the body parts). Once we did that, we could scale the potential distances (0 to X, X being the max) into usable numbers for sound synthesis.
I wanted the distance from each hand to the feet to correspond to the pitch of one of two separate oscillators (left and right), and the distance from the hands to the head to correspond to the loudness of each oscillator. To make the patch more usable, I have the oscillators round to the nearest fifth instead of sliding up and down continuously. To do this, I created integers at multiples of 7 (the number of half steps in a fifth).
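Put together, the distance-to-pitch pipeline can be sketched as follows (the note range and base note are hypothetical values, not the patch's actual settings):

```python
def euclidean(p, q):
    """Distance between two (x, y, z) joint positions."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def to_midi(d, d_max, lo=36, hi=84):
    """Scale a 0..d_max distance into a MIDI note range (hypothetical bounds)."""
    t = min(max(d / d_max, 0.0), 1.0)
    return lo + t * (hi - lo)

def quantize_to_fifths(note, base=36):
    """Snap to the nearest multiple of 7 semitones (a fifth) above `base`."""
    return base + 7 * round((note - base) / 7)
```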
Jesse also helped me make the motion speed control a lowpass filter. To do this, we used the "t" object to store a float, which lets us compare the current value to the previous one and take the difference. This speed controls the cutoff frequency of the lowpass filter. I then use the distance from the hands to the head to control the resonance of the filter, keeping it between .3 and .8.
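The speed-tracking idea (store the previous value, take the difference, map it to a cutoff) can be sketched like this (the frequency bounds and speed normalization are hypothetical):

```python
class SpeedToCutoff:
    """Track a joint's speed as the difference from its previous position
    (what the stored float does in the patch) and map it to a cutoff."""
    def __init__(self, min_hz=200.0, max_hz=8000.0, max_speed=1.0):
        self.prev = None
        self.min_hz, self.max_hz, self.max_speed = min_hz, max_hz, max_speed

    def update(self, value):
        speed = 0.0 if self.prev is None else abs(value - self.prev)
        self.prev = value
        t = min(speed / self.max_speed, 1.0)
        return self.min_hz + t * (self.max_hz - self.min_hz)
```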
I then compress the result and add reverb, cuz why not?
We also decided that the potential pitches for the right hand/leg oscillator should be the same pitches as the left. We considered at one point having each side have different ranges of pitches to allow for more playability, but decided that ease of use and understanding was more important.
If we had more time, it would have been nice to implement some way for the contour of the oscillator to change with another variable. However, I'm pretty proud of the work we did: our tool is interesting and usable.