Mingunk – Project 1: Leap Motion Controlled Ambisonics

My first experience using a VR (Virtual Reality) headset was not so long ago. I watched a video that made me feel as if I were riding a rollercoaster. Even though the video was immersive, I felt that the experience wasn't taking full advantage of our senses. Specifically, I was disappointed that I could only hear unrelated background music rather than the sound of wheels rolling on the track and wind rushing past.

With this project, I wanted to study and apply the fundamentals of ambisonics and spatial audio. During my research before starting the project, I found various related projects. One was a video of an orchestra session recorded in 360 degrees; it was interesting to hear the sound change as I dragged the scene around with my trackpad. Another project presented a train station, where the sound came alive as a train passed along the platform. Interestingly, I was most intrigued by being able to hear a conversation happening behind where I was facing.

Based on those great ambisonics examples, I decided to create a Max project that incorporates ambisonics with the Leap Motion controller. I used both the Higher-Order Ambisonics Library and the Leapmotion for Max library.

I first gathered two types of Leap Motion data using the library. The first was hand/palm data, which I used to make Max recognize two gestures: slide and pinch. When I slide my hand, the sound sources rotate around a circle representing equally spaced loudspeakers. This matters because the audience can not only hear the differences between the individual samples but also, with a faster slide, hear the sounds converge into one combined piece of music. When I pinch, the sources move farther from or closer to the center of the circle, which helps listeners notice the spatialization of the sounds. To distinguish the two gestures, I set a y-axis threshold: hand movements above it are interpreted as slides, and movements below it as pinches. A rough sketch of this logic appears below.
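As a rough illustration (the actual logic lives in the Max patch, so the function names, threshold values, and scaling factors below are assumptions made for the sake of the sketch), the gesture handling amounts to something like this:

```python
import math

NUM_SOURCES = 10          # ten stems arranged on a circle
Y_THRESHOLD = 200.0       # assumed palm height (mm) separating slide from pinch
BASE_RADIUS = 1.0         # radius of the circle of virtual loudspeakers

rotation = 0.0            # current rotation of the circle, in radians
radius = BASE_RADIUS      # current distance of the sources from the center

def update_from_palm(palm_y, palm_dx, pinch_strength):
    """Map one frame of hand/palm data to source positions.

    palm_y         -- palm height above the controller
    palm_dx        -- horizontal palm velocity (drives the slide)
    pinch_strength -- 0.0 (open hand) to 1.0 (full pinch)
    """
    global rotation, radius
    if palm_y > Y_THRESHOLD:
        # Slide: faster hand motion spins the circle faster,
        # which blends the stems into one combined texture.
        rotation += 0.01 * palm_dx
    else:
        # Pinch: pull the sources toward (or push them away from) the center.
        radius = BASE_RADIUS * (1.0 - 0.8 * pinch_strength)

    # Recompute the (x, y) position of each equally spaced source.
    return [
        (radius * math.cos(rotation + 2 * math.pi * i / NUM_SOURCES),
         radius * math.sin(rotation + 2 * math.pi * i / NUM_SOURCES))
        for i in range(NUM_SOURCES)
    ]
```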

The second type of data was fingertip positions, which I used for another mode in which I control the sound sources with my fingers. Moving the sources this way gave me more freedom over their locations and let me manage what listeners would specifically hear. It also let me explore how the coordinates of the signals affect how the sounds blend together.
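A minimal sketch of this fingertip mapping, again with illustrative names and an assumed scaling factor rather than the values in the actual Max patch:

```python
def update_from_fingertips(fingertips, scale=0.005):
    """Map Leap Motion fingertip positions to source coordinates.

    fingertips -- list of (x, y, z) tuples in millimeters, one per tracked finger
    scale      -- shrinks the millimeter range down to the ambisonic scene
    """
    # Each tracked fingertip drives one sound source directly; the
    # controller's horizontal plane becomes the listening plane.
    return [(scale * x, scale * z) for (x, y, z) in fingertips]
```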

To combine the two modes, I added a switch: pressing "d" turns on finger control and pressing "f" turns on the hand/palm gestures. I deliberately did not allow both modes to run at the same time, both because the two types of data are handled in strictly different ways and because mixing them would make it harder for a performer to understand the difference between the two modes.
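Continuing the sketches above (the keyboard handling in the real project is done inside Max; the keys "d" and "f" come from the patch, but the frame fields and function names here are illustrative):

```python
MODE = "fingers"  # which control mode is currently active

def on_key(key):
    """Toggle between the two mutually exclusive control modes."""
    global MODE
    if key == "d":
        MODE = "fingers"   # fingertip positions drive the sources
    elif key == "f":
        MODE = "palm"      # slide/pinch gestures drive the sources

def on_frame(frame):
    """Route each Leap Motion frame to exactly one mode (hypothetical frame keys)."""
    if MODE == "fingers":
        return update_from_fingertips(frame["fingertips"])
    else:
        return update_from_palm(frame["palm_y"],
                                frame["palm_dx"],
                                frame["pinch_strength"])
```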

To test the ambisonics, I used a jazz piece, "What Is This Thing Called Love" by the Jesper Buhl Trio, and a country song, "Carolina In The Pines" by Pretty Saro. Both were acquired from the Cambridge Music Technology database as multitrack files containing the stems of each song. I specifically chose songs with five multitracks each and imported a total of ten stems to match my sound source movements. Choosing different genres mattered because it let me either distinguish or merge the two genres through spatialization.
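For example, one way to lay out the ten stems (the file names below are placeholders, not the actual stem names from the multitrack sessions) is to interleave the two genres around the circle, so a slow slide lets listeners pick each genre apart while a fast slide merges them:

```python
# Placeholder stem names standing in for the downloaded multitrack files.
jazz_stems = [f"jazz_stem_{i}.wav" for i in range(1, 6)]
country_stems = [f"country_stem_{i}.wav" for i in range(1, 6)]

# Interleave the two genres around the circle: neighboring sources alternate
# genre, so rotation speed controls how distinct or blended the genres sound.
sources = [stem for pair in zip(jazz_stems, country_stems) for stem in pair]
```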

Below are a screenshot of the main patch and a video of it in action:

Screenshot of the Main Max Patch
Video Demonstration of the Project (Ambisonics Experience Not Fully Supported Due to Unavailability of Multichannel Recording Devices)

Despite everything I learned in the process of making this project, I see improvements that could be made in the future. First, any Leap Motion-controlled movements should work for both left- and right-handed users. While researching Leap Motion applications, I read a great deal about how left-hand-dominant users struggle with most existing resources. Beyond that, it may be useful to give audiences a graphic visualization of the hand movements. Most listeners will be mesmerized by the spatialization of sound in a performance, but seeing the gestures would gradually help them notice what creates the difference compared to normal stereo sound.

By integrating the two types of data, any performer can use their own musical stems to create ambisonic music, and I think this can be a helpful framework for creative sound spatialization performances. Furthermore, I have learned a great deal about ambisonics and how much it can help virtual reality feel more realistic. Even though this project is geared toward creative performances, I believe ambisonics is a near-future technology relevant to many technological media.