For this project, Bo and I decided to merge aspects of our Project 1 assignments (mine an autotuner patch, his an audiovisualizer) to create a Rock Band-esque game for Twinkle Twinkle Little Star: it tracks the note you’re singing and scores you based on whether the autotuner is correcting you to the right pitch.
My personal contribution to this project consisted of supplying Bo’s audiovisualizer with any info it needed about the game logic: markers, current and upcoming notes, color values to display, and anything else that could vary based on how the player is doing. I also created an easy mode and a hard mode for the game. Easy mode corrects you only to notes in the major scale of the key you’re playing in, giving you a larger margin for error, while hard mode corrects you to the full chromatic scale no matter what key you’re in, so a sour note stays sour. As mentioned, the game can be played in any major key, so users with different vocal ranges can play.
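To make the two modes concrete, here is a minimal Python sketch of the pitch-snapping idea (this is not the Max patch itself; the function names and the C-major example are my own, purely for illustration). Easy mode restricts the allowed pitch classes to the key’s major scale; hard mode allows all twelve.

```python
import math

A4 = 440.0  # reference tuning

def freq_to_midi(f):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69 + 12 * math.log2(f / A4)

def midi_to_freq(m):
    """Convert a MIDI note number back to Hz."""
    return A4 * 2 ** ((m - 69) / 12)

# Pitch classes for C major (easy mode); hard mode allows all 12.
MAJOR = {0, 2, 4, 5, 7, 9, 11}
CHROMATIC = set(range(12))

def correct(f, allowed):
    """Snap an incoming frequency to the nearest allowed note."""
    m = freq_to_midi(f)
    # Look at nearby MIDI notes and keep the closest one whose
    # pitch class is permitted by the current mode.
    candidates = [n for n in range(round(m) - 6, round(m) + 7)
                  if n % 12 in allowed]
    target = min(candidates, key=lambda n: abs(n - m))
    return midi_to_freq(target)
```

For a slightly sharp A (say 455 Hz), easy mode in C major pulls you back to A = 440 Hz, while hard mode snaps you to the nearer A# instead, which is why hard mode is less forgiving.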
In addition, all of the audio heard throughout the game comes from my patch, be it the piano playing the MIDI notes along with you or the autotuned signal. The autotuned signal lives in the “inputs” encapsulation, which also holds the game-controller bpatcher I created. Nested inside “inputs” is the “twinkle” encapsulation, which processes and outputs all of the variables specific to Twinkle Twinkle Little Star in a form Bo’s patch can work with and draw from.
When it comes to game logic, we score based on how many beats you sing correctly (including any the autotuner helps you hit), and the score is displayed on the game controller. You can then start a new game in whichever key you select. Regardless of the key, there is a four-beat lead-in that plays the first note at tempo so you can get ready before the round begins.
To demonstrate, I put myself through the wringer on easy mode:
For this project I wanted to dive into something I’ve been curious about for years now: auto-tune. I was never quite sure how the effect was actually achieved, so I aimed to mimic it to the best of my abilities and potentially make it more suitable for live work.
That being the case, I worked through a plethora of pitch-detecting and pitch-shifting Max objects (pitch~, sigmund~, gizmo~, pitchshift~, fzero~, etc.) before finally landing on retune~, which proved sufficient for what I was aiming to do. It let me get a decent estimate of incoming frequencies and shift them to match certain scales. To do so, I had to map out intervals in cents and build lists the program could access quickly, so the latency wouldn’t get too bad.
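The cents math behind those lists is simple, and worth spelling out. A cent is 1/100 of an equal-tempered semitone, so a scale can be precomputed once as a flat list of cent offsets from the tonic, leaving only a cheap nearest-value lookup at runtime. A rough Python sketch (my own naming, not the patch’s):

```python
import math

def cents(f1, f2):
    """Interval between two frequencies in cents (100 cents = one semitone)."""
    return 1200 * math.log2(f2 / f1)

# Major scale as cent offsets from the tonic, precomputed once.
major_scale_cents = [0, 200, 400, 500, 700, 900, 1100, 1200]

def nearest_scale_offset(offset_cents, scale):
    """Fast runtime step: snap a measured offset to the closest scale tone."""
    return min(scale, key=lambda c: abs(c - offset_cents))
```

An octave (440 Hz to 880 Hz) comes out to exactly 1200 cents, and an input 350 cents above the tonic snaps to the major third at 400, which is the kind of lookup that keeps per-block latency low.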
I also added a couple of features to this core idea. The patch can autotune either a file or the computer’s microphone input. In addition, I plugged in my Assignment 4 patch so I could record and manipulate freshly autotuned audio. Finally, I added a sonogram to highlight the rigidity of the processed audio, plus two nsliders displaying the original note of the signal next to the corrected note after processing.
Below is a recorded demonstration of the autotuning effects (the manipulation of recorded audio within the patch will be demonstrated in class). Apologies for not actually attempting to sing Buy U a Drank; I figured even with auto-tune I can’t really be T-Pain…
For this project, I didn’t really know where to begin, so I went back to the tutorial on the pfft~ object to see if I’d be inspired. Thankfully, I was! That tutorial builds a patch that captures a recording and processes it with some fancy math, so I tweaked it to create recordings with a vocoder-like effect and used it as my subpatch for the pfft~. For reference, I included a link to the tutorial at the end of the post.
Here are both the subpatch (framerecord) and the main patch (frame-player).
Where the fun really began, though, was figuring out what I could do with that recording. Because I used the two inlets of my pfft~ separately, I dedicated one to recording and the other to the various playback effects driven from the main patch. I was able to do this because my FFT subpatch treated each frame of audio separately and stored it in a buffer~ to be used whenever I desired. So, if I wanted to listen to just one frame for a while, I could stop the recording on that frame and hold it for as long as I wanted. I added this capability to the playback effects in the main patch.
As for the other playback effects, all driven by a counter, I added the ability to speed playback up, slow it down, play forwards, play in reverse, and play the whole recording forwards and then backwards. Lastly, for convenience, I made the counter’s maximum value adjustable, so short recordings can loop back around without a long gap of silence while the unused portion of the subpatch’s buffer~ plays out.
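The counter logic above can be sketched outside Max in a few lines of Python (the names here are hypothetical, a stand-in for the counter-and-buffer~ plumbing, not the patch itself). A list of stored FFT frames plays the role of the buffer~, and the counter becomes an index updated once per tick:

```python
def playback_indices(n_frames, loop_max, speed=1, reverse=False, ticks=8):
    """Return the frame indices a counter-driven player would visit.

    n_frames: frames stored in the buffer; loop_max: adjustable counter max
    so short recordings loop without the silent tail; speed: increment per
    tick (0 freezes on one frame); reverse: play the loop backwards.
    """
    loop_len = min(loop_max, n_frames)
    idx = 0
    out = []
    for _ in range(ticks):
        pos = loop_len - 1 - (idx % loop_len) if reverse else idx % loop_len
        out.append(pos)
        idx += speed
    return out
```

With `speed=0` the same frame repeats forever (the freeze effect), `speed=2` skips every other frame for double-time playback, and shrinking `loop_max` is what lets a short recording loop tightly instead of waiting out the empty end of the buffer.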
I did a demo of all of these effects, which can be found here:
And of course, here is the code for my main patch:
This is for all you out there who want to hear George W Bush messing up a common saying in a bunch of different rooms/soundscapes.
For those of you who haven’t heard the original, it’s already pretty hilarious:
As for convolving this impressively un-presidential sound bite, I decided to use recordings from one wildly reverberant room (a racquetball court) and one with almost no reverberation at all (a sound isolation booth). I wanted to see how drastically the sound bite could be layered and filled with reverb by the racquetball court, and how thin it could become when the iso booth stopped almost all reflections. For your reference, here are the two impulse response recordings I took:
The racquetball court recording was made by popping a balloon in a corner of the room and recording from the center, so we could capture any odd reflections that resulted from popping the balloon in a corner. It almost sounds like an explosion, and it provides an extremely long decay that would be more than interesting to hear through Bush’s sound bite.
The sound isolation booth recording was made by popping a balloon in the center of the room and recording from very close to the center as well. We wanted to hear the impulse response from the perspective of the person making the sound, since the booth is normally used by people practicing singing; capturing how much the voice would be altered was our main interest.
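Under the hood, what the convolution engine does with these recordings is straightforward: it convolves the dry clip with the impulse response, usually via FFTs for speed. A minimal NumPy sketch of that idea (an illustration of the general technique, not the engine we used in class):

```python
import numpy as np

def convolve_ir(dry, ir):
    """Apply an impulse response to a dry signal via FFT convolution.

    Multiplying the two spectra and inverse-transforming is equivalent
    to time-domain convolution; the output is normalized to avoid clipping,
    since reverberant IRs can boost the level enormously.
    """
    n = len(dry) + len(ir) - 1          # full convolution length
    wet = np.fft.irfft(np.fft.rfft(dry, n) * np.fft.rfft(ir, n), n)
    return wet / np.max(np.abs(wet))
```

Convolving with a single-sample impulse returns the dry signal unchanged, which is a handy sanity check; a long balloon-pop IR like the racquetball court’s smears every syllable across its entire decay, which is exactly the wash of reverb you hear in the results below.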
Here are the respective convolution engine results:
The racquetball convolution produced an output very close to what I expected. It had an obnoxious amount of reverb and took a very long time to settle back to silence even after the original clip stopped playing. I imagine that if George W Bush recited that line again, this time in a racquetball court, he’d be even more confused than he already was, with his own words constantly flying back at him from every direction.
The practice room (iso-booth) convolution was a little underwhelming. You can definitely tell the sound bite got thinner, with much less room presence and ambiance, but I was hoping for an even brassier sound. Regardless, it almost sounds like Bush is giving his speech in a vacuum.
For the next two impulse responses, I explored one with a musical component and one with a component of repetition. For the musical component, I used a recording of a handpan, a little-known steel instrument that looks like a huge turtle shell and creates very tropical, reverb-laden sounds. For the component of repetition, I reused the exact same clip I was convolving as the impulse response too. Yep, you guessed it: I used Bush to make Bush sound like he was Bush in Bush, but not in a bush. Bush in a bush might not be as interesting as Bush in Bush. Anyhow, here are the impulse response recordings:
The handpan, as mentioned earlier, has a very reverberant quality, but it can also be played percussively, so I found a clip that demonstrates both techniques. It’s worth noting that the clip features two handpans played simultaneously by a single performer.
And for your repeated entertainment, here’s George W Bush messing up again:
Here we have the handpan convolution. It actually gave the sound bite a very calming presence, as if Bush were speaking softly in a dream about a vacation to Jamaica, or whichever tropical island you prefer. I found it quite beautiful, and much less funny than the original.
Finally, we have the convolution of Bush using the exact same clip as the impulse response. Note: I had to lower the gain drastically to get anything that didn’t sound like I was simply holding a mic in front of its own speaker (convolving a clip with itself squares its spectrum, so the loudest frequencies get much, much louder). Once I did, the convolution engine produced a weird, nearly cyclic pattern of Bush delivering his wisdom to his peers in very small portions. Particularly after the original clip finishes, leaving only the reverb tail, you can hear specific phrases repeated over and over. The last “fool me can’t get fooled again” playing several times in a row at the end was genuinely an accident, but it’s hilarious to me, so I wanted to highlight it!
There you have it, George W Bush in several different soundscapes, including one of himself delivering an absolutely iconic piece of wisdom.
This patch developed from my interest in the motion-detection audio filter that we made in class. I wanted to explore more of what the jit.3m object did and how its different output values could affect the sounds produced if all else was kept the same. The eventual result was a motion-controlled ghostly sound.
I got there by simply creating a sinusoidal signal generator for each outlet of jit.3m (minus the dumpout) and seeing what happened when I used the various outputs to set the frequency of each cycle~ object. I realized very quickly that this produced a super annoying robotic beeping, so I decided to smooth out the change from one frequency to the next. That gave me the ghostly, spooky sound without all the obnoxious beeps.
As for what each output of jit.3m did to the signals, each had its own character. The max output returns the largest cell-to-cell change in the matrix from one frame to the next, which feeds its cycle~ object extremely high frequencies during high motion, producing the high whistling sounds. The mean output was the most dynamic and the most dependent on how much motion was occurring, and it produced the easily audible midrange frequencies that really amp up the ~ s p o o k y ~ aesthetic. The min output stayed at one frequency the whole time: the lowest frequency I allowed after scaling. This made sense, because unless someone were to go absolutely NUTS in the frame, there would always be at least one pixel (or cell in the matrix) that didn’t change from one frame to the next, so the min value was constant. This is the lower-pitched droning in the mix.
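The two core moves here, scaling a motion value into an audible frequency range and smoothing the jumps so the beeping disappears, can be sketched in Python (a rough stand-in for the [scale] object and the signal-rate ramping in the patch; the numbers are illustrative, not the ones I actually used):

```python
def scale(x, in_lo, in_hi, out_lo, out_hi):
    """Linear mapping with clamping, like Max's [scale] object
    turning a jit.3m motion value into a cycle~ frequency."""
    x = min(max(x, in_lo), in_hi)
    return out_lo + (x - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)

def smooth(prev, target, factor=0.9):
    """One-pole smoothing: glide toward the new frequency instead of
    jumping to it, which is what kills the robotic beeping."""
    return prev + (1 - factor) * (target - prev)
```

Each new frame updates the target frequency via `scale`, and `smooth` is applied every tick so the oscillator slides toward it; the higher the factor, the slower (and spookier) the glide.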
Added bonus: the pwindow that’s showing the motion detection makes you look like a ghost #ArtisticUnity
Since arriving at CMU, I’ve been exposed to recording and editing audio in Pro Tools, but I had never really tried to make a sound anything but album-ready. So I wanted to explore the destructive potential of a simple delay, an effect usually used in small doses to give vocal or instrumental tracks a “glitch” or some other cool character. I recorded myself saying one phrase, then repeatedly added the smallest possible delay to the audio, duplicated it, and mixed the copies together, continuing the process until it became very hard to pick out what I was saying. Frankly, I stopped at the point where I couldn’t stand how screechy the audio had become.
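A toy Python model of one pass of this process (my own sketch, not anything Pro Tools exposes) makes clear why it mangles the audio so fast: delaying a copy by a tiny amount and summing it back in is a comb filter, and repeating the process stacks comb on comb while roughly doubling the peak level each time.

```python
def add_delayed_copy(signal, delay_samples=1):
    """One pass: duplicate the track, delay the copy by a hair,
    and mix the two together (a comb filter in disguise)."""
    out = list(signal) + [0.0] * delay_samples
    for i, s in enumerate(signal):
        out[i + delay_samples] += s
    return out

def iterate(signal, passes):
    """Repeat the duplicate-delay-mix process, as I did in Pro Tools."""
    for _ in range(passes):
        signal = add_delayed_copy(signal)
    return signal
```

Starting from a single unit impulse, three passes already yield the weights [1, 3, 3, 1]: the amplitudes grow like binomial coefficients, which is exactly why the clips started clipping so badly after a few repetitions.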
What resulted is a clip that both annoys you with my voice and makes your ears want to bleed. So enjoy, or just invest in some earplugs; up to you!
I’ll also add that as the trials went on, the signal started clipping to an INSANE degree; I had to use a limiter on every clip from about the fourth repetition on. So if you think this is bad, imagine what it would’ve sounded like unprocessed.