In this project, I explored the concept of combining semi-generated music with noise. The initial signals in the piece are a MIDI chord and the performer's head position. The patch combines these two signals, which in turn trigger other noises that contribute to the final piece.
Initially, I planned to work with finished MIDI sequences but ended up using only chords, since the goal of the project is to generate both noise and MIDI notes. The generated MIDI notes vary in velocity, which is controlled by the vertical head angle. The facial landmark positions are produced by a preprocessing Python script using dlib (an easy read on the topic: https://www.pyimagesearch.com/2017/04/03/facial-landmarks-dlib-opencv-python/); the output looks like the figure below. The arrays of x, y coordinates are sent over UDP from Python using the OSC library (https://github.com/ptone/pyosc), which makes the process easier (the native socket library in Python does not play well with Max).
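A minimal sketch of what such a script can look like, assuming pyOSC and the standard dlib 68-point predictor; the port number and OSC address below are placeholders of my own, not necessarily what the patch expects:

```python
import cv2
import dlib
import OSC  # pyOSC (https://github.com/ptone/pyosc)

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

client = OSC.OSCClient()
client.connect(("127.0.0.1", 7400))  # e.g. [udpreceive 7400] in Max

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 0)
    if faces:
        shape = predictor(gray, faces[0])
        msg = OSC.OSCMessage()
        msg.setAddress("/landmarks")
        # Flatten the 68 (x, y) pairs into one list of ints.
        for i in range(68):
            msg.append(shape.part(i).x)
            msg.append(shape.part(i).y)
        client.send(msg)
```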
The first part of the patch handles parameter processing and calculations. After the facial landmark identification script is running, you can press A in Max to set the reference coordinates on which further processing and calculations are based. Part of the patch is shown in the figure below. For all parameters, the average of the last 4 incoming values is used instead of the raw value, for more stable behavior. Note that none of the values/distances calculated match what you would get by measuring your face with a ruler: the calculations are kept as simple as possible and no normalization is applied, so the output represents relative scales rather than exact values.
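Outside Max, the same smoothing-plus-reference idea could be sketched like this (the class and names are mine, not objects in the patch):

```python
from collections import deque

class SmoothedParam:
    """Average of the last 4 incoming values, compared against a frozen reference."""
    def __init__(self, window=4):
        self.buf = deque(maxlen=window)
        self.reference = None

    def feed(self, x):
        self.buf.append(x)

    def value(self):
        return sum(self.buf) / len(self.buf) if self.buf else 0.0

    def set_reference(self):
        # Equivalent to pressing A in the patch: freeze the current smoothed
        # value as the baseline for later comparisons.
        self.reference = self.value()

    def offset(self):
        # A scale-like offset from the reference, not a physical distance.
        return self.value() - self.reference if self.reference is not None else 0.0
```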
The second part is note processing. In the current version, the patch handles 3-note chords; I will keep working on the patch so that it can deal with chords more flexibly.
You can control the sound effects using the following parameters:
- Landmark 28, horizontal position: pitch shift in the background noise via granular synthesis; also triggers a crow noise effect. I implemented a manual fade-in and fade-out for the sound sample for better ambiance:
- Landmark 28 + 1, horizontal angle: triggers gunfire background noise via granular synthesis; greater variance corresponds to a greater random pitch rate. Since calculating variance directly can be a little difficult, I used a value of a similar nature (picking the first and the last value from the bucket and calculating their distance) to approximate the concept (see the sketch after this list). (My granular synthesis implementation in this project is relatively naive, since the goal is just to produce sound effects.)
- Landmark 28 + 20, eyebrow raise: triggers an argument sound effect.
- Landmark 48 + 44, eye close: stops the generated notes from playing.
- Landmark 9 + 58, vertical head angle: changes note velocity. Raising your head produces louder notes, assuming you start by looking down at the screen.
- Landmark 49 + 55, smile/smirk: triggers a wicked laughter. The laughter is split into multiple copies at different pitches using a phasor, so you can hear the laugh from multiple people (witches and wizards?). A custom pan is implemented so that you can pan the laughter by moving your head sideways. In the full demo I forgot to move my head in the direction that triggers the pitch-shifted voices, so here is a sample of this sound effect on its own.
- There is also a background thunder effect, triggered by low-pitched notes generated in the note-processing patch.
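The variance stand-in mentioned in the horizontal-angle bullet could look roughly like this in Python; the bucket size and the mapping to a pitch rate are illustrative numbers, not the patch's actual values:

```python
from collections import deque

bucket = deque(maxlen=8)  # recent horizontal-angle readings; window size is my choice

def feed(angle):
    bucket.append(angle)

def spread():
    # Cheap stand-in for variance: distance between the oldest and newest
    # values currently in the bucket.
    return abs(bucket[-1] - bucket[0]) if len(bucket) > 1 else 0.0

def random_pitch_rate(base=0.1, scale=2.0):
    # A larger spread maps to a faster random pitch rate in the granular layer.
    return base + scale * spread()
```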
You can also make changes manually while performing if you are not happy with the sound.
Due to my computer’s inability to process all the data while recording both audio and video, only the audio file is included.
In this project, I explored harmony and intervals in MIDI files and visualized these qualities. Each MIDI note is represented by a cube, which is pressed down/pulled up when the note is turned on/off. A noise value related to the dissonance of the currently held chord is generated and applied to the cubes' position attribute. The rendered mesh object changes its color mode when a root-note change is detected in the chord. Unfortunately, the visualization is pretty crude and I'm still very far from what I wanted to do. I had some trouble manipulating each cube independently in a more creative way in the jit.gl.mult context, for example applying a glow effect to a specific cube when a note is turned on. My main plan is to improve the way the cubes are generated so that ultimately they can be manipulated independently.
This part of the patch evaluates the intervals in the currently held chord and assigns a dissonance value to it. The current evaluation is subjective and cannot accommodate inversions or the subtle differences between complex chords. For future work, I will integrate the material discussed in this note to produce more robust evaluations: http://www.oneonta.edu/faculty/legnamo/theorist/density/density.html
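One simple way to assign such a value is to score each pairwise interval and average the scores; the weights below are my own illustrative numbers, not the table used in the patch, and this ignores inversions and voicing just as the current patch does:

```python
# Interval (semitones mod 12) -> rough dissonance weight (illustrative only).
DISSONANCE = {0: 0.0, 1: 0.9, 2: 0.6, 3: 0.25, 4: 0.2, 5: 0.1,
              6: 0.8, 7: 0.05, 8: 0.3, 9: 0.25, 10: 0.5, 11: 0.85}

def chord_dissonance(notes):
    """Average pairwise interval roughness for the MIDI notes currently held."""
    pairs = [(a, b) for i, a in enumerate(notes) for b in notes[i + 1:]]
    if not pairs:
        return 0.0
    return sum(DISSONANCE[abs(a - b) % 12] for a, b in pairs) / len(pairs)

print(chord_dissonance([60, 64, 67]))  # C major triad -> low value
print(chord_dissonance([60, 61, 66]))  # cluster with a tritone -> higher value
```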
This part of the patch generates the noise value and applies it to the position matrix.
A short demonstration of the patch (I know the visualization still looks too simple; I will work on more ways to integrate the processed signal into the rendered objects).
This project is based on what we did in class – visualizing audio using pfft~. I parsed the pfft~ output into four bands by bin index. Each band covers a frequency range (low, mid-low, mid-high, and high, respectively), represented by blue, green, pink, and white. This video shows the visualization of Morton Gould's Interplay: IV. Very Fast, With Verve and Gusto, a piece with very beautiful orchestration. The piano, which dominates most of the music, lies in the green and blue bands, while woodwinds, brass, and percussion occasionally pop up in pink and white. I also tried running the patch with pop music, which seems to have a larger pink/white presence.
The video quality seems to be embarrassingly bad… I will work on it next time…
The patch is similar to the one from class, with more matrices added in the main patch and band filters added in the pfft~ subpatch. Currently the bin filters are hard-coded; I'll see if I can improve the model to adjust them on the fly.
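For reference, the same band-splitting idea can be sketched offline with plain FFT bin indices; the bin boundaries here are arbitrary placeholders, just as the patch's are currently hard-coded, and may not match the actual crossover points:

```python
import numpy as np

SR = 44100
FFT_SIZE = 1024

# Hard-coded bin boundaries: low / mid-low / mid-high / high (placeholders).
BANDS = [(1, 8), (8, 32), (32, 128), (128, FFT_SIZE // 2)]

def band_energies(frame):
    """Energy of one audio frame in each of the four bin ranges."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    return [float(np.sum(spectrum[lo:hi] ** 2)) for lo, hi in BANDS]

# Example: a 440 Hz tone falls around bin 10 at this FFT size, i.e. the second band.
t = np.arange(FFT_SIZE) / SR
print(band_energies(np.sin(2 * np.pi * 440 * t)))
```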
For this assignment, I used a recording of myself reading in class as the original signal and convolved it with the following IRs:
Despite the noisy nature of the IR, the output sounds pretty normal, with a hint of robot-like quality.
- microphone knocked to the ground while recording
Sounds just like a very standard reverb.
The output audio preserves the notes fairly well, but it's impossible to identify the words in the original recording.
The word China blends well with the original recording, but if you look for it, you can easily find where it's lurking.
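For reference, the same process can be reproduced offline outside Max; this sketch assumes mono WAV files at matching sample rates, and the filenames are placeholders:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

# Placeholder filenames for the dry reading and the impulse response.
sr, dry = wavfile.read("reading.wav")
_, ir = wavfile.read("impulse_response.wav")

dry = dry.astype(np.float64)
ir = ir.astype(np.float64)

wet = fftconvolve(dry, ir)             # convolve the reading with the IR
wet /= np.max(np.abs(wet)) + 1e-12     # normalize to avoid clipping

wavfile.write("convolved.wav", sr, (wet * 32767).astype(np.int16))
```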
Project 1 Proposal
In this project, I will explore the potential of MIDI music visualization in a 3D environment. I was inspired by our in-class av-synth demo and this video:
In this video, the Max patch takes the frequency of the audio input and presents simple visual feedback through the 'ocean' in a jit.world environment. In my project, I will import MIDI piano music in two channels and visualize their interactions in a (not so) similar way. My focus will be on exploring the properties of the MIDI notes (such as velocities, chord roots, and intervals) rather than on visualization techniques. I'm interested in how to transfer what we feel about these musical properties into a visualized environment, and hopefully I will be able to use this project in my own compositions. My goal is to create a project with trivial rendering demands and complex signal-processing mechanisms.
In this assignment, I used time-shifting techniques to simulate multiple people reading the same text.
The first component of the patch is a phasor that generates a pitch change when used in combination with a delay window (tapin~ and tapout~). I think the output of this approach sounds less robot-like than that of gizmo~ or freqshift~.
The second component adds a small randomized time shift to each track/simulated voice so that the "multiple people" are not completely synchronized with each other. I also want the time-shift value to vary over time: if one voice is always x milliseconds behind another, the audience can easily spot the hard-coded delay. To work around this, I feed the volume of each voice back to itself; whenever there is a silence or a sentence break in the audio, the patch generates a new time-shift value.
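The retriggering logic, isolated from the audio path, can be sketched like this; the silence threshold and maximum shift are my own numbers, and the envelope is assumed to arrive one value per processing block:

```python
import random

class DriftingDelay:
    """Pick a new random time shift whenever the voice falls silent."""
    def __init__(self, max_shift_ms=80.0, silence_threshold=0.01):
        self.max_shift_ms = max_shift_ms
        self.silence_threshold = silence_threshold
        self.was_silent = False
        self.shift_ms = random.uniform(0.0, max_shift_ms)

    def update(self, envelope):
        # envelope: current amplitude of this voice, fed back from its own output.
        silent = envelope < self.silence_threshold
        if silent and not self.was_silent:
            # A pause or sentence break: choose a fresh offset so no voice
            # stays a fixed number of milliseconds behind another.
            self.shift_ms = random.uniform(0.0, self.max_shift_ms)
        self.was_silent = silent
        return self.shift_ms
```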
The attached audio is a sample from an audiobook. The same technique can also be applied on the fly.
In this assignment, I applied a text summarization API (http://textsummarization.net) to an abridged version of The Republic, Book I. textsummarization.net, maintained by a group of professional Natural Language Processing and Machine Learning researchers from the University of Science and Technology of China (USTC), is one of several popular text summarization APIs online. In this experiment, I want to summarize The Republic down to one sentence.
Completing this task in a single pass would be very demanding for the algorithm and does not match how information is conventionally processed. Therefore, I decided to halve the text at each attempt relative to the previous one: the original file has approximately 479 sentences, and it is reduced from 479 to 240, from 240 to 120... until only one sentence is left.
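The halving schedule itself is straightforward; in the sketch below, `summarize()` is a hypothetical stand-in for the API call, since I am not reproducing textsummarization.net's actual request format here:

```python
def summarize(text, n_sentences):
    """Hypothetical placeholder for the textsummarization.net call:
    returns roughly n_sentences sentences summarizing `text`."""
    raise NotImplementedError

def reduce_to_one_sentence(text, n_sentences=479):
    # Halve the target length each round: 479 -> 240 -> 120 -> ... -> 1.
    while n_sentences > 1:
        n_sentences = max(1, (n_sentences + 1) // 2)
        text = summarize(text, n_sentences)
    return text
```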
I'm still trying to learn more about the exact algorithm behind the API, so I cannot comment on how reliable it is. However, since a decent amount of information is kept in the first few rounds (and they seem to convey the material in The Republic pretty well), we may assume that the algorithm is at least functional. Although the information in The Republic is completely destroyed in the final summary, I must admit that the final summary is exactly what I can tell you about The Republic at the moment (having studied it a few semesters ago).