Project 2 – Help Wanted

The idea behind my project was to make a system that could visualize or represent posts on the subreddit /r/relationships. These posts are often emotionally charged and an interesting blend of the general and universal with the highly specific. Keeping in mind that most people post there hoping to solve issues in their personal lives, my conception was to use Python to generate a series of sentiment analysis scores from the posts. Those scores would then be used to modulate different parameters on instruments to produce a sound that dynamically evolves with the emotional content of each post. Overall I achieved my goal, and while the sound is not quite as melodious as I might have hoped, I like to think that reflects just how emotionally charged and ambiguous most relationships can be.

The project was split into two components: a Python pipeline that received new posts from /r/relationships and updated two large CSV files, and a system of Max patches that generated audio based on input read from those files. I’ll go into detail on my process for both below:

Python Pipeline:

Running posts through the pipeline generates two large space-delimited files, one of scores and one of title features. Each post has six scores: the five highest-magnitude sentiment scores, corresponding to what the analyzer interpreted as the five words carrying the most emotion, plus the overall average sentiment of the entire text as the sixth score. (Format: 1st highest score, 2nd highest score, and so forth, followed by the overall score.) Title extraction was based on a rule of /r/relationships, which specifies that every post must mention the gender and age of both the poster and any people referred to within the post. Even with this in mind, this part was messy. Not all posts were simply between two people, but after analysis around 83% of them were, so to avoid input inconsistency later in Max, I removed posts that didn’t fit this two-person rule. Even then, there were posts with inconsistent formatting (not everyone follows subreddit rules) that my pipeline couldn’t parse properly, so when values seemed malformed I substituted a default value, the tuple (30,-1,30,-1). (Format: Age, Gender, Age, Gender, where Male is represented by -1 and Female by -2.)
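To make the title format concrete, here is a minimal sketch of how ages and genders might be pulled out of a title and fall back to the (30,-1,30,-1) default. The regex and function names are my own illustration, not necessarily how my actual pipeline is written:

import re

# Matches age/gender tags like "[28M]", "(26 F)", or "24/f" that /r/relationships
# titles are supposed to include for each person mentioned.
TAG = re.compile(r"[\[\(]?\s*(\d{2})\s*[/ ]?\s*([MmFf])\s*[\]\)]?")

DEFAULT = (30, -1, 30, -1)  # fallback when a title can't be parsed into two people

def encode_gender(letter):
    # Male -> -1, Female -> -2, matching the file format described above
    return -1 if letter.lower() == "m" else -2

def parse_title(title):
    # Return (age1, gender1, age2, gender2), or DEFAULT if the title is malformed
    people = TAG.findall(title)
    if len(people) != 2:          # keep only clean two-person posts
        return DEFAULT
    (a1, g1), (a2, g2) = people
    return (int(a1), encode_gender(g1), int(a2), encode_gender(g2))

# parse_title("Me [24F] with my boyfriend [27M] of 3 years") -> (24, -2, 27, -1)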

A sidenote: the choice to keep only posts between two people led to a bit of a change in focus; while not all of the initial posts were about relationships with significant others, a good number of the two-person posts were. I decided to incorporate a piano rendition of the main melody from Fatima Yamaha’s “What’s a Girl to Do”, as the track feels like it’s about the tension and ambiguity inherent in relationships, something powerful that I wanted to try to capture.

There are four files in the Python zip. Before running any of them, you must set up your own Reddit account and OAuth permissions and install some Python modules: PRAW (Reddit’s API wrapper), numpy, and NLTK’s VADER sentiment module (the full NLTK data download is a lot of stuff, so I would try installing just VADER’s data first and only grab everything if there are issues; glob is part of the standard library). To set up an OAuth token for Reddit, follow the tutorial here: https://praw.readthedocs.io/en/latest/getting_started/authentication.html .
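As a rough setup sketch (assuming pip and the standard NLTK data downloader), installing the packages and just the VADER lexicon looks something like this:

# pip install praw numpy nltk
import nltk
nltk.download("vader_lexicon")   # much smaller than downloading all of NLTK's data

# quick smoke test that the analyzer loads
from nltk.sentiment.vader import SentimentIntensityAnalyzer
print(SentimentIntensityAnalyzer().polarity_scores("I love you but I'm leaving"))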
1) testq.py: go in and edit the four values in snag() to set up your own bot. Run it in the terminal with the destinations for three output files (titles, text, url) to write a stream of the most recent posts, broken into titles, text, and URLs, into those three files. For reference, my command when running it looks something like this:

python3 /Users/lawrencehan/PycharmProjects/reddit/testq.py /Users/lawrencehan/Desktop/project_one/titles.txt /Users/lawrencehan/Desktop/project_one/text.txt /Users/lawrencehan/Desktop/project_one/url.txt
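For readers who want to see the shape of this step, here is a hedged sketch of what snag() roughly does, assuming a script-type OAuth app; the credential strings are placeholders for the four values you edit, and the details may differ from my actual testq.py:

import sys
import praw

def snag(titles_path, text_path, url_path, limit=100):
    # Grab the newest /r/relationships posts and write titles, selftexts,
    # and URLs to the three output files, one post per line.
    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="relationship-sonifier by /u/YOUR_USERNAME",
    )
    with open(titles_path, "w") as titles, \
         open(text_path, "w") as text, \
         open(url_path, "w") as urls:
        for post in reddit.subreddit("relationships").new(limit=limit):
            titles.write(post.title.replace("\n", " ") + "\n")
            text.write(post.selftext.replace("\n", " ") + "\n")
            urls.write(post.url + "\n")

if __name__ == "__main__":
    snag(sys.argv[1], sys.argv[2], sys.argv[3])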

2) senti.py: Next, take the files you’ve just output and run them through senti.py. It should output three files: one for the titles and two for the five word scores and the overall score, respectively.
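The scoring itself follows the six-score format described above. Here is a minimal sketch of that step using NLTK’s VADER analyzer; the function name and padding behavior are my own illustration, not necessarily identical to senti.py:

from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def score_post(body, top_n=5):
    # Score every word, keep the five scores largest in magnitude
    # (the most emotionally loaded words), then append the compound
    # score of the full text as the sixth value.
    word_scores = [analyzer.polarity_scores(w)["compound"] for w in body.split()]
    word_scores.sort(key=abs, reverse=True)
    top = word_scores[:top_n]
    top += [0.0] * (top_n - len(top))   # pad very short posts
    overall = analyzer.polarity_scores(body)["compound"]
    return top + [overall]

# score_post("I love him but I am so angry and hurt")
# -> five largest-magnitude word scores, followed by the overall score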

I ran these first two files several times before running the latter two while deciding how exactly I would get all the data into Max, which meant a lot of output split up across different files.

3) cleanup.py: This should output two CSVs containing the title representations and scores of all files written within the directory. The second zip attached to this post contains all the non-Python files, with directories already generated for titles and scores; I would recommend doing all this processing within that folder. When you run the previous two Python files, you’ll get three outputs that are consolidated in this step into files you will always write to when you want to update them with more scores.
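Conceptually this step just globs every per-run output in the titles and scores directories and appends it to the two running files. A minimal sketch, assuming the directory layout from the zip and the alltitles/allscores names used later in Max (the patterns and filenames are assumptions):

import csv
import glob

def consolidate(pattern, out_csv):
    # Append every non-empty line from files matching `pattern`
    # as one comma-separated row of `out_csv`.
    with open(out_csv, "a", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(pattern)):
            with open(path) as f:
                for line in f:
                    if line.strip():
                        writer.writerow(line.split())

consolidate("titles/*.txt", "alltitles.csv")
consolidate("scores/*.txt", "allscores.csv")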

4) final.py: Max has a problem recognizing commas in text input. That, along with formatting issues, meant this file was needed in the pipeline as well. It converts the CSVs into regular space-delimited files and ensures that the last number of each line is not rendered as a string, both issues that strongly affect how Max reads these files in.
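One way to do that conversion is to swap commas for spaces and force the trailing field into float form so Max doesn’t read it as a symbol; this is a sketch of the idea rather than my exact final.py, and the filenames are illustrative:

def make_max_friendly(in_csv, out_txt):
    # Convert a CSV into a space-delimited file and make the last field
    # look like a number (e.g. "3" -> "3.0") so Max treats it as one.
    with open(in_csv) as src, open(out_txt, "w") as dst:
        for line in src:
            fields = line.strip().split(",")
            if not fields or fields == [""]:
                continue
            fields[-1] = str(float(fields[-1]))
            dst.write(" ".join(fields) + "\n")

make_max_friendly("allscores.csv", "allscores.txt")
make_max_friendly("alltitles.csv", "alltitles.txt")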

After running each step of the pipeline, you should have two CSVs that look something like this:

Text editor view:

Excel/Numbers view:

Max/MSP:

Before beginning this process, I knew I wanted to incorporate the Yamaha track’s melody in some way. Because most sounds generated by Max sound fairly synthetic, I decided to render the melody with piano to provide a counterpoint of sorts. I used a kick drum as my main percussive element, plus a low-end bass drone and a high-pitched, reedy-sounding drone for higher-frequency content. I settled on these elements because they were relatively static, which made it easier to hear the effects of modulation on them. After building these instruments, I connected them all within a main patch. Each instrument takes in some combination of scores and age/gender information, which is then used to modulate interior parameters like filter resonance and distortion levels. Much of this score/gender information is ramped as it updates from one value to the next, which allows for smoother transitions and avoids the “clicks” that come from rapid changes.

When opening the Max patch, load in alltitles, allscores, and the Fatima piano loop, in that order.

The output is very noisy and chaotic; here’s a sample of it below:


And here are the files you need to make this work:

reddit

project_two 2