Sound Critique: Synthesthesia


Synesthesia is a neurological phenomenon in which stimulation of one sense produces a perception in another: for instance, a sound or note evoking the perception of a color. This concept has been explored in works of art including Gershwin’s Rhapsody in Blue and the musically inspired paintings of Kandinsky and Picasso.

This project aims to evoke those same connections by controlling music with visual stimuli. Additionally, it has the potential to let a visually impaired user experience art through synthesized music. Using a camera and a computer synthesizer, Synthesthesia plays a simple musical composition with four parts, each with its amplitude controlled by an aspect of the image the camera captures.

The percussion is driven by the overall brightness of what the system sees. Bright images evoke louder drum beats, and as the light fades the volume of the drum beat fades with it. Similarly, three synth tracks are controlled by the red, green, and blue intensities in the image: blue plays a sawtooth bass sound, green a sequence of bells, and red a synth lead.


The initial hope was to package the whole system into a handheld device with audio, video, processing, and a battery. However, the Raspberry Pi Zero was unable to handle the load. That said, there is no doubt that in the next five years the necessary processing power can be easily placed in the palm of your hand.

For now, the video is processed and the audio synthesized on a laptop. A Python program using OpenCV takes in video from the webcam and measures the average brightness, as well as the amount of red, green, and blue in the image. That triggers Sonic Pi to adjust the levels in the loops.


Tech Stuff

In OpenCV it can be tempting to just grab the RGB (or, in OpenCV’s ordering, BGR) values, but these swing more than you would think with variations in brightness and shadows. Converting to Hue, Saturation, Value (HSV) instead allows isolating a color (hue) range independent of the brightness. From there, taking the average across the whole frame gives a pretty good level that can easily be scaled and passed into an OSC message.
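For the curious, the core of that idea fits in a few lines. This is a minimal pure-Python sketch (using the standard library’s colorsys in place of OpenCV, with a toy frame and made-up hue ranges) just to show the math:

```python
import colorsys

def hue_level(frame, hue_min, hue_max):
    """Average brightness (V) of pixels whose hue falls in [hue_min, hue_max).

    frame is a list of rows of (r, g, b) tuples with 0-255 channels; hues
    are in [0, 1) as colorsys reports them. Returns a 0.0-1.0 level that
    can be scaled and sent along in an OSC message.
    """
    total, count = 0.0, 0
    for row in frame:
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            if hue_min <= h < hue_max:
                total += v
                count += 1
    return total / count if count else 0.0

# A tiny 2x2 test frame: three reddish pixels and one dim blue one.
frame = [[(255, 0, 0), (200, 0, 0)],
         [(255, 0, 0), (0, 0, 40)]]
red = hue_level(frame, 0.0, 0.05)    # ≈ 0.93, strong red presence
blue = hue_level(frame, 0.6, 0.73)   # ≈ 0.16, the dim blue pixel
```

The real version does the same thing with cv2.cvtColor and cv2.inRange across a full webcam frame, but the math is identical.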

On the synth side, Sonic Pi allows for creating multiple synced loops. In this case I made four: beats, red, green, and blue. Each listens for an OSC trigger and uses it to set the amplitude of its samples and synths. These are each saved as individual files. Red, green, and blue are tied to the beats loop for syncing, so it’s best to start each of them first; all will then trigger together when you run the beat loop.

Assignment 8: Net Chimes


Inspired by Jet’s examples of doorbells with physical chimes, and my last post about Ballet Mecanique, I wanted to make something both percussive and tonal that responded to a digital event stream. I thought it would be fun to reproduce something like the chimes used in theaters to signal the end of intermission, but have it triggered by my Google Calendar rather than an usher.

To accomplish this I made a Python program, based heavily on the quickstart offered by the Google Calendar API, which checks the time until the next meeting on my calendar. When that meeting is a certain number of minutes away, it plays a chime to indicate the amount of time remaining (5, 4, 1, and 0 minutes were chosen arbitrarily).
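The decision logic itself is tiny. Here’s a hedged sketch (the 5/4/1/0 thresholds are from above; the function name and the played-set bookkeeping are my own):

```python
# Minutes-remaining values at which a chime should sound (from the post).
CHIME_MINUTES = {5, 4, 1, 0}

def chime_for(minutes_remaining, already_played):
    """Return which chime to play right now, or None.

    already_played is a set tracking chimes that have fired, so polling
    the calendar repeatedly within the same minute rings each chime once.
    """
    if minutes_remaining in CHIME_MINUTES and minutes_remaining not in already_played:
        already_played.add(minutes_remaining)
        return minutes_remaining
    return None
```

The real program wraps this in a polling loop around the Google Calendar API call and sends the chosen note sequence out over serial.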


Tech Stuff

The only hardware used is a pair of servo motors, and the Sparkfun board connected to the laptop (for serial and power connectivity). When the board is sent a character corresponding to a note (A, B, C, D, E, or F… G was too far), it first rotates the base to the appropriate angle to line up the mallet with the note, and then the other motor strikes the note. This allows the Python program to dictate the note sequence as a simple string of characters, and does not require reprogramming the microcontroller to make new chimes.
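In sketch form, the note-to-motion mapping might look like this (the angles are placeholders rather than the real calibration, and the servo commands are stubbed out as plain functions so the mapping runs anywhere):

```python
# Placeholder base-servo angles for each chime note; the real values
# depend on how the chimes are physically spaced around the mallet.
NOTE_ANGLES = {'A': 0, 'B': 30, 'C': 60, 'D': 90, 'E': 120, 'F': 150}

def play_sequence(notes, move_base, strike):
    """Play each note in a string: aim the base servo, then swing the mallet.

    move_base and strike stand in for whatever sends the servo commands
    over serial; characters with no chime are skipped.
    """
    for note in notes.upper():
        angle = NOTE_ANGLES.get(note)
        if angle is None:
            continue  # not one of our six notes
        move_base(angle)
        strike()
```

This is why new chimes need no microcontroller changes: the Python side just sends a different string of note characters.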


Inspiration for this week: Dada

George Antheil composed this piece for a variety of mechanical instruments including player pianos, an airplane propeller, and a siren. It was originally for a film, but the score works on its own.

He also co-invented frequency hopping for steering torpedoes (with Hedy Lamarr)!

I got to see a “live” fully automated performance of this at the National Gallery many years ago:

Assignment 7: Two Hearts Beat as One


We’ve all had the intuition that the music we listen to can affect our heart rate, whether it’s getting us excited or calming us down, and at least one scientific study has found some evidence to confirm this suspicion. There is even some evidence that we may match heartbeats with our partners when we’re near them.

With this in mind I attempted to create a prototype to play a melody with the goal of encouraging the user to raise or lower their heart rate to match a target.


The program measures the time between beats and translates that into a BPM measurement. This measurement is averaged with the target to create a match beat halfway between the target and measured BPM. As the rates converge, the music plays in time with the user’s heartbeat.
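The math is just a midpoint. A quick sketch (function names are mine):

```python
def bpm_from_interval(ms_between_beats):
    """Convert the measured time between beats (milliseconds) to BPM."""
    return 60000.0 / ms_between_beats

def match_bpm(measured_bpm, target_bpm):
    """The melody plays at the midpoint between measured and target,
    so the two converge as the user's rate shifts toward the music."""
    return (measured_bpm + target_bpm) / 2.0
```

On the microcontroller this runs inside the button interrupt: each press timestamps a “beat,” and the arpeggio’s tempo is updated to the match rate.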

Beat measurements are simulated with a pushbutton, and a modulated sound output plays an F6 arpeggio at the match rate. F6 was chosen from experience because it works well as both an uplifting and a calming chord, for either raising or lowering heart rate.


Tech Stuff

The circuit is pretty basic. A button pulls down pin 3 to trigger an interrupt which calculates the time since the last interrupt to measure current heart rate (this represents a heart monitor). Audio output is on pin 13 (with a 100Ω resistor in series), and the onboard RGB LED cycles along with each note in the sequence.

Code and wiring:



Kinetic Crit: Touch Mouse


Whiteboards and other hand drawn diagrams are an integral part of day to day life for designers, engineers, and business people of all types. They bridge the gap between the capabilities of formal language and human experience, and have existed as a part of human communication for thousands of years.

However powerful they may be, drawings depend on the observer’s power of sight. Why does this have to be? People without sight have been shown to be fully capable of spatial understanding, and have found their own ways of navigating space with their other senses. What if we could introduce a way for them to similarly absorb diagrams and drawings by translating them into touch?

touch mouse prototype

The touch mouse aims to do just that. A webcam faces the whiteboard, held off the surface by ball casters (which minimize smearing of the image). The image collected by the camera is processed to find the thresholds between light and dark areas, and servo motors lift and drop material under the user’s fingers to indicate dark spots above, below, or to either side of their current location. Using these indicators, the user can feel where the lines begin and end, and follow the traces of the diagram in space.


The video Jet showed in class, of special paper that a sighted person can draw on to create a raised image for a blind person to feel and understand, served as the primary inspiration for this project. After beginning work on the prototype, though, I discovered a project at CMU using a robot to trace directions spatially to assist visually impaired users in way-finding.

Similarly, in the physical build I was heartened to see Engelbart’s original mouse prototype. It served double duty: as inspiration for the form factor, and as an example of a rough prototype that could be refined into a sleek tool for everyday use.

1ère souris d’ordinateur (“the first computer mouse”)


The Build and Code

The components themselves are pretty straightforward. Four servo motors lift and drop the physical pixels for the user to feel. A short burst of 1s and 0s indicates which pixels should be in which position.

The Python code uses OpenCV to read in the video from the webcam, convert it to grayscale, measure thresholds for black and white, and then average that down into the four pixel regions for left, right, up, and down.
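Stripped of the OpenCV plumbing, the averaging step can be sketched like this (a pure-Python stand-in; the region names, bit order, and threshold are illustrative):

```python
def region_bits(gray, threshold=100):
    """Collapse a grayscale frame (rows of 0-255 values) into four bits,
    one each for the up, down, left, and right regions: '1' if the
    region's average is darker than the threshold (a line is present).
    The resulting string is the short burst sent to the servos."""
    h, w = len(gray), len(gray[0])

    def mean(rows, cols):
        vals = [gray[r][c] for r in rows for c in cols]
        return sum(vals) / len(vals)

    up    = mean(range(h // 2), range(w))
    down  = mean(range(h // 2, h), range(w))
    left  = mean(range(h), range(w // 2))
    right = mean(range(h), range(w // 2, w))
    return ''.join('1' if m < threshold else '0'
                   for m in (up, down, left, right))
```

In the real pipeline the frame comes from cv2 and the string goes out over serial, but the collapse-to-four-pixels idea is the same.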

I hope to have the opportunity in the future to refine the processing pipeline, and the physical design, and perhaps even add handwriting recognition to allow for easier reading of labels, but until then this design can be tested for the general viability of the concept.

Python and Arduino code:


Assignment 6: Thumper – turning sounds into touch

This project is inspired by the Ubicoustics project here at CMU in the Future Interfaces Group, and by an assignment for my Machine Learning + Sensing class where we taught a model to differentiate between various appliances using recordings made with our phones. This course is taught by Mayank Goel of Smash Lab, and is a great complement to Making Things Interactive.

With these capabilities in mind, I created a prototype for a system that provides physical feedback (a tap on your wrist) when it hears specific types of sounds, in this case energy over a certain threshold in an audio frequency band. This could be developed into a more sophisticated system with more tap options and a machine learning classifier to detect specific signals. Here’s a quick peek.

On the technical side, things are pretty straightforward, but all of the key elements are there. The servo connection is standard, and the code right now just toggles whenever it receives any signal from the computer doing the listening. The messaging is simple and short to minimize any potential lag.

On the Python side, audio is taken in with PyAudio, transformed into the frequency spectrum with SciPy signal processing, and then scaled down to 32 frequency bins using OpenCV (a trick I learned in ML+S class). Bins 8 and 9 are then watched for crossing a threshold, which is the equivalent of saying: when there’s a spike somewhere around 5 kHz, toggle the motor.
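To make the binning concrete, here’s a pure-Python stand-in for that pipeline (a plain DFT instead of SciPy, and simple averaging instead of the OpenCV resize; the tone frequency below is chosen to show why a spike near 5.5 kHz lands in bin 8):

```python
import cmath
import math

def band_levels(samples, n_bins=32):
    """Magnitude spectrum of one audio window, folded into n_bins equal
    bands from 0 Hz to Nyquist. A naive DFT stands in for the real
    scipy + cv2.resize pipeline; fine for a short demo window."""
    n = len(samples)
    half = n // 2
    mags = []
    for k in range(half):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    per_bin = half // n_bins
    return [sum(mags[b * per_bin:(b + 1) * per_bin]) / per_bin
            for b in range(n_bins)]

# 256 samples of a ~5.5 kHz tone at 44.1 kHz: each of the 32 bands
# covers ~689 Hz, so 5512.5 Hz falls in band 8.
rate, freq, n = 44100, 5512.5, 256
tone = [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]
levels = band_levels(tone)
```

The threshold check is then just `levels[8] > some_value`, with the motor toggled when it trips.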

With a bit more time and tinkering, a classifier could be trained in scikit-learn to trigger the tap only on certain sounds, say a microwave beeping that it’s done, or a fire alarm.

The system could also be a part of a larger sensor network aware of both real world and virtual events to trigger unique taps for the triggers the user prefers.

Morphing Matter

I was reminded of this today upon seeing Ghalya’s smart flower.

Dr. Lining Yao is right here on campus, where she runs the Morphing Matter Lab. In the video below from Google Design (9 minutes in) she demonstrates a flat print that self-folds into a flower. Later she shows materials that morph when exposed to moisture (16 minutes in), and later still a soft robotic bunny with tendons that actuate to hug you (29 minutes in). I recommend watching the whole video!

Lining Yao

Assignment 5: Rain or Shine?

The just in time umbrella indicator

You’re in a hurry on your way out the door for the day. You grab your keys, your wallet, your phone, and head for the door, but then you freeze… is it going to rain today?

Your bag is already stuffed to the max, and you don’t want to have to carry around an umbrella just in case, so you stop, pull out your phone to find the weather app, and look to see what the day has in store for you.

There are 16 new notifications, and one of them distracts you just long enough for a new one to pop up telling you that you’ve missed your bus, before you ever get the chance to discover that you’ll need that umbrella. Now you’re standing out in the rain, waiting for the next bus to get you where you’re going 20 minutes late.

What if your stuff knew when you would need it? What if you knew it was going to rain right as you reached for your umbrella, or better yet, it pointed you to grab your sunglasses instead?

There is plenty of information available these days, but we always have to go hunting for it in a haystack of apps and notifications.

In the future we can teach our devices (not just our phones and smart devices) to fetch that information for us, and instead of just notifying us, they can take physical action in the real world. Instead of pestering us every time they have new information, or waiting for us to ask for it, they can present it at just the moment it’s needed.

You don’t care if it’s raining until you are about to walk outside, and you shouldn’t stop to check on your way out the door.

The Prototype

To demonstrate the basic idea of this type of physical indicator, a servo motor points at either your sunglasses or your umbrella based on the message it receives. The forecast comes from a weather API like Dark Sky; the response is parsed, and the device is sent a simple string saying either “sunny” or “rainy”, which determines where it points.
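The mapping on the device side is about as simple as it sounds. A sketch (the angles are placeholders for wherever the sunglasses and umbrella actually sit):

```python
# Hypothetical servo angles; "sunny"/"rainy" are the strings the
# device receives, per the post.
POINTER_ANGLES = {"sunny": 20, "rainy": 160}

def pointer_angle(forecast, default=90):
    """Angle to send the servo; rest at the midpoint for unknown input."""
    return POINTER_ANGLES.get(forecast, default)
```

Keeping the API parsing on the computer means the microcontroller only ever sees one of two words.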

Here’s a video:

The wiring and code are very simple in this prototype, but to implement the more futuristic versions described above wouldn’t really require much additional wiring or code.

The only pieces required are to embed a haptic motor, micro-controller, battery, and transceiver into the device. Inexpensive controllers like that already exist, but are not yet ubiquitous.

The other half of the equation is a system capable of taking in all of the IoT data in your environment and on your person, and understanding the bigger picture of what combination of triggers should wake up your umbrella just as you’re walking out the door. Smart home devices are getting closer to this every day.

Code + stuff:

The wiring is straightforward. Just a standard servo connection.

Critique 1: Visual Interaction – Speak Up Display

I recently saw a talk on campus by Dr. Xuedong Huang, founder of Microsoft’s speech technology group. He did a demo of the latest speech to text on Azure, combined with HoloLens, and I have to say I was impressed.

They went from this failure several years ago:

To this more recently (Speech to text -> translation -> text to speech… in your own voice… and a hologram for good measure):


This got me thinking that a more earthbound and practical application of this could be prototyped today, so I decided to make a heads up display for speech to text that functions external to a computer or smartphone.

If you are unable to hear whether due to a medical condition or just because you have music playing on your headphones, you are likely to miss things going on around you. I personally share an office with four other people, and I’m often found tucked away in the back corner with my earbuds in, completely unaware that the other four are trying to talk to me.

Similarly, my previous research surfaced a common issue for those who are deaf: being startled by someone coming up behind them, since they cannot hear their name being called.

With this use case in mind, I created an appliance that sits on a desktop within sight, but the majority of the time it does its best not to attract attention.

I realize it would be easy enough to pop open another window and display something on a computer screen, but that would either have to be a window always on top, or a bunch of notifications, so it seemed appropriate to take the display off screen to what would normally be the periphery.

The other advantage is a social one: if I look at my laptop screen while I’m supposed to be listening to you, you might think you’re being ignored. But with a big microphone between us, on a dedicated box with a simple text display, I’m able to glance over it as I face you in conversation or in a lecture.

When it hears speech it displays the text on the LCD screen for a moment, and then it scrolls off, leaving the screen blank when the room is quiet. This allows the user to glance over if they’re curious about what is being said around them:

Things get more interesting when the system recognizes key words like the user’s name. It can be triggered to flash a colored light, in this case green, to draw attention and let the user know that someone is calling for them.

Finally, other events can be detected and trigger messages on the screen, and LED flashes.

The wiring is fairly simple. The board uses its onboard NeoPixel RGB LED for the color coded alerts, and the LCD screen just takes a (one way) serial connection.

Initially the project began with a more elaborate code base, but it has been scaled down to a more elegant system with a simple API for triggering text and LED displays.

A serial connection is established to the computer, and the processor listens for strings. If a string is less than 16 characters it pads it for clean display, and if it has a 17th character, it checks it for color codes:

void setled(String textandcolor) {
  switch (textandcolor[16]) {
    case 'R':
      strip.setPixelColor(0, strip.Color(255, 0, 0));
      break;
    case 'G':
      strip.setPixelColor(0, strip.Color(0, 255, 0));
      break;
    case 'B':
      strip.setPixelColor(0, strip.Color(0, 0, 255));
      break;
    case 'X':
      strip.setPixelColor(0, strip.Color(0, 0, 0));  // LED off
      break;
  }
  strip.show();  // push the color change to the pixel
}

A computer which uses the appliance’s microphone to listen to nearby speech can send it off to be transcribed, and then feed it to the screen 16 characters at a time, watching for keywords or phrases. (This is still in progress, but the communication bus from the computer to the board is fully functional for text and LED triggers)
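On the sending side, building one of these messages can be sketched as follows (the function name is mine; the 16-character body plus optional 17th color-code character is the protocol described above):

```python
def frame_message(text, color=None):
    """Build one message for the display: exactly 16 characters of text,
    space-padded, plus an optional 17th color-code character
    (R, G, or B to flash a color, X to turn the LED off)."""
    body = text[:16].ljust(16)
    if color is not None:
        if color not in "RGBX":
            raise ValueError("color must be one of R, G, B, X")
        body += color
    return body
```

Each framed message is then written straight to the serial port, one line of the display at a time.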

After some experimenting, it seems that the best way to display the text is to start at the bottom line, and have it scroll upwards (a bit like a teleprompter) one line at a time every half a second. Faster became hard to keep up with, and slower felt like a delayed reaction. (Arduino code + Fritzing diagram)

I’d love to expand this to do translation (these services have come a long way as well), and perhaps migrate to a Raspberry Pi to do the web API portion so that the computer can be closed and put away.


I made the system more interactive by making the microphone (the big black circle in the images above) into a button. While you hold the button it listens to learn new keywords, and then alerts when it hears those words. Over time, keywords decay.

The idea of the decay is that you would trigger the system when you hear something it should tell you about, and if you don’t trigger it the next time it hears it, it becomes slightly less likely to trigger again. This also begins to filter out common words from more important keywords.
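As a sketch, the weighting could look like this (the boost, decay factor, and alert threshold are placeholder values, not the real tuning):

```python
class KeywordWeights:
    """Reinforce-and-decay keyword weights: holding the button boosts a
    word; each unreinforced occurrence decays it, so common chatter
    gradually drops below the alert threshold."""

    def __init__(self, boost=1.0, decay=0.9, threshold=0.5):
        self.weights = {}
        self.boost, self.decay, self.threshold = boost, decay, threshold

    def reinforce(self, word):
        """Button held while this word was heard: it matters, raise it."""
        self.weights[word] = self.weights.get(word, 0.0) + self.boost

    def heard(self, word):
        """Word heard without reinforcement: decay its weight and report
        whether it is still important enough to trigger an alert."""
        if word not in self.weights:
            return False
        self.weights[word] *= self.decay
        return self.weights[word] >= self.threshold
```

With these numbers a reinforced word keeps alerting for a handful of occurrences and then goes quiet unless the button reinforces it again.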

This weight system is merely a placeholder for a more sophisticated approach.

STT Update

Re: Story of Your Life, etc…

Arrival is perhaps my favorite film, and while I was already familiar with the Wolfram documentary, and the original Ted Chiang story, I am always excited to revisit these ideas, and learn more about the interplay of language and cognition. I was impressed with the way the Electric Didact dissected this concept in the film and tied it back to the root of the very word “understand.”

Even more interesting to me is when we try to use language to express what we see, thereby translating visual cognition into an audible expression and back again. (As we are aiming to do with our projects in this course.)

This idea reminds me of a study that found that Russian speakers, who have separate words to distinguish between light and dark blue, are quicker to recognize these subtle differences than English speakers when shown two different shades, thus indicating language affecting visual perception right here on our own blue planet.

Article from the National Academy of Science

On the other end of the same cycle, Vox did an interesting piece looking at the evolution of words for color across different cultures: languages almost always begin with just light and dark, then add red, before blue and green.

I’m curious if there is any way to actively adapt the interconnection between visual and linguistic cognition for use in interface design, or to create new connections by building a new vocabulary to map optical cues to concepts that do not have representations in the visual spectrum.