Typology of Fixation Narratives

A typology of video narratives driven by people’s gaze across clouds in the sky.

How do people create narratives through where they look and what they choose to fixate on? As people look at an image, there are brief moments where they fixate on certain areas of it.

Putting these fixation points in the order in which they were created begins to reveal a kind of narrative constructed by the viewer. It can turn static images into a story:

The story is different for each person:

I wanted to see how far I could push this quality. It’s easy to see a narrative when there’s a clear relationship between the parts of a scene, but what happens when there are no clear elements with which to create a story?

I asked people to create a narrative from some clouds. I told them to imagine that they were a director creating a scene, where their eye position dictates where the camera moves. More time spent looking at an area zooms the camera in, while less time results in a wider view. Here are five different interpretations of one of the scenes:

I used a Tobii Gaming eye tracker and wrote some programs to record the gaze and output images. The process works like this:

  1. An openFrameworks program to show people a sequence of images and record the stream of gaze points to files. This program communicates with the eye tracker to grab the data and outputs JSON files (a rough sketch of this step follows the list). The code for this program can be found here.
  2. Another openFrameworks program to read and smooth out the data, then zoom in and out on the image based on movement speed. It plays through the points in the order they were recorded and exports individual frames. Code can be found here.
  3. A small Python program to apply some post-processing to the images. This code can be found in the export program’s repository as well.
  4. Export the image sequences as video in Premiere.
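
For a sense of what the first program does, here is a minimal sketch of the recording step. The `onGazeSample` callback and the JSON layout are hypothetical; the actual addon interface and output format in my program differ in the details:

```cpp
#pragma once
#include "ofMain.h"

// Sketch of step 1: collect gaze samples while an image is shown,
// then dump them to a JSON file.
class ofApp : public ofBaseApp {
public:
    ofJson samples = ofJson::array();

    // Hypothetical callback; in practice the eye-tracker addon delivers
    // gaze coordinates through its own event or update loop.
    void onGazeSample(float x, float y) {
        ofJson p;
        p["t"] = ofGetElapsedTimeMillis(); // timestamp in ms
        p["x"] = x;                        // screen-space gaze position
        p["y"] = y;
        samples.push_back(p);
    }

    void keyPressed(int key) {
        if (key == 's') {
            // One JSON file per image, per viewer.
            ofSavePrettyJson("gaze_recording.json", samples);
            samples = ofJson::array();
        }
    }
};
```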

There were a few key limitations with this system. First, the eye tracker only works in conjunction with a monitor. There’s no way to have people look at something other than a monitor (or a flat object the same exact size as the monitor) and accurately track where they’re looking. Second, the viewer’s range of movement is low; they must sit relatively still to keep the calibration valid. Finally, and perhaps most importantly, the tracker lacks precision. The one I was working with was not meant for analytical use, and therefore produces very noisy data that can only give a general sense of where someone was looking. It’s common to see differences of up to 50 pixels between data points when staring at one point and not moving at all.

A couple of early experiments showed just how bad the precision is. Here, I asked people to find constellations in the sky:

Even after significant post-processing on the data points, it’s hard to see what exactly is being traced. For this reason, the process I developed uses the general area that someone fixates on to create a frame that’s significantly larger than the point reported by the eye tracker.
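
As a rough illustration of how the playback program can cope with that noise, the sketch below smooths the raw points with an exponential moving average and widens or tightens a crop window based on how fast the smoothed gaze is moving (slow movement reads as a fixation, so the camera zooms in). The function name, constants, and thresholds here are illustrative, not the exact ones from my export program:

```cpp
#include "ofMain.h"

// Illustrative smoothing + framing pass over recorded gaze points.
// Smoothing: exponential moving average of the raw gaze position.
// Framing:   slower gaze movement -> tighter (more zoomed-in) crop window.
glm::vec2 smoothed(0, 0);
float zoomLevel = 0.0f; // 0 = wide view, 1 = fully zoomed in

ofRectangle frameForSample(const glm::vec2& raw, float imgW, float imgH) {
    const float alpha = 0.1f;      // smoothing factor (illustrative)
    glm::vec2 prev = smoothed;
    smoothed = glm::mix(smoothed, raw, alpha);

    // Gaze speed in pixels per sample; slow movement reads as a fixation.
    float speed = glm::distance(smoothed, prev);
    float target = ofMap(speed, 0, 50, 1.0f, 0.0f, true); // ~50 px of jitter is normal
    zoomLevel = ofLerp(zoomLevel, target, 0.05f);          // ease the zoom over time

    // Crop window centered on the smoothed point: full image when wide,
    // a quarter of the image when fully zoomed in.
    float w = ofLerp(imgW, imgW * 0.25f, zoomLevel);
    float h = ofLerp(imgH, imgH * 0.25f, zoomLevel);
    return ofRectangle(smoothed.x - w / 2, smoothed.y - h / 2, w, h);
}
```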

Though developing the first program to record the gaze positions wasn’t particularly difficult, the main challenge came from accessing the stream of data from the Tobii eye tracker. The tracker is specifically designed not to give access to its raw X and Y values; instead, it exposes the gaze position through a C# SDK meant for developing games in Unity, where the actual position is hidden. Luckily, someone has written an addon for openFrameworks that allows access to the raw gaze position stream. Version compatibility issues aside, it was easy to work with.

The idea of creating a narrative from a gaze around an image came up while exploring the eye tracker. In some sense the presentation is not at all specific to the subject of the image; I wanted to create a process that was generalizable to any image. That said, I think the weakest part of this project is the content itself, the images. I wanted to push away from “easy” narratives produced from representational images with clear compositions and elements. I think I may have gone a bit too far here, as the narrative starts to get lost in the emptiness of the images. It’s sometimes hard to tell what people were thinking as they looked around; the story they wanted to tell is a bit unclear. I think the most successful part of this project, the system itself, has the potential to be used with more compelling content. There are also plenty of other possible ways of representing the gaze, and the camera-narrative style might be augmented in the future to better reflect the content of the image.