Hunan – PersonInTime

Compositing Time in Space

Product

What you are seeing: a timelapse overlaid onto a slow-motion shot of the same location.

This is not the final version I had envisioned: I made a mistake while shooting the scene I had in mind for the final version yesterday and didn’t have time to reshoot. The following are frames from the footage I shot for the final version. It would have shown me walking across the frame in slow motion while other people pass by me (along the same path) in a timelapse. It uses the exact same capture machinery as the test footage, but the subjects align more closely with what I want to convey with my work.

Ideation

From an experimental capture perspective, I want to experiment with compositing different speeds of time and temporally distinct occurrences of events into the same video. I was inspired by the visuals of timelapses and hyperlapses that contain a still subject in contrast with the sped-up subjects (see the sample image below). I want to push this idea further by developing a method of slowing down the subject beyond real time and a tool that would allow me to pick the subject and overlap it with the sped-up subjects as I wish.

From a conceptual perspective, I want to explore our perception of the speed of time’s passage and our linear understanding of the sequence of events. By contracting the past and the future into a microscopic examination of the present (i.e. by condensing the people before me and the people after me into the magnified present moment of me walking along the path), I attempt to question our place and significance when set against the grand scheme of time, and to examine the relationship between the past, the present, and the future.

Process

The basic idea is to find a composition where the subjects I want to speed up are in front of the subjects I want to slow down (the reverse would work too, but is less ideal given the current state of AI). Ideally, there should be no interpenetration of slow and fast subjects in 3D space; overlap in 2D space is obviously fine, as that is the whole point of developing this method. (This interpenetration happened in the test footage, where people walked through the car, and is something I attempted to fix in the second piece of footage.) I then have two temporally distinct sequences with the same composition: one of the slow subjects and one of the fast subjects. Next, I rotoscope the subjects of interest out of the footage I want to speed up. Here, I used a semantic segmentation model (DeepLab with a ResNet-101 backbone, trained on the subset of COCO that matches the PASCAL VOC classes) to mask out anything that is not human. All that is left to do from there is to alpha-blend the extracted human overlay onto the slow-motion video.
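
Below is a minimal sketch of the mask-and-blend step, assuming PyTorch/torchvision and frames already loaded as RGB NumPy arrays; the function names and constants are my own placeholders, not the exact code used here.

```python
import numpy as np
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet101

# torchvision's DeepLabV3-ResNet101 is trained on the COCO subset matching the
# 20 PASCAL VOC categories; "person" is class index 15 in that label set.
PERSON_CLASS = 15

model = deeplabv3_resnet101(pretrained=True).eval().cuda()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def person_mask(frame_rgb: np.ndarray) -> np.ndarray:
    """Return a float mask (H, W) in [0, 1] that is 1 wherever a person is detected."""
    x = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0
    x = normalize(x).unsqueeze(0).cuda()
    with torch.no_grad():
        logits = model(x)["out"][0]        # (num_classes, H, W)
    labels = logits.argmax(0)              # per-pixel class labels
    return (labels == PERSON_CLASS).float().cpu().numpy()

def composite(fast_frame: np.ndarray, slow_frame: np.ndarray) -> np.ndarray:
    """Alpha-blend the people cut out of the timelapse frame onto the slow-motion frame."""
    alpha = person_mask(fast_frame)[..., None]                       # (H, W, 1)
    return (alpha * fast_frame + (1.0 - alpha) * slow_frame).astype(np.uint8)
```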

^ your device may or may not support the format of this video above, use your imagination if it doesn’t

^ your device may or may not support the format of this video above, use your imagination if it doesn’t

There are, however, some details to deal with when implementing this. For one, the segmentation result is not perfect (actually, pretty awful). One way to mitigate this is to fake a long exposure from multiple frames (i.e. a hyperlapse). I didn’t spend too much time on this, but I got decent results by simply applying a Gaussian filter and a convolution with a horizontal-line filter before blending ~100 frames together by pixel-wise maximum (similar to “lighten” in Photoshop). One could also run background subtraction on the masked region, cluster the pixels and filter out small clusters, run Canny edge detection to get sharper edges, and so on. I didn’t have time to debug those, so I used the segmentation results as is. There were also other issues, such as the computational cost of ResNet inference on large frames (it takes ~16 GB of VRAM to process a 3K video in FP16) and figuring out the right equation for the right blending mode, but they are not worth discussing in detail.
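
Here is a sketch of the fake long-exposure trick described above, assuming OpenCV and a list of masked RGB frames; the kernel sizes are illustrative guesses rather than the exact values used.

```python
import cv2
import numpy as np

def fake_long_exposure(frames, line_width=31):
    """Blur each frame, smear it horizontally, then keep the pixel-wise maximum
    across the stack (~100 frames), which behaves like "lighten" blending."""
    # Horizontal-line kernel: averages each pixel with its horizontal neighbours,
    # turning jittery per-frame masks into smooth motion streaks.
    kernel = np.zeros((line_width, line_width), np.float32)
    kernel[line_width // 2, :] = 1.0 / line_width

    acc = None
    for f in frames:
        g = cv2.GaussianBlur(f, (9, 9), 0)              # soften segmentation noise
        g = cv2.filter2D(g, -1, kernel)                 # horizontal smear
        acc = g if acc is None else np.maximum(acc, g)  # pixel-wise max ("lighten")
    return acc
```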

I can post the code if anyone is interested…