Final Delivery | Walk on the earth

Concept

I seek to pixelate a flat, two-dimensional image in TouchDesigner and imbue it with three-dimensional depth. My inquiry begins with a simple question: how can I breathe spatial life into a static photograph?

The answer lies in crafting a depth map—a blueprint of the image’s spatial structure. By assigning each pixel a Z-axis offset proportional to its distance from the viewer, I can orchestrate a visual symphony where pixels farther from the camera drift deeper into the frame, creating a dynamic and evocative illusion of dimensionality.

Outcome

Capture System

To match my concept, I decided to capture a bird’s-eye view. A top-down perspective suits the idea because pixel movement can be restricted to a single downward direction, with each pixel’s displacement determined by its distance from the camera. To achieve this, I used a 360° camera mounted on a selfie stick. On a sunny afternoon, I walked around my campus holding the camera aloft. While the process drew some attention, it yielded the ideal footage for my project.

Challenges

Generating depth maps from 360° panoramic images proved to be a significant challenge. My initial plan was to use a stereo camera to capture left- and right-channel images, then apply OpenCV’s stereo-matching algorithms to extract depth information from the stereo pair. However, when I fed the 360° panoramic images into OpenCV, the heavy distortion at the edges caused the computation to break down.

Moreover, using OpenCV to extract depth maps posed another inherent issue: the generated depth maps did not align perfectly with either the left or right channel color images, potentially causing inaccuracies in subsequent color-depth mapping in TouchDesigner.

Fortunately, I discovered an online pre-trained AI model, Image Depth Map, that converts photos directly into depth maps and provides a JavaScript API. Since my source material was a video file, I developed the following workflow:

  1. Extract frames from the video at 24 frames per second (fps).
  2. Batch-process the 3,000 extracted images through the Depth AI model to generate corresponding depth maps.
  3. Reassemble the depth map sequence into a depth video at 24 fps.

This workflow enabled me to produce a depth video precisely aligned with the original color video.
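
For anyone reproducing this workflow, here is a minimal Python/OpenCV sketch of steps 1 and 3 (the file names and folder layout are placeholders, and the depth-estimation step in the middle is whichever model you use; this is an illustration of the idea, not my exact scripts):

import cv2, glob, os

# Step 1: extract every frame of the (24 fps) source video as an image sequence
cap = cv2.VideoCapture("walk.mp4")          # placeholder file name
os.makedirs("frames", exist_ok=True)
i = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"frames/{i:05d}.png", frame)
    i += 1
cap.release()

# Step 2 (not shown): run each frame through the depth model, saving results to depth/

# Step 3: reassemble the depth maps into a 24 fps video
depth_files = sorted(glob.glob("depth/*.png"))
h, w = cv2.imread(depth_files[0]).shape[:2]
out = cv2.VideoWriter("depth.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 24, (w, h))
for f in depth_files:
    out.write(cv2.imread(f))
out.release()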

Design

The next step was to integrate the depth video with the color video in TouchDesigner and enhance the sense of spatial motion along the Z-axis. I scaled both the original video and depth video to a resolution of 300×300. Using the depth map, I extracted the color channel values of each pixel, which represented the distance of each point from the camera. These values were mapped to the corresponding pixels in the color video, enabling them to move along the Z-axis. Pixels closer to the camera moved less, while those farther away moved more.
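
As a rough illustration of that mapping, here is a small numpy sketch (the 300×300 resolution comes from the setup above; the displacement scale and the convention that brighter depth pixels mean farther away are assumptions, not the exact values in my patch):

import numpy as np

def z_offsets(depth_frame, max_offset=1.0):
    # depth_frame: 300x300 grayscale depth map, values 0-255
    # Normalize so 0 = closest, 1 = farthest (assumed convention)
    d = depth_frame.astype(np.float32) / 255.0
    # Farther pixels receive a larger Z displacement; closer pixels barely move
    return d * max_offset

# In TouchDesigner, one common way to apply this is to feed the depth TOP into
# a Geometry COMP's instancing (Translate Z), so each of the 300x300 points is
# pushed back along Z in proportion to its depth value.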

The interaction between particles and music is controlled in real time

Observing how the 360° camera captured the Earth’s curvature, I had an idea: could I make it so viewers could “touch” the Earth depicted in the video? To realize this, I integrated MediaPipe’s hand-tracking feature. In the final TouchDesigner setup, the inputs—audio stream, video stream, depth map stream, and real-time hand capture—are layered from top to bottom. The final result is an interactive “Earth” that moves to the rhythm of music, with the interaction between particles and music controlled in real time by the user’s hand movements.

Critical Thinking

  1. Depth map generation was the key step of the entire project; the pre-trained AI model overcame the limitations of the traditional computer-vision approach I first attempted.
  2. The videos shot with the 360° camera are interesting in themselves. The selfie stick forms a support that is always close to the lens in the frame, and its presence is rendered convincingly and accurately in the depth map.
  3. Although I considered using a drone to shoot a bird’s-eye view, the 360° camera allowed me to realize the interactive ideas in my design. Overall, the combination of tools and creativity provided inspiration for further artistic exploration.

Final Project | Walk on the earth

Concept

I seek to pixelate a flat, two-dimensional image in TouchDesigner and imbue it with three-dimensional depth. My inquiry begins with a simple question: how can I breathe spatial life into a static photograph?

The answer lies in crafting a depth map—a blueprint of the image’s spatial structure. By assigning each pixel a Z-axis offset proportional to its distance from the viewer, I can orchestrate a visual symphony where pixels farther from the camera drift deeper into the frame, creating a dynamic and evocative illusion of dimensionality.

Capture System

To match my concept, I decided to capture a bird’s-eye view. A top-down perspective suits the idea because pixel movement can be restricted to a single downward direction, with each pixel’s displacement determined by its distance from the camera. To achieve this, I used a 360° camera mounted on a selfie stick. On a sunny afternoon, I walked around my campus holding the camera aloft. While the process drew some attention, it yielded the ideal footage for my project.

Challenges

Generating depth maps from 360° panoramic images proved to be a significant challenge. My initial plan was to use a stereo camera to capture left- and right-channel images, then apply OpenCV’s stereo-matching algorithms to extract depth information from the stereo pair. However, when I fed the 360° panoramic images into OpenCV, the heavy distortion at the edges caused the computation to break down.

Moreover, using OpenCV to extract depth maps posed another inherent issue: the generated depth maps did not align perfectly with either the left or right channel color images, potentially causing inaccuracies in subsequent color-depth mapping in TouchDesigner.

Fortunately, I discovered an online pre-trained AI model, Image Depth Map, that converts photos directly into depth maps and provides a JavaScript API. Since my source material was a video file, I developed the following workflow:

  1. Extract frames from the video at 24 frames per second (fps).
  2. Batch-process the 3,000 extracted images through the Depth AI model to generate corresponding depth maps.
  3. Reassemble the depth map sequence into a depth video at 24 fps.

This workflow enabled me to produce a depth video precisely aligned with the original color video.

Design

The next step was to integrate the depth video with the color video in TouchDesigner and enhance the sense of spatial motion along the Z-axis. I scaled both the original video and depth video to a resolution of 300×300. Using the depth map, I extracted the color channel values of each pixel, which represented the distance of each point from the camera. These values were mapped to the corresponding pixels in the color video, enabling them to move along the Z-axis. Pixels closer to the camera moved less, while those farther away moved more.

To enhance the visual experience, I incorporated dynamic effects synchronized with music rhythms. This created a striking spatial illusion. Observing how the 360° camera captured the Earth’s curvature, I had an idea: what if this could become an interactive medium? Could I make it so viewers could “touch” the Earth depicted in the video? To realize this, I integrated MediaPipe’s hand-tracking feature. In the final TouchDesigner setup, the inputs—audio stream, video stream, depth map stream, and real-time hand capture—are layered from top to bottom.
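
For reference, a minimal sketch of the hand-tracking input, assuming the standard MediaPipe Hands solution in Python (the webcam index and the choice of the index fingertip are illustrative; wiring the values into the TouchDesigner particle system is a separate step):

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)                     # webcam index is an assumption

while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # Landmark 8 is the index fingertip; x and y are normalized to 0-1
        tip = results.multi_hand_landmarks[0].landmark[8]
        print(tip.x, tip.y)                   # values used to drive the particles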

Outcome

The final result is an interactive “Earth” that moves to the rhythm of music. Users can interact with the virtual Earth through hand gestures, creating a dynamic and engaging experience.

Critical Thinking

  1. Depth map generation was the key step of the entire project; the pre-trained AI model overcame the limitations of the traditional computer-vision approach I first attempted.
  2. The videos shot with the 360° camera are interesting in themselves. The selfie stick forms a support that is always close to the lens in the frame, and its presence is rendered convincingly and accurately in the depth map.
  3. Although I considered using a drone to shoot a bird’s-eye view, the 360° camera allowed me to realize the interactive ideas in my design. Overall, the combination of tools and creativity provided inspiration for further artistic exploration.

Final Project Proposal

3D scanning can reconstruct an object more faithfully than flat photography, but the results are static. I want to use TouchDesigner for this project because it supports many kinds of dynamic interaction. I have tried various experiments, and two of them gave me inspiration, so I decided to build on them for interactive digital character capture.

1. I can successfully scan a point cloud with a mobile app and restore it in TouchDesigner; after adding background music, the point cloud changes with different tracks.

2. I found a pipeline online that uses Teachable Machine to control a model in TouchDesigner, and I tried training my own model with Teachable Machine.

I have not yet connected the first and second parts, and I will spend more time researching possible solutions.
The effect I ultimately want to achieve: I make a digital animation for each classmate’s avatar. Whoever walks up to the computer (or places their photo in front of it) will see their avatar appear and dance (and if my technique allows it, maybe it can dance along with a recording of that person!).

Person In Time – Beyond Boundary

 

Concept

A glass wall is at once an unseen divide and a bridge of layered depth, allowing two worlds to mingle and merge. This stems from the dual properties of glass—reflection and transmission. With a polarizing camera (an instrument that selects light by its polarization orientation), interesting effects can emerge. Through this project, I aim to capture the subtle interactions between people on opposite sides of the glass.

Inspiration

One sunny autumn afternoon, after swimming, I stood outside the glass wall of the pool and was struck by the scene on the glass. I captured this moment on my phone. The glass seemed to merge the indoor and outdoor spaces seamlessly. The transparency of the glass determines the visibility of indoor people, events, and scenes, while the reflectivity decides how much of the outdoor world is visible. Thus, the indoor and outdoor spaces interact along this boundary.

Capture System

The polarized camera allows me to select the light entering my lens, choosing between light transmitted through the glass and light reflected off it.

I started with a DIY “polarized camera” (two Sony Alpha a6000 mirrorless digital cameras + two sheets of polarizing filter), a setup that allows the polarizing film to be rotated freely, forming a “color polarized camera.” In the resulting video, each time a door opens, the sound toggles between the indoor and outdoor spaces (across the 4:27 video this happens 7 times, so the space “switches” 7 times).

I placed a recording device inside to capture interior sounds (this device was even noticed by the girl at the beginning of the video). Outside, I recorded video at the same spot using my DIY polarized camera, aligning the exterior visuals and sounds. To synchronize the interior and exterior sounds, I tapped the glass while inside—this tapping, cut from the final video, enabled me to match the audio from the recording and video devices during post-production.

Next, I captured another piece using the Blackfly S BFS-U3-51S5P monochrome polarized camera. This involved (1) positioning the camera so that reflections of the road on the glass aligned with the pool lanes, (2) recording people on both sides of the glass, (3) exporting the video, and (4) using a GitHub package to separate images from different angles, selecting the two with the highest contrast. 
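
If you want to try step (4) without the GitHub package, here is a minimal sketch of the idea, assuming the usual 2×2 polarizer mosaic of this sensor family (the mapping of angles to pixel positions is an assumption; check the camera documentation or the package you use):

import cv2
import numpy as np

# Raw frame from the polarized sensor: each 2x2 block holds four pixels
# behind differently oriented polarizers (layout assumed, not verified)
raw = cv2.imread("raw_frame.png", cv2.IMREAD_GRAYSCALE)

p0   = raw[0::2, 0::2]   # one polarization angle per position in the mosaic
p45  = raw[0::2, 1::2]
p90  = raw[1::2, 0::2]
p135 = raw[1::2, 1::2]

# Pick the two sub-images with the highest contrast, as in step (4)
subs = {"0": p0, "45": p45, "90": p90, "135": p135}
best = sorted(subs.items(), key=lambda kv: kv[1].std(), reverse=True)[:2]
print("highest-contrast angles:", [name for name, _ in best])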

Spatial Illusions

“Aligning the road and the pool lanes” by design created an illusion: it looked as though people were swimming on the road, or as if those in swimsuits were walking on it. When a person appeared in the center of the image, it became hard to tell if they were coming from the pool or the forest. Over time, I believe you’ll develop your own way of distinguishing between indoor and outdoor figures. Here’s mine: (1) From left to right, do they transition from clear to blurry, or vice versa? (2) Are they wearing a swimsuit? (Shh!)

Later, I conducted an experiment: I tinted the black pixels in the indoor image blue, and the outdoor black pixels red. Then, I linearly combined the overlapping pixels, dynamically blending people and activities in both spaces over time through computer rendering.
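
A rough numpy sketch of that tinting experiment (the darkness threshold and the 50/50 blend weight are arbitrary values chosen for illustration, not the ones used in the final render):

import cv2
import numpy as np

indoor  = cv2.imread("indoor.png").astype(np.float32)
outdoor = cv2.imread("outdoor.png").astype(np.float32)

# Tint near-black indoor pixels blue and near-black outdoor pixels red
# (OpenCV stores images as BGR; the threshold of 40 is arbitrary)
dark_in  = indoor.max(axis=2)  < 40
dark_out = outdoor.max(axis=2) < 40
indoor[dark_in]   = (255, 0, 0)     # blue in BGR
outdoor[dark_out] = (0, 0, 255)     # red in BGR

# Linearly combine the two layers where they overlap
blend = 0.5 * indoor + 0.5 * outdoor
cv2.imwrite("blend.png", blend.astype(np.uint8))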

 

Tips (If you’d like to try capturing people on glass using a polarized filter, here are a few insights):

  • Angle: Head-on reflections are difficult to capture; the closer the camera’s viewing angle is to the plane of the glass (the more grazing the angle), the stronger the outdoor reflection becomes.
  • Time of Day: Midday, with the sun directly overhead, is optimal for capturing both indoor and outdoor scenes. In the afternoon, the bright outdoor light can overpower indoor details, while in the early morning and evening, indoor light dominates, and reflections weaken.

Challenges

  • Despite bringing plenty of backup batteries, the two Sony cameras ran out of power at different rates, resulting in videos with slightly different lengths.
  • Weather! Passing clouds affected image quality from moment to moment. Clear sunlight and clean glass were essential, as was keeping the polarizing film spotless.
  • People were unpredictable. Sometimes I’d arrive with my cameras and tripod, only to find no one swimming. Other times, swimmers would be there, but no one would pass on the street. Occasionally, someone would walk through the pool, but no one swam! This required patience and frequent visits to the glass wall.

Reflection

  1. Compared to my DIY color polarized camera with audio, the professional polarizer produced much clearer images (likely due to differences in filter quality). But my version added color and sound, which brought their own value.
  2. Surprisingly, capturing black-and-white images on glass actually enhanced the spatial illusion, as losing color information removes one of the clues distinguishing the two spaces.
  3. Glass is fascinating—it’s an invisible boundary through which people see each other, yet it’s also thick enough to host stories on both sides, allowing both to unfold simultaneously.

Here’s a shot of me working with the Blackfly S BFS-U3-51S5P camera!

PersonInTimeWIP

Topic1: Connection – Digital Mirror in Time

A website acting as a mirror as well as a temporal bridge between us. This real-time website, tentatively named “Digital Mirror in Time”, aims to explore shared human expressions and interactions over time using AI tools such as the Google AI Edge API, MediaPipe Tasks, and Face Landmark Detection. The project turns your computer’s camera into a digital mirror, capturing and storing facial data and expression points in real time.

Inspiration: Sharing Face. Visiting the installation at either location matched your expression and pose in real time with photos of someone else who had once stood in front of the installation. Thousands of people visited the work and saw themselves reflected in the face of another person.

Q&A: Can I get the full 478-landmark results? In my experiments so far, I can only get face_blendshapes as output.

  1. Facial Data Capture:
    The website uses the computer’s camera to detect a user’s face, leveraging the MediaPipe Face Landmark Detection model. This model identifies key facial landmarks (such as eyes, nose, mouth, etc.), storing this data along with the corresponding positions on the screen.
  2. Expression Storage (potential: Firebase)
    The user’s facial expressions are stored in a premade database, including information like facial positions, angles, and specific expressions (smiling, frowning, etc.). This creates a digital archive of faces and expressions over time.
  3. Facial Expression Matching and Dynamic Interaction (potential: Next.js + Prisma)

When a new user visits the website, their live camera feed is processed in the same way, and the system searches the database for expressions that match the current facial landmarks. When a match is found, the historical expression is retrieved and displayed on the screen, overlaid at the exact matching position.
This creates an interactive experience where users not only see their own reflection but also discover others’ expressions from different times, creating a temporal bridge between users. The website acts as a shared space where facial expressions transcend individual moments.
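
On the Q&A above: a minimal sketch with the MediaPipe Tasks Python API, which, as far as I can tell, returns the full 478-point mesh in face_landmarks alongside face_blendshapes when both outputs are requested (the model file path and image are placeholders; the JavaScript Tasks API exposes equivalent options):

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

options = vision.FaceLandmarkerOptions(
    base_options=python.BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True,   # blendshape scores
    num_faces=1)
detector = vision.FaceLandmarker.create_from_options(options)

image = mp.Image.create_from_file("face.jpg")   # placeholder image
result = detector.detect(image)

print(len(result.face_landmarks[0]))    # expected: 478 landmark points
print(len(result.face_blendshapes[0]))  # blendshape categories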

Concept: We often perceive the images of people on our screens as cold and devoid of warmth. This project explores whether we can simulate the sensation of touch between people through a screen by combining visual input and haptic feedback.

  1. Using a Hand Landmarker API, the system recognizes and tracks the back of the user’s hand in front of a camera. The user places their palm on a Sensel Morph (or a similar device) that captures pressure data, creating a heatmap of the touch.
  2. The pressure data is then stored in a database, linked to the visual representation of the hand. Implement algorithms to match the hands of future users with those previously recorded, based on hand shape and position.
  3. When another user places their hand in the same position on the screen, the system matches their hand’s position and visual similarity to the previous user. Display the hand pressure heatmap on the screen when a matching hand is detected, simulating the sensation of touch visually.
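
A minimal sketch of the matching step in (2) and (3), assuming 21 (x, y) hand landmarks per capture and a stored list of (signature, heatmap) pairs; the Sensel pressure capture itself is not shown:

import numpy as np

def hand_signature(landmarks):
    # landmarks: list of 21 (x, y) points from the Hand Landmarker,
    # normalized so the signature is position- and scale-invariant
    pts = np.array(landmarks, dtype=np.float32)
    pts -= pts.mean(axis=0)
    scale = np.linalg.norm(pts)
    return pts / scale if scale > 0 else pts

def best_match(current, stored):
    # stored: list of (signature, heatmap) pairs recorded from earlier users
    dists = [np.linalg.norm(current - sig) for sig, _ in stored]
    i = int(np.argmin(dists))
    return stored[i][1], dists[i]   # matched pressure heatmap and its distance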

Topic2: Broken and Distorted Portrait

What I find interesting about this theme is the distorted portrait, and I thought of combining it with sound. As water is added to a cup, the way light refracts through it changes, so the portrait seen through the cup changes as well; at the same time, the sound of striking the container can be recorded.

Looking Outwards 04

01A Out-of-Body Experience, by Tobias Gremmler, Adam Zeke

The piece is a fascinating visualization of an ethereal concept: viewers experience the sense of seeing their own body from a detached perspective. The project uses a combination of Kinect and Oculus to create a mesmerizing point-cloud rendering, which blurs the line between the physical self and its virtual representation. I appreciate how the technology manifests an intangible experience like an OBE, inviting the audience into a deeply personal exploration of presence and perception, understanding reality from an external viewpoint.

01B Virtual Actors in Chinese Opera, by Tobias Gremmler

Created for a theater production that fuses Chinese Opera with New Media, the virtual actors are inspired by shapes, colors and motions of traditional Chinese costumes and dance. The project made me think of how costumes and fashion could reshape a human body.

I like the concept of blending traditional art forms with cutting-edge technology, which is fascinating in the context of temporal capture: it immortalizes fleeting live performances in a digital space. This form of capture moves beyond merely recording an event and allows the audience to explore nuance. For example, it shows how traditional Chinese opera costumes and gestures, when captured digitally, become abstract patterns of motion that reveal a spiritual essence. The virtual actor, rooted in tradition, becomes a new entity through this lens of reinterpretation.

02 The Johnny Cash Project, by Chris Milk

I believe the ability to see individual brush strokes come to life adds a captivating dimension to the viewing experience. It invites deeper engagement, allowing fans to feel a personal connection to the artwork and to Cash himself. This crowd-sourced homage beautifully encapsulates the essence of community and shared experience in art, making it a poignant tribute to a musical icon. 

Gif of The Johnny Cash Project, in which more than 250,000 people individually drew frames for “Ain’t No Grave” to make a crowdsourced music video.

03 Human Blur Series – Penang Blur, Sven Pfrommer

“This mixed media collection is a series based on photographs I took while traveling PENANG / MALAYSIA in 2015. Back in my studio I added painting and mixed media techniques and finalized the work on acrylic, metal, resin coated wood panel or canvas. All works are limited edition of 10.”

I’m impressed by the blurring effect, which evokes a sense of transience, aligning with the idea of capturing people in time—fleeting moments that cannot be grasped in full detail. Instead, I experience the layered complexity of movement, where people are represented as part of a flowing system rather than discrete subjects. This also mirrors the way memory often works: impressions of people are sometimes remembered as hazy or fleeting. Lastly, this abstraction of human figures eliminates individuality, allowing the viewer to focus on the essence of motion, light, and shadow rather than on personal identity.

TypologyMachine: Bubble FishEye Selfie

Radius: 960.0 pixels, CenterX: 1115.5 pixels

 

Machine: Bubble Fisheye Camera & Typology: Selfie with Rich Context

The machine creates portraits extracted from de-warped photographs of soap bubbles, in three steps: (1) create bubbles, (2) capture bubbles, (3) de-fish bubbles.

Bubbles are natural fisheye cameras that capture the colorful world around them. If we take a photo of a bubble with a camera, we accidentally capture ourselves in the bubble as well. This gave rise to the idea of making a bubble capture machine, and the later interactive design of the project was influenced by portal picture 360 painting. I think fisheye cameras compress the rich content of the real world, and the process of viewing such an image becomes a way of exploring it from every part.

 

I researched the formula for making giant bubble solution.

If you want to replicate giant bubbles, there are two tips: (1) the prepared solution must be left to stand for 24 hours, and the longer it rests, the better the bubbles; (2) compared with iron wire, cotton and linen rope soak up much more solution.

To verify the technical feasibility of de-fishing, we experimented with a real mirrored sphere and compared a Photoshop workflow against a Processing workflow. Photoshop’s Adaptive Wide Angle filter: (1) loses edge pixels and (2) offers little customizable parameter control. Processing: (1) can work with specific data from the fisheye image, (2) allows fully customized parameters, and (3) is more interactive when you code it yourself. Therefore, I chose Processing. The general idea is to take each pixel of the output photo, cast it into 3D space, and map it back to a 2D fisheye pixel through trigonometric calculations. Thanks to Golan for figuring out the underlying principle with me. We tried different mathematical formulas and finally found the right one!!
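
For readers who want to try this, here is a rough Python transcription of the idea, assuming the common equidistant fisheye model (r proportional to theta); the exact formula we settled on in Processing may differ, and the center-Y value here is illustrative (only Radius and CenterX are listed above):

import numpy as np
import cv2

fisheye = cv2.imread("bubble.png")                 # placeholder file name
cx, cy, radius = 1115.5, 1000.0, 960.0             # circle center and radius (cy is illustrative)
theta_max = np.pi / 2                              # half field of view covered by the circle

out_size, fov = 800, np.radians(90)                # output size and field of view (assumed)
f_out = (out_size / 2) / np.tan(fov / 2)           # pinhole focal length of the output view

out = np.zeros((out_size, out_size, 3), np.uint8)
for v in range(out_size):
    for u in range(out_size):
        # 3D ray through this output pixel (pinhole camera looking along +Z)
        x, y, z = u - out_size / 2, v - out_size / 2, f_out
        theta = np.arctan2(np.hypot(x, y), z)      # angle from the optical axis
        phi = np.arctan2(y, x)                     # azimuth around the axis
        r = radius * theta / theta_max             # equidistant fisheye: r grows with theta
        sx, sy = cx + r * np.cos(phi), cy + r * np.sin(phi)
        if 0 <= sx < fisheye.shape[1] and 0 <= sy < fisheye.shape[0]:
            out[v, u] = fisheye[int(sy), int(sx)]
cv2.imwrite("defished.png", out)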

 

The technical points of this pipeline:

  • Bubble photograph settings: (1) autofocus, (2) continuous shooting mode, (3) high-speed shutter.
  • Programmable adjustable parameters: (1) thetaMax, (2) center of circle + radius, which I draw on the canvas dynamically.
  • De-fish criteria: refer to the curves and straight lines of the original real-world scene.
  • Exploratory interaction: (1) de-fish around different focus points based on the location of the user’s mouse, (2) print the parameters.

 

260+ photos (12 bubbles are round; 7 bubbles are rich with context)

Some objective challenges and solutions:

  • Weather: I prepared the bubble water a week in advance and let it rest for 24 hours, but it rained almost every day in Pittsburgh. Strong wind bursts the bubbles, rain makes shooting impossible, and capturing the reflection on a bubble demands strong sunlight, so I could only wait for sunny weather.
  • Photographing moving bubbles: In my first set of photos the resolution was sufficient, but the bubbles came out blurred because they kept flying while I focused. Switching to continuous shooting mode, autofocus, and a high-speed shutter finally gave me sharp photos of the moving bubbles.
  • Bubble shape: Fisheye restoration works best when the bubble is round, but gravity makes the surface solution uneven, and wind disturbs the shape, so truly round bubbles are precious.
  • Unwarping the fisheye image: fisheye mathematics and Processing. Thanks to Golan & Leo!!! 🙂

Surprise! The fisheye image captured by the bubble machine actually contains two reflections of me. The lower, inverted image comes from the bubble’s rear surface acting as a large concave mirror, while the upper, upright image comes from the bubble’s front surface acting as a convex mirror!!

Some findings: I. Round bubbles are better capture machines! II. Dark environments are better for capturing bubble images! III. Don’t blow the bubbles too big, or they will deform! From my perspective,

“Bubbles = Fisheye Camera + Prism Rainbow Filter for Selfie With Rich Context”

First, the surface of a bubble can contain up to 150 uneven film layers, reflecting different colors of sunlight. Second, I like this capture machine for its randomness: shooting angle, weather, and bubble size and thickness all bring unexpected surprises to the final selfie!

 

slides link

TypologyMachineWIP

Topic1: Paper Chromatography Experiment – inks are almost always made up of multiple colors, which spread at different rates. Technique: Time-lapse + Playback

Plan 1-A: Paper chromatography separates the final image into a spectrum of distinct colors and contours; playing the recording back in reverse (a “flashback”) restores the original single color (a minimal sketch of this reversal step follows the steps below).

step1 Write the answer on paper

step2 Perform Paper Chromatography and record the whole process with a camera

step3 Video production using time-lapse photography and flashback
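
A minimal sketch of step3, assuming the recording is saved as an image sequence (file names and the speed-up factor are placeholders):

import cv2, glob

frames = sorted(glob.glob("chromatography/*.jpg"))
speedup = 30                        # keep every 30th frame for the time-lapse
kept = frames[::speedup]

h, w = cv2.imread(kept[0]).shape[:2]
out = cv2.VideoWriter("flashback.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 24, (w, h))
for f in reversed(kept):            # reversed order gives the "flashback" playback
    out.write(cv2.imread(f))
out.release()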

Plan 1-B: Color reconstruction of plants. No two leaves in the world are identical, yet we tend to assume every leaf is the same green. Chromatography is used to analyze the color structure of different leaves and establish each leaf’s uniqueness.

step1 Leaf type selection

step2 Grind into powder and filter the leaves

step3 Record the whole chromatography process of the leaf juice with time-lapse photography

* Kitchen filter paper can be used for these experiments, but the colored markers I picked didn’t separate well, so I need to do more experimentation.

 

Topic2: Bubble Photography. The features of bubbles that attract me: (1) the infinite possibilities of color and combination under light, (2) their momentary existence and fragility, (3) fisheye imaging.

I have a series of possibilities I want to explore; you can choose one to start exploring:

Plan2-A: “Capturing the disappearance of bubbles.” Capture the moment bubbles pop in different ways. Technology: an Edgertronic high-speed camera.

Plan2-B: Bubbles produced by some special liquids, such as oil-water mixture.

Plan2-C: Bubbles and smoke.

FFMpeg Test – View Two Paintings

Step1 Concatenate the two videos and export the result with a (resized) width of 240 pixels and moderately heavy compression:

ffmpeg -i tile.MOV -i silver.MOV -filter_complex "[0:v]scale=240:-2[v0];[1:v]scale=240:-2[v1];[v0][0:a][v1][1:a]concat=n=2:v=1:a=1[v][a]" -map "[v]" -map "[a]" -c:v libx264 -crf 28 -preset medium test.MOV

Step2 Convert the .mov file to .mp4, because WordPress only previews mp4:

ffmpeg -i test.mov -vcodec h264 -acodec aac test.mp4

Would a cat’s tail swing when he was asleep?

I realized that my cat’s tail swings while he sleeps, so I started thinking about how to capture it.

First, my cat was sleeping on the windowsill; when I came closer, his eyes could barely open. Today my subject is his tail. The picture below shows him, with his almost comically short legs, and the tail that sways back and forth.

I. Time-lapse

I wanted to know how many times my cat’s tail swings in one minute while he sleeps. I picked up my phone and recorded a one-minute video. To get the count faster, I used a time-lapse:

Using CapCut, I sped the whole video up 8×. I could clearly see the tail swinging, even rippling, yet my cat kept sleeping the entire time (or at least pretending to 🙂

In that one minute, his tail swung more than 30 times, roughly 0.5 swings per second.

II. Slow Motion

I wanted to see the details of the tail swinging, so I shot a 0.2-second clip of the cat flicking his tail vigorously. I slowed the footage down twice, by 0.1 seconds each time, so I could see the direction of every single hair as the tail moves. Really cool.

III. Slit-Scan

What would it look like to freeze the moving fragments and then join them together? I explored this possibility with a slit-scan app. The line on the left of the image is his paw: because it stays still, it appears as a straight line. The line on the right is the trajectory of his tail.

Because of the shaking, the lines sometimes turn sharply, sometimes break off, sometimes drift lightly, sometimes crowd together. You can imagine that if a brush left its trace on a moving sheet of paper, it would look roughly like this.
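
For reference, a minimal sketch of the basic slit-scan idea (I used a phone app, but the principle is just stacking one pixel column from every frame; the file name and the choice of the center column are placeholders):

import cv2
import numpy as np

cap = cv2.VideoCapture("cat_tail.mp4")   # placeholder file name
columns = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    x = frame.shape[1] // 2              # sample the center column of each frame
    columns.append(frame[:, x])
cap.release()

# Each frame contributes one vertical slit; time runs left to right
slitscan = np.stack(columns, axis=1)
cv2.imwrite("slitscan.png", slitscan)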

IV. On the Capture Techniques and the Captured Subject

A cat is a moving subject. Comparing these three phone-based extended capture techniques: Timelapse can record everything over a long period, while Slow Mo and Slit-scan capture details the naked eye easily misses, and those details are often the most interesting. Between the latter two, Slow Mo reflects what is really there more faithfully, whereas Slit-scan carries a creative element; what Slit-scan ultimately produces is a re-creation grounded in reality.

I love the uncertainty that slit-scan brings: through this optical imaging, the cat’s tail becomes a paintbrush, swaying freely and leaving an improvised stroke on my phone, and all of it is his unconscious creation.