Final Project — Fragmented Moments

Concept

At the beginning of the semester, I was more interested in the capture system itself, or the aesthetic of the final output. In these final projects, I grew more interested in the subject of what I was capturing.

During the Person-in-Time project, I wanted to capture the “moments” when I am with my college friends and represent them in a form that would be memorable for me to reflect back on later.

My first iteration of this project was called “Cooking in Concert,” where I recorded us cooking food together. Having lived with each other for two years and cooked countless meals together, my roommates and I have naturally settled into cooking stations and drills. Through this project, I wanted to capture our working patterns and interactions within such a small, closed space in an abstract representation.

After this project, I realized I wanted to integrate more spatial elements into the representation. Although I personally preferred the abstract visualization because I wanted to focus on our movement patterns and interactions, I received feedback that it felt too distant from my intention of capturing a wholesome moment with friends.

Therefore, in my second iteration, I wanted to create a capture that closely replicates my memory of the moment. After our critique, I thought more about the question of what brings people together, and whether that reason shapes how we interact with each other.

Outcome

(I was absent during the actual event, so there is no other documentation.)

I really liked this point-cloud visual because it only has points facing the Kinect camera. From the back, the figures look like empty shells, which goes in line with how the light from the TV only lights up the front of our bodies. I also liked how fragmented and shattered it looks, because it reflects how my memories are also bound to come apart and have missing pieces here and there. In this iteration my focus was not necessarily on the movement but on capturing the essence of the moment: observing us while we are all observing something else, and taking note of details or interactions that I do not remember, because my memory is only a puzzle piece of the whole picture. This project got me thinking about how people who share a moment have different perspectives of that same moment.

Capture “system”

With the Azure Kinect camera, I spatially captured my interactions with my friends during our regular Saturday 7 p.m. meetups at our friend’s apartment to watch K-dramas together. The setting is very particular: we watch in the living room on the TV we hooked the computer up to, and all four of us sit in a row on the futon while snacking.

I also set up a supplementary GoPro camera to time-lapse our screening in case the Kinect recording failed. Thankfully, it did not. Lastly, I recorded our conversations during the screening on my computer. Then, as I explain later on, I used RunwayML to transcribe the audio recording and used its motion tracking to integrate the dialogue component into the capture.

Challenges + Updates from last post

I was able to build the transformation_example application from the command line in the terminal, but even after running it, it did not output anything properly…

Therefore, I tried to use TouchDesigner instead. TouchDesigner worked great with live data, as my demo shows.

However, it cannot read the depth data in the MKV files. From searching cases similar to mine, it seems only the Kinect Viewer can interpret the depth and RGB data, so I would need to export the recording as .ply files before bringing it into TouchDesigner or other programs.

I tried other suggested methods like MKVToolNix, but they did not seem to work well either.

Therefore, I decided to focus more on integrating the dialogue into the scenes. I used RunwayML to transcribe the audio and get a rough timeline.

Then I edited the parts that were transcribed incorrectly. Lastly, I used its motion tracking to attach the speech to each person.

 

 

Again, Final Product

Final Project – Lingering Memories

WHAT

Previously…

Person In Time_ Cooking in Concert

I want to capture these memorable last moments with my college friends in a form I can revisit in later years when I want to reminisce.

I decided to document a representative “group activity”: how my roommates and I move around and work together in our small kitchen. I set up a GoPro camera and observed our movement patterns.

I then recreated a 3D version of the scene so that the movements can be viewed more immersively. I wanted to work with depth data in this next iteration of the project, so I used a Kinect this time.

Azure Kinect

Design and validation of depth camera-based static posture assessment system: iScience

The Kinect is a device designed for computer vision, with depth sensing, an RGB camera, and spatial audio capabilities.

My Demo

WHO

After my first iteration, which captured a moment when we cook together, I wanted to choose another moment of interaction. With the Kinect, I wanted to spatially capture my interactions with my friends during our regular Saturday 7 p.m. meetups at our friend’s apartment to watch K-dramas together. The setting is very particular: we watch in the living room on the TV we hooked the computer up to, and all four of us sit in a row on the futon.

I was especially drawn to this 3D view of the capture and wanted to bring it into Unity so I could add additional elements, like words from our conversations and who is addressing whom.

HOW

Now comes my struggle…:’)

I had recorded the capture as an MKV file, a format that includes both the depth and color data. In order to bring it into Unity and visualize it, I would need to transform each frame of data into a .ply file, i.e., a point cloud.

I used this Point Cloud Player tool by keijiro to display the .ply files in Unity, and I managed to get the example scene working with the provided files.

However, I had a lot of trouble converting the MKV recording into a folder of .ply files. Initially, it just looked like a splash of points when I opened it in Blender.

After bringing it into MeshLab and playing with the colors and angles, I do see some form of a face. However, the points weirdly collapse toward the middle, like we are being sucked out of space.

Nevertheless, I still brought it into Unity, but the points are very faint, and I could not quite tell whether the point cloud above was being displayed correctly below.

Next Steps…

  1. Find alternative methods to convert to .ply files
    1. Try to fix my current Python code (see the rough sketch after this list)
    2. Or, try the transformation_example on GitHub (I am stuck trying to build the project in Visual Studio so that I can actually run it)
  2. Bring it into Unity
  3. Add text objects for the conversations
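For reference, the kind of conversion script I have been trying to fix looks roughly like this. It is only a sketch, assuming the pyk4a and Open3D packages; the file and folder names are placeholders, and this is not a version I have gotten to run cleanly:

```python
import os

import numpy as np
import open3d as o3d
from pyk4a import PyK4APlayback

os.makedirs("ply_frames", exist_ok=True)

playback = PyK4APlayback("kdrama_night.mkv")  # placeholder path to the Kinect recording
playback.open()

frame = 0
while True:
    try:
        capture = playback.get_next_capture()
    except EOFError:
        break
    if capture.depth is None:
        continue

    # (H, W, 3) points in millimeters, in the depth camera's space
    points = capture.depth_point_cloud.reshape(-1, 3).astype(np.float64)
    points = points[~np.all(points == 0, axis=1)] / 1000.0  # drop empty pixels, mm -> m

    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    # (Color could come from capture.transformed_color, but the compressed color
    #  track needs decoding first, which is one of the places I keep getting stuck.)
    o3d.io.write_point_cloud(f"ply_frames/frame_{frame:04d}.ply", pcd)
    frame += 1

playback.close()
```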

WIP Final Project

For the final project I want to do a spin-off of my last project. Essentially, I want to capture these memorable last moments with my college friends. Three of my friends and I regularly meet every Saturday at 7 p.m. at our friend’s apartment to watch K-dramas together. We each bring snacks to eat during the screening, whether it is something we cooked or baked or Trader Joe’s sweets. There is a TV in the living room that we hooked the computer up to, and all four of us sit in a row on the futon. I want to capture this core memory in a form that I can remember and be immersed in when I look back at this work years later.

This time, instead of just using a GoPro camera, I want to capture depth as well, and the Kinect camera works perfectly for this because we all face in one direction watching the screen.

I first set up the workspace on the Windows computer. I have the Kinect SDK downloaded and did some test trials with its software.

I think another important aspect of this capture is not only the visuals but also our conversations, because we will be talking about something totally random while keeping our eyes locked on the screen. Sometimes we have two different conversations going on at the same time and it gets very chaotic, but it is the only time that I put down all my load and enjoy the moment.

Person In Time_ Cooking in Concert

My roommates and I have been living together for about 2 and a half years now. In the beginning, it was kind of like living with siblings for me, since I’m an only child and not used to sharing a living space. But over time, we’ve sort of just fallen into this unspoken routine in the kitchen – we each have our own designated spots for prepping ingredients, watching the stove, and cleaning up as we cook.

I wanted to document how we move around and work together in our small kitchen, so I set up a GoPro camera in the doorway to get a full view of the space.

But instead of just showing the actual footage, which would include my roommates, I decided to reinterpret it in 3D using Unity. That way I could focus just on the patterns of movement without exposing anyone’s face.

Actual Footage
3D scan of the kitchen with Reality Scan in Unity

Final output

Interesting Observations

In the beginning, we’re all moving around, getting ingredients out of the fridge and trying to find our spots. But after maybe 10 minutes or so, we’ve each kinda claimed our own little stations – one of us watching the stove, one chopping stuff up, etc.

The interesting variable is that our third roommate was pretty busy with work that day, so she joined the cooking a bit later (around the 14-minute mark). We also had another friend (who doesn’t live with us) helping out in the kitchen. Since he isn’t used to our normal kitchen routine, you can see him kind of drifting around in the middle, not really sure where to go or what to do.

But by the end, he’s found his spot over by the sink, doing the dishes, while I’m keeping an eye on the stove and my other roommate is adding in the ingredients. It was interesting to see how we all fall into our roles, even with the new person joining in. This sort of natural collaborative pattern is what I wanted to capture with this abstract 3D interpretation.

Imagery inspo

Struggles

Converting the 2D video of multiple subjects moving quickly in a small space into 3D Unity coordinates was the most difficult problem.

I first tried to use a depth map of the video to calculate each subject’s distance from the camera, but while Unity can read the depth information of each pixel in the recording, that alone is not so helpful, since all of us are constantly moving around and switching places.

Then I tried to use motion tracking, which worked a lot better. My logic was that if I could figure out how Unity tracks where the red square moves on the screen (placed in Unity at the same spot the actual camera was positioned), I could extend that position into a vector perpendicular to the screen and create another plane parallel to the ground. The intersection point of that vector and the plane should then be the position of our subject.
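To make the geometry concrete, here is a minimal numpy sketch of that ray–plane intersection. The screen orientation, tracked position, and floor height are made-up example numbers, not values from my actual scene:

```python
import numpy as np

# Made-up example values: the virtual screen sits where the GoPro footage is
# placed in the Unity scene, tilted slightly downward toward the kitchen floor.
screen_normal = np.array([0.0, -0.5, 1.0])   # screen's forward direction
screen_normal /= np.linalg.norm(screen_normal)
tracked_point = np.array([0.3, 1.4, 0.0])    # tracked red square's position on the screen
floor_y = 0.0                                # plane parallel to the ground

# Extend the tracked position along the screen's normal until it meets the floor plane.
# (Assumes the screen is tilted downward, so the ray actually reaches the floor.)
t = (floor_y - tracked_point[1]) / screen_normal[1]
subject_pos = tracked_point + t * screen_normal
print(subject_pos)  # estimated position of the subject on the kitchen floor
```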

However, I got stuck trying to make Unity track the red square. The code that I had simply did not work :’) so I decided to use a more manual approach: recording each key timestamp and its respective position in Unity in a table. I then converted the table into a JSON file that the Unity script can use to change the positions.
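The conversion itself was just reshaping the table; a rough sketch of that step in Python (the file names and column names here are placeholders, not my actual ones):

```python
import csv
import json

# Placeholder file/column names: each row of the table is a key timestamp
# with one subject's position in Unity coordinates.
entries = []
with open("key_positions.csv") as f:
    for row in csv.DictReader(f):
        entries.append({
            "time": float(row["timestamp"]),  # seconds into the recording
            "subject": row["subject"],        # which roommate / friend
            "x": float(row["x"]),
            "y": float(row["y"]),
            "z": float(row["z"]),
        })

# The Unity script reads this JSON and moves each figure to its keyed position.
with open("key_positions.json", "w") as f:
    json.dump(entries, f, indent=2)
```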

Next step: converting the 360 recording to VR video

Nebby Neighbour

I sleep in the living room, so I hear everything that is going on in the house… especially in the kitchen. If someone is making food, running the microwave, filling up their Brita, etc., I can guess what they are doing in the kitchen just by listening to the sound.

Goal of the project

Detect the sounds coming from the kitchen, have the machine recognize the activity, and send me a notification, logging my roommates’ activity while I am out of the house.

How

  • Teachable Machine -> Audio
    • record audio for training data
  • Record in real time
    • test data set?
  • Send me emails of the activity (rough sketch below)
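As a rough sketch of how the pipeline could hang together, here is the loop I have in mind in Python. The classify_clip() function is a hypothetical placeholder for the trained Teachable Machine audio model (however it ends up being exported), and the email addresses and SMTP server are placeholders too:

```python
import smtplib
from email.message import EmailMessage

import sounddevice as sd

FS = 16000          # microphone sample rate
CLIP_SECONDS = 5    # length of each audio clip to classify

def classify_clip(audio):
    """Hypothetical wrapper around the trained Teachable Machine audio model."""
    return "background"  # replace with the model's predicted label

def notify(activity):
    msg = EmailMessage()
    msg["Subject"] = f"Kitchen activity: {activity}"
    msg["From"] = "nebby.neighbour@example.com"   # placeholder addresses
    msg["To"] = "me@example.com"
    msg.set_content(f"Heard something that sounds like: {activity}")
    with smtplib.SMTP("smtp.example.com", 587) as server:  # placeholder SMTP server
        server.starttls()
        server.login("nebby.neighbour@example.com", "app-password")
        server.send_message(msg)

while True:
    clip = sd.rec(int(CLIP_SECONDS * FS), samplerate=FS, channels=1)
    sd.wait()                        # record a short clip from the laptop mic
    activity = classify_clip(clip)   # e.g. "microwave", "brita", "footsteps"
    if activity != "background":
        notify(activity)
```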

Test: microwave

Questions

  1. How can it recognize which of my roommates is in the kitchen (1 or 2)?
    1. Train on recordings of each person’s footsteps
  2. Do you think the form of notification matters?
    1. Right now we are thinking emails because they seem the most feasible, but do you think it is important that it is more like a push notification?

 

Typology Machine_Take2: The Shoe Box

After our final critique of the Typology Machine project, I was contemplating how I could have delivered the concept in a more efficient and direct way. Some of the feedback mentioned that the 3D captures were not very clear and that the details I wanted to show were not present. Therefore, I wanted to attempt the project again in a different light (no pun intended…).

I have a lot of shoes at home, around ten. Some are more worn out than others; some I wear more often than the special-occasion ones. I simply wanted to photograph them under different lights: the original image, an image with only diffused light, an image with only specular light, and an image with UV light.

I put more thought into the presentation of the typology this time. I initially tried to mask out the shoe, but I decided to keep the background. I wanted a sense of environment that would be absent if I were to mask it out of context.

Paper Dice | Kids' Crafts | Fun Craft Ideas | FirstPalette.com

For the layout of the images, I took inspiration from paper dice templates and the way cardboard boxes look when taken apart. I thought it would be interesting because the concept is that the shoes were photographed inside the shoe box.

Typology

When you first click on the pages, they take some time to load, but without any further action the images will slowly change from one state to another.

What I find interesting about this typology set is that the leather shoes (shoe2) are almost pitch black with no details in the diffused-light and UV-light images, but the specular image almost replicates the figure of the original image. Through the changes we can see how the material bounces the light off completely rather than diffusing it. In comparison, for the slippers (shoe1) you can hardly make out the lines of the shoe in the specular-light image.

Looking Outwards #04

01. 3D Volumetric Capture

They used four volumetric time-of-flight cameras that captured extreme three-dimensional detail in sync. The cameras were exposed with a thousandth of a second delay from each other, 30 times a second, so as not to disrupt the exposures of the opposite cameras. I really like how this looks like either a ghost house or a doll house, because we are able to see through the walls and watch the people moving inside the building.

02. Sympoietic Bodies

This film explores anthropocentric and post-anthropocentric points of view on the relationship between our human bodies and our physical surroundings. To create this effect they used a digital camera, motion capture, photogrammetry, point-cloud scanning, and a Kinect camera. It almost seems like we are peering into a microscopic world, and I was really interested in how the piece maintains constant slow movement.

Another project that I thought was analogous to this theme is Kamil Czapiga’s work. Czapiga uses magnetic fluids to mimic the movements of a microorganism, and he also composes sound effects that match the videos. Although the two projects have very different aesthetics and pacing, both feel like they are uncovering a world we don’t normally see at a microscopic level.

03. Volumetric Selfie

I could not find an external video link for this work, but this project also works with volumetric cameras. It reminded me of one of the depth cameras and the touchpad we have in the lab, because not only are the facial features disintegrating, they also only emerge once the person has entered a certain range.

Photogrammetry of UV Coloration with Flowers (ft. Polarizer)

Initially, I took interest in polarizer sheets, which I can use to separate the diffused light from the specular light. I was particularly interested in the look of an image with only the diffused light.

(Test run image: Left (original), Right (diffused only))

My original pipeline was to extract materials from a real-life object and bring it into a 3D space with the correct diffuse and specular maps.
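The separation itself is simple image arithmetic once there is a cross-polarized shot (diffuse only) and a parallel-polarized shot (diffuse + specular) of the same scene under a polarized light source. Here is a minimal sketch with OpenCV and numpy, assuming the two exposures are aligned and identically exposed (file names are placeholders):

```python
import cv2
import numpy as np

# Placeholder file names: two photos of the same scene, one shot through a
# cross-oriented polarizer (diffuse only) and one through a parallel one.
cross    = cv2.imread("cross_polarized.png").astype(np.float32)     # diffuse
parallel = cv2.imread("parallel_polarized.png").astype(np.float32)  # diffuse + specular

# Whatever survives the crossed polarizer is (approximately) the diffuse component;
# subtracting it from the parallel shot leaves the specular highlights.
specular = np.clip(parallel - cross, 0, 255).astype(np.uint8)

cv2.imwrite("diffuse_only.png", cross.astype(np.uint8))
cv2.imwrite("specular_only.png", specular)
```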

After our first crit class, I received a lot of feedback on my subject choice. I could not think of one that I was particularly interested in, but after seeking advice, I grew interested in capturing flowers and how some of them display different patterns under UV light. Therefore, I wanted to capture and display the UV coloration of different flowers that we normally do not see.

I struggled mostly with finding the “correct” flower. Other problems with my subject choice were that flowers wither quickly, are very fragile, and are quite small.

(A flower with the bullseye pattern that I found while scavenging the neighborhood, but it withered soon after.)

After trying different photogrammetry programs, RealityScan worked the best for me. I also attached a small piece of polarizer sheet in front of my camera because I wanted the diffuse image for the model; however, there was not a significant difference, since I could not use a point light for the photogrammetry.

Here is the collection:

Daisy mum

(Normal)

(Diffused UV)

 

Hemerocallis

(Normal)

(Diffused UV)

(The bullseye pattern is more visible with the UV camera)

 

Dried Nerifolia

(Normal)

(Diffused UV)

My next challenge was to merge two objects with different topologies and UV maps so that one model carries both materials. Long story short, I learned that it is not possible…:’)

Some methods I tried in Blender were

  • Join the two objects into one, bring the two UV maps together, then swap them
  • Transfer Mesh Data + Copy UV Map
  • Link Materials

They all resulted in a broken material like so…

The closest to the desired result was this, which is not terrible.

Left is the original model with the original material; the middle is the original model with the UV material “successfully” applied; and right is the UV model with the UV material. However, the material still looked broken, so I thought it was best to keep the models separate.

 

3D Object Material Capture

I took interest in this study on splitting the specular and diffuse components of real images, and I wanted to incorporate it into my typology machine.

Goal

As the diagram above shows, I want to create a system that takes a collection of photos and a model of the object, and outputs a 3D capture of the object with its respective materials properly mapped.

Due to my lack of knowledge in ML, I feel it would be difficult to expect the system to do the mapping on its own. Ideally, I want the system to distinguish from the photos (1) how many different parts the object is composed of and (2) which material property matches which part. However, this process might need to be done manually.

Target Object

What I really want to explore in this study is how I can capture an object composed of different components. Most real objects are made up of parts with different material properties. Using the scissors example, the blade will have a higher specular value than the handle because it is shinier and reflects more specular light.

Therefore, for my subject of capture I wanted to work with my lotion bottles, which have a mix of different material types.

Jeju Snowglobe

I plan on doing this activity again, but I first tried to document the tiny snow globe by my window. It is about the height of my thumb, so it was difficult to get a crisp close-up photo of it. I first shook the globe and wanted to capture the movement of the snowflakes using time-lapse and slow motion.

(time-lapse)

(slow-mo)

I initially thought that slow motion would give a better image, but I realized as I was shaking the snow globe that the flakes do not actually move that fast. Therefore, the slow-motion version felt excruciatingly slow.

Lastly, I did a 3d scan of the globe using Polycam.

Most details of the globe were captured well in the model, but it had trouble understanding the “glass” and reflective surfaces.