Most of my documentation is on my website but I will try to summarize a bit of it here.
Radiance fields are a (somewhat) newer type of 3D representation. Gaussian splats and NeRFs are view-dependent representations of real-life objects, meaning that the rendered image depends on the POV the camera is positioned at. This makes radiance fields ideal for objects that are reflective, transparent, or translucent, a.k.a. dead things in jars.
For this final project, I mainly worked on coordinating with the Carnegie Museum of Natural History to capture their specimens and make new splats. This is my last attempt at using Postshot. I changed the way I captured the specimens, using static lighting and videoing around the target. Moving forward, though, it would make the most sense to me to use the ArUco targets to establish a ground truth and correct the results from the SfM algorithm. There are parts of the draco video (and it's most apparent in the other frogs video) where you can see that the point clouds from different angles are misaligned. This seems to be a pretty consistent issue, likely due to various artifacts (or lack thereof) in the glass. To combat this, I will be using the code from the original paper and nerfstudio to generate the splats, to have more control over the SfM.
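If I go the ArUco route, the first step is just detecting the markers in each frame so their known layout can anchor and scale the SfM poses. A minimal sketch with OpenCV (assuming opencv-contrib-python ≥ 4.7; the marker dictionary and paths here are placeholders, not my actual setup):

```python
# Sketch: detect ArUco markers in a capture frame so their known physical
# layout can be used as a ground truth for the SfM reconstruction.
# Assumes opencv-contrib-python >= 4.7; dictionary and paths are placeholders.
import cv2

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(aruco_dict, cv2.aruco.DetectorParameters())

frame = cv2.imread("frames/frame_0001.png")       # hypothetical frame path
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

corners, ids, _rejected = detector.detectMarkers(gray)
if ids is not None:
    print(f"found markers {ids.flatten().tolist()}")
    # The detected corner coordinates, plus the markers' known positions on
    # the printed sheet, give a metric reference to align the SfM poses against.
```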
I’m pretty excited to see where this project continues to go. I’ve met so many interesting people (from PhD students to people who’ve worked at Pixar) via talking about this project at our final exhibition or at a poster session. Rest assured, at some point next semester, I will be making a website of pickled things.
Traditional 3D reconstruction methods, like photogrammetry, often struggle to render reflective and translucent surfaces such as glass, water, and liquids. These limitations are particularly evident in objects like jars, which uniquely combine reflective and translucent qualities while containing diverse contents, from pickles to wet specimens. Photogrammetry's reliance on point clouds with polygon and texture meshes falls short in capturing these materials, leaving reflective and view-dependent surfaces poorly represented. Advancements like radiance fields and 3D Gaussian Splatting have revolutionized this space. Radiance fields, such as Neural Radiance Fields (NeRFs), use neural networks to generate realistic 3D representations of objects by synthesizing views from any arbitrary angle. NeRFs model view-dependent lighting effects, enabling them to capture intricate details like reflections that shift with the viewing angle. Their approach involves querying 5D coordinates (spatial location and viewing direction) to compute volume density and radiance, allowing for photorealistic novel views of complex scenes through differentiable volume rendering. Complementing NeRFs, 3D Gaussian Splatting represents the scene as a point cloud of Gaussian "blobs," enabling smooth transitions and accurate depictions of challenging materials. Together, these innovations provide an unprecedented ability to create detailed 3D models of objects like jars, faithfully capturing their reflective, translucent, and complex properties.
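For reference, the differentiable volume rendering step mentioned above is the integral from the original NeRF paper: the color of a camera ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ accumulates view-dependent radiance $\mathbf{c}$ weighted by density $\sigma$ and transmittance $T$:

$$
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)
$$

In practice this integral is approximated by sampling points along each ray, which is what makes the whole pipeline trainable end to end.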
Further development:
After a brief conversation with a PhD student working with splats, and drawing from my own experiences, I've concluded that this project will require further development. I plan to develop it further by writing scripts to properly initialize my point clouds (using GLOMAP/COLMAP) and then train with nerfstudio (an open-source radiance field training framework). This move from commercial software in beta to open source is about gaining more control over how the point clouds are initialized. I'm also changing how I capture the training images, since my previous methods confused the algorithm.
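Roughly, the pipeline I have in mind looks something like the sketch below. This is only a sketch: the paths are placeholders, the exact flags can differ between COLMAP/nerfstudio versions, and GLOMAP's mapper could stand in for COLMAP's.

```python
# Sketch of the planned open-source pipeline: COLMAP for SfM initialization,
# then nerfstudio's splatfacto method for 3DGS training.
# Paths are placeholders; flag names/semantics may vary by version.
import subprocess
from pathlib import Path

WORK = Path("captures/axolotl")   # hypothetical capture folder
IMAGES = WORK / "images"

def run(cmd):
    cmd = [str(c) for c in cmd]
    print(">>", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Structure-from-motion: features, matches, sparse point cloud + camera poses.
(WORK / "sparse").mkdir(parents=True, exist_ok=True)
run(["colmap", "feature_extractor", "--database_path", WORK / "database.db",
     "--image_path", IMAGES])
run(["colmap", "exhaustive_matcher", "--database_path", WORK / "database.db"])
run(["colmap", "mapper", "--database_path", WORK / "database.db",
     "--image_path", IMAGES, "--output_path", WORK / "sparse"])

# 2. Convert to nerfstudio's format (reusing the COLMAP model) and train a splat.
run(["ns-process-data", "images", "--data", IMAGES,
     "--output-dir", WORK / "nerfstudio",
     "--skip-colmap", "--colmap-model-path", WORK / "sparse" / "0"])
run(["ns-train", "splatfacto", "--data", WORK / "nerfstudio"])
```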
Observation of myself as a result: I fidget a lot. And I think as time went on, I got sleepy and I cared less and less about the camera (I didn’t realize this is how it looks when I nod off in class).
This project is a bit last minute, due to two main components of my original idea not working. One thing I kinda got working was URScript (thank you Golan!), but to be quite honest, even that wasn't working very well. I think the math involved in looping in a half circle (in a very simple, maybe dumb manner) breaks down with a division by zero when the radius goes to zero (idk what this even means). Even after debugging my script, the math didn't work out, and I ended up resorting to just scripting the motion by moving the robot to the positions I thought it needed to be at.
The original idea was to use the robot arm to make animated 3D objects, which proved to be way too large in scope to complete. I spent the majority of the last 2-3 weeks working on developing a script to automate the creation of multiple Gaussian splats, a new-ish way to create 3D objects. This didn't work (it's still not working). Thus, a speed project was enacted and this was made. I really struggled with URScript: it took me 1.5 hrs to debug why my script wasn't compiling (apparently if statements also need to be ended with an 'end' in URScript). But even after getting it to compile, there was something about the computed trajectory that resulted in the robot just not being happy with me.
some thoughts: I think this is just a little sloppy. If I were to redo it, I would make a good filming box for myself. There’s too much background clutter that was captured due to the angles the robot arm was filming from. I think idea-wise, this project still explores the concepts of surveillance and how we occupy space/exist differently with weird pieces of tech added to our environments. I will be continuing to explore these ideas, but I think as an entirely different project. This robot arm frustrates me.
setup for filming:
debugging process for original idea and modified idea:
TLDR: Using the UR5 robot arm, capture photos to make an animated, interactive 3D splat of a mound of kinetic sand/Play-Doh that is manipulated by a person or group of people.
The above video has the qualities of interactivity and animation that I’d like to achieve with this project.
Current workflow draft:
Connect camera to robot arm and laptop.
Write a Python script that moves the robot arm along a set path (recording the coordinates of the camera/end effector) and loops at a set time interval. Every execution of the path results in ~200 photos (3 angles, photo taken every 6 degrees; 180 photos per new mound) that will then be turned into a splat (see the sketch after this list).
Create the first test animation/training data by pinching some sand/Play-Doh and collecting images for 5 splats. Write a Python script to automatically train all 5 splats overnight.
Come back the next morning and check for failure. If there's no failure and I have 5 splats, (here's the hard part) align all the splats and create a viewer that loops these "3D frames" and lets the audience control the camera POV. Ways I think I could align each "3D frame" and build a viewer that plays all the frames:
Unity?
Writing code to do this (idk how tho)
Ask for help from someone with more experience
If the above step is successful, ask people to swing by the SoCI to participate in a group stop-motion project. I'll probably put constraints on what people can do to the mound, most likely restricting each change to a single-digit number of pinches, flattenings, etc.
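A minimal sketch of the capture-path math from step 2, assuming the camera positions sit on a sphere around the mound. The radius, elevation angles, and everything else here are placeholder guesses; the real script would still have to send these poses to the robot and trigger the camera.

```python
# Sketch of the capture path from step 2: 3 elevation rings, one shot every
# 6 degrees of azimuth, so 3 * 60 = 180 camera positions per mound.
# The radius and elevations are placeholder guesses, not measured values.
import math

RADIUS = 0.35              # meters from the mound center (assumed)
ELEVATIONS = [20, 40, 60]  # degrees above the table (assumed)
AZIMUTH_STEP = 6           # degrees between shots

def camera_positions():
    """Yield (x, y, z) camera positions on a sphere around the mound."""
    for elev in ELEVATIONS:
        for az in range(0, 360, AZIMUTH_STEP):
            e, a = math.radians(elev), math.radians(az)
            yield (RADIUS * math.cos(e) * math.cos(a),
                   RADIUS * math.cos(e) * math.sin(a),
                   RADIUS * math.sin(e))

waypoints = list(camera_positions())
print(len(waypoints))  # 180, matching the per-mound photo count above
```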
I am very very very open to ways to simplify this workflow, basically distilling this idea even further while preserving the core idea of time flowing in 3D work (aka a 4D result).
I'm slightly less open to changes in concept. I'm kinda set on attempting to figure out how to use the robot and piece together splats to make a stop-motion animation, so the process and result are kinda set. I'm a little unsure if this concept of "people manipulating a mound" even fits the "person in time" theme, but I'm open to ideas/thoughts/opinions/concepts that aren't too difficult.
edit: should I capture people’s nails? like shiny sparkly nail art?
I’m still in my 3DGS era, so I’m looking a lot at 3D methods of capturing people, things, etc.
I'm not gonna lie, I didn't understand anything more than the images on this ppt, BUT I think the premise of being able to simultaneously record movement + texture + mesh is so cool, especially since capturing a texture with a mesh is time consuming in and of itself (which makes recording in 3D the environment around a singular object and all of its faces kinda computationally expensive)¹. I'm reminded of the library from Ready Player One, the Halliday Journals, where players can view a moment from Halliday's life from multiple different angles, zooming in and out, etc.
This project seems to be using MoCap + 3D reconstruction + stop motion-esque principles to create a VR/AR experience for demos. (not really what I’m looking for but the pipeline is interesting)
3D Temporal Scanners:
This project on motion analysis uses photogrammetry with markers and something they're calling a 3D temporal scanner to analyze the gaits of adults. It reminds me of the chronophotographs of a walking man by Marey (c. 1883), mainly because both projects focus on human movement and how data can be extrapolated from non-linear motion.
A 3DTS is lowkey what I'm interested in making (but maybe for 3DGS?). I didn't really get satisfactory results from a Google search for 3D temporal scanners (wtf is wrong with Google's search engine nowadays, so many ads), so I turned to my best friend ChatGPT:
A 3D temporal scanner is a type of device that captures and analyzes changes over time in three-dimensional space. It can track the motion or deformation of objects and environments across multiple time points, essentially combining spatial and temporal data into a single model or dataset. These scanners are typically used in various fields like medicine, animation, architecture, and scientific research. – ChatGPT 4o
ChatGPT says that some 3DTS systems can do motion capture + reconstruction + texture capture over a period of time and space (depending on the system).
a very expensive 3D temporal scanner/capture system –> OptiTrack
Aydın Büyüktaş's work is kinda also along the lines of changing the way we view the world, in the same way 3D reconstruction can change how we experience things. It makes me kinda want to give the drone a go, especially since there have been quite a few projects in the PostShot Discord that talk about using drone footage for 3DGS.
Note¹: I’m actually not quite sure if this is true, but I’m drawing from my experience of doing the captures for 3DGS cause I must’ve spent like 1hr on capturing all angles of some of my jars 😭.
I will admit that I didn’t have any research question. I only wanted to play around with some cool tools. I wanted to use the robot and take pictures with it. I quickly simplified this idea, due to time constraints and temporarily ‘missing’ parts, and took up Golan’s recommendation to work with the 3D Gaussian Splatting program Leo set up on the Studio’s computer in combo with the robot arm.
This solves the "how" part of my nonexistent research question. Now, all I needed was a "what." Perhaps it was just due to the sheer number of them (and how they managed to invade the sanctuary called my humble abode), but I had/have a small fixation on lantern flies. Thus, the invasive lantern fly became subject 00.
So I tested 3DGS out on the tiny little invader (a perfectly intact corpse found just outside the Studio's door), using very simple capture methods, aka my phone. I took about 30-50 images of the bug and then threw them into the EasyGaussian Python script.
Hmmm. Results are… questionable.
Same for this one here…
This warranted some changes in capturing technique. First, do research on how others are capturing images for 3DGS. See this website and this Discord post, see that they're both using turntables, and immediately think that you need to make a turntable. Ask Harrison if the Studio has a Lazy Susan/turntable, realize that we can't find it, and let Harrison make a Lazy Susan out of a piece of wood and cardboard (thank you Harrison!). Tape a page of calibration targets onto said Lazy Susan, stab the lantern fly corpse, and start taking photos.
Still not great. Realize that your phone and the macro lens you borrowed aren't cutting it anymore, borrow a Canon EOS R6 from IDeATe, and take (bad) photos with low apertures and high ISO. Do not realize that these settings aren't ideal, and proceed to take many photos of your lantern fly corpse.
Doom scroll on IG and find someone doing Gaussian splats. Feel the inclination to try out what they’re doing and use PostShot to train a 3DGS.
Compare the difference between PostShot and the original program. These renderings felt like the limits of what 3DGS could do with simple lantern flies. Therefore, we change the subject to something of greater difficulty: reflective things.
Ask to borrow dead things in jars from Professor Rich Pell, run to the Center for PostNatural History in the middle of a school day, run back to school, and start taking photos of said dead things in jars. Marvel at the dead thing in the jar.
Figure out that taking hundreds of photos takes too long, and start taking videos instead. Take the videos, run them through ffmpeg to splice out the frames, and run the 3DGS on those frames.
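The video-to-frames step is basically a single ffmpeg call; something like the sketch below (the frame rate and paths are placeholders, not my actual files):

```python
# Sketch: pull frames out of a capture video with ffmpeg before training.
# The paths and the 2-frames-per-second sampling rate are placeholders.
import os
import subprocess

os.makedirs("captures/axolotl_frames", exist_ok=True)
subprocess.run([
    "ffmpeg", "-i", "captures/axolotl.mp4",    # hypothetical input video
    "-vf", "fps=2",                            # sample 2 frames per second
    "captures/axolotl_frames/frame_%04d.png",  # numbered output frames
], check=True)
```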
I think the above three videos are the most successful examples of 3DGS in this project. They achieve the clarity I was hoping for and allow you to view the object from multiple different angles.
The following videos are some recordings of interesting results and process.
Reflection:
I think this method of capturing objects really only lends itself to being presented in real time via a virtual reality experience or video run-throughs of looking at the object in virtual 3D space. I will say that it gives you the affordance of looking at a reflective object from multiple different POVs. During the process of capturing, I really enjoyed being able to view the snake and axolotl from different perspectives. In a museum setting, you're only really able to view them from a couple of perspectives, especially since these specimens are behind a glass door (due to their fragility). It would be kinda cool to be able to see various specimens up close and from various angles.
I had a couple of learning curves with the camera, the software, and the preprocessing of input data. I made some mistakes with the aperture and ISO settings, leading to noisy data. I also could've sped up my workflow by going the video-to-frames route sooner.
I would like to pursue my initial ambition of using the robot arm. First, it would probably regularize the spacing of frames in the video. Second, I'm guessing that the steadiness of preprogrammed motion will help decrease motion blur, something I ended up capturing but was too lazy to remove from my input set. Third, lighting is still a bit of a challenge; I think 3DGS requires that the lighting relative to the object stay constant the entire time. Overall, I think this workflow needs to be applied to a large dataset to create some sort of VR museum of dead things in jars.
To Do List:
Get better recordings/documentation of the splats -> I need to learn how to use the PostShot software for fly throughs.
House these splats in some VR space for people to look at from multiple angles at their own leisure, and clean them up before uploading them to a website. The goal is to make a museum/library of dead things (in jars) -> see this.
Revisit failure case: I think it would be cool to paint these splats onto a canvas, sort of using the splat as a painting reference.
Automate some of the workflow further: parse through the video frames and remove unclear images (see the sketch after this list), work with the robot.
More dead things in jars! Pickles! Mice! More Snakes!
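For the frame-filtering item above, a simple first pass might score each frame's sharpness with the variance of the Laplacian and drop anything below a cutoff; the threshold and paths below are guesses to tune, not tested values.

```python
# Sketch for the "remove unclear images" to-do: score each frame's sharpness
# with the variance of the Laplacian and keep only the crisp ones.
# The threshold is an assumed cutoff and would need tuning per camera/lighting.
import cv2
from pathlib import Path

THRESHOLD = 100.0  # lower Laplacian variance = blurrier frame (assumed cutoff)

def is_sharp(path):
    gray = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    if gray is None:
        return False
    return cv2.Laplacian(gray, cv2.CV_64F).var() >= THRESHOLD

frames = sorted(Path("captures/axolotl_frames").glob("*.png"))  # hypothetical
keep = [f for f in frames if is_sharp(f)]
print(f"keeping {len(keep)} of {len(frames)} frames")
```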
One of my biggest current motivations is exploring how industrial machines can be used in an artistic setting, and using a robot arm would achieve just that. For this project, I was thinking of recording a series of images (panoramas or long-exposure photos) of the shadows in the studio, to examine how light works in such a space. I'm still working on getting the code to work with the simulation software I downloaded, as the robot arm hasn't been installed quite yet.
For this exploration, I mainly just opened the app and used it. Not much thought went into what I was capturing but there were a few things I was curious about.
For Genius Scan, I was mainly curious about how the scans compare to the document scanner native to the iPhone (in the Notes app). For a subject like the landscape around me, the results appear fairly similar.
For Slitscan, I wanted to see how motion would affect the way the slits build up the final image. So, on my walk to campus today, I took a couple of pictures of the view outside the bus, the view of Tepper with trees and grass, and something that I'm not sure of. The first two were taken with the phone's orientation held still while the phone moved linearly. The last one was taken while changing the phone's orientation as it also moved.
For Polycam/Scaniverse, I wanted to see how well it would capture an entire space. It kinda warps reality in a really funky way, and I wonder how difficult a challenge it is to capture 3D objects, because the results were quite interesting.
I think, other than introducing a new way of looking at the world, one of the most interesting affordances of scientific approaches to imaging is the idea of collection, and of collections of images as evidence for scientific hypotheses. Ignoring the fact that the task of collection is an upcoming deliverable, scientific approaches (I feel) heavily influenced artists to image larger and larger volumes of specimens, exploiting patterns found in real life in their art. Take, for example, Francis Galton's Specimens of Composite Portraiture. The use of alternative imaging resulted in this incredibly not-fact-based-by-today's-standards-but-at-the-time-very-good collection of portraits. In some ways, it exposed the flaws of our initial understanding of scientific reasoning. In the present context, I think having the opportunity to do large volumes of imaging is incredibly relevant (considering how other forms of data are being used). Who knows what patterns you can extract from a collection of images – how do the details of one picture fade in the context of a million other images?