miniverse – bad deepfake tennis

inspo:

https://courses.ideate.cmu.edu/60-461/f2022/hunan/11/03/hunan-personintime/

https://courses.ideate.cmu.edu/60-461/f2022/skrrr/11/08/lost-in-time/

what if I offset a segmentation map by a constant in time?

Then find a subject where inserting this method would work best.

I was thinking tennis because the contact point with the ball is important

plan: use YOLOv5 to segment the ball out. Its COCO classes include a “sports ball” category that is basically a high-level circle detector

replace some of the code in YOLO to spit out a segmentation mask I can then use:

then use the mask to remove the ball and inpaint the video using cv2.inpaint
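
Roughly, the pipeline looks like the sketch below: detect the ball in every frame, inpaint it away, then paste the ball back in from 20 frames earlier. This is a minimal sketch, not my exact code; the torch.hub yolov5s model, the padded-circle mask, and the file name are assumptions.

import cv2
import numpy as np
import torch

OFFSET = 20  # how many frames back in time the ball is pulled from
model = torch.hub.load("ultralytics/yolov5", "yolov5s")  # COCO class 32 = "sports ball"

def ball_mask(frame):
    """Binary mask covering the detected sports ball, or None if no ball found."""
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    for *xyxy, conf, cls in model(rgb).xyxy[0].tolist():
        if int(cls) == 32:
            x1, y1, x2, y2 = map(int, xyxy)
            r = max(x2 - x1, y2 - y1) // 2 + 4  # slightly padded circle over the box
            cv2.circle(mask, ((x1 + x2) // 2, (y1 + y2) // 2), r, 255, -1)
    return mask if mask.any() else None

# read the whole clip and detect the ball in every frame
cap = cv2.VideoCapture("tennis.mp4")
frames, masks = [], []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
    masks.append(ball_mask(frame))

# erase the real ball, then draw the ball from OFFSET frames earlier
out = []
for i, frame in enumerate(frames):
    clean = frame
    if masks[i] is not None:
        clean = cv2.inpaint(frame, masks[i], 3, cv2.INPAINT_TELEA)
    j = i - OFFSET
    if j >= 0 and masks[j] is not None:
        region = masks[j].astype(bool)[..., None]
        clean = np.where(region, frames[j], clean)  # paste the past ball over the present frame
    out.append(clean)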

People missing their tennis forehands(?) backhands(?) 

final result! frame offset is 20.

frankenstein’s monster / roach – miniverse

For my project I wanted to do an interactive installation

My initial idea was an app on a large monitor

the app would take real-time footage of passersby, crop out their body parts, and assemble a single Frankenstein body from them. Ex: left leg from person A, right leg from person B

concept: have fun with friends in front of this exhibit

However, finding a real-time, frame-to-frame stable, multi-person limb segmentation model is difficult. There’s no way for me to realistically train one myself; models like that are produced by whole teams at Google.

 So I pivoted to a slightly different concept:

Roachifier!

This is a one person interactive exhibit

Here’s me interacting with this, real time:


It roachifies the participant by segmenting out their arms and legs and then sticking them onto the torso of a roach!

It rotates your limbs so that your thighs and upper arms can’t move; they are permanently affixed at angles becoming of a roach. Kafkaesque?
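
All the real logic is in test.py (linked below); the snippet here is only a minimal sketch of the “pin a limb at a fixed angle” step. In the real version the crop boxes come from the limb segmentation; here they, the angle, and the roach image are hard-coded placeholders.

import cv2
import numpy as np

def pin_limb(canvas, frame, box, angle_deg, dest_xy):
    """Cut a limb crop out of the webcam frame, rotate it to a fixed
    roach-like angle, and paste it onto the canvas at dest_xy."""
    x, y, w, h = box
    limb = frame[y:y + h, x:x + w]
    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle_deg, 1.0)
    limb = cv2.warpAffine(limb, M, (w, h))
    dx, dy = dest_xy
    canvas[dy:dy + h, dx:dx + w] = limb
    return canvas

roach = cv2.imread("roach_torso.png")  # placeholder background, assumed big enough for the pastes
cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    canvas = roach.copy()
    # one hard-coded example limb: an upper arm pinned at 40 degrees
    canvas = pin_limb(canvas, frame, (100, 200, 120, 160), 40.0, (300, 220))
    cv2.imshow("roachified", canvas)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break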

Why a roach?

This is a one person concept and I was running a little low on time

I wanted to maintain an element of play and bugs are funny.

Also, when I was working with the original Frankenstein’s monster concept, the limbs were affixed at a certain angle in a certain location, very much like a pinned bug.


my code: https://github.com/himalinig/frankenstein

all the code is in test.py

I wrote most of it


proposal – person in Time

I want to do a live-capture installation that takes footage of people walking in front of a large screen and:

  1. segments their bodies into individual parts
  2. records the motion of each part over a second at a time
  3. plays that motion back on the screen, offset in time, to give a sort of synchronized-swimming effect (a rough sketch of the timing idea is below)
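
A rough sketch of the time-offset compositing, assuming a webcam and using top/bottom halves of the frame as a crude stand-in for a real per-part body segmenter:

import collections
import cv2

BUFFER = 30                              # ~1 second of frames at 30 fps
PART_OFFSETS = {"upper": 0, "lower": 15} # frames of delay per region

frames = collections.deque(maxlen=BUFFER)
cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
    h = frame.shape[0]
    out = frame.copy()
    # each "part" is drawn from a different point in the ring buffer
    for part, delay in PART_OFFSETS.items():
        src = frames[max(0, len(frames) - 1 - delay)]
        if part == "upper":
            out[: h // 2] = src[: h // 2]
        else:
            out[h // 2 :] = src[h // 2 :]
    cv2.imshow("offset composite", out)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break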

Colorism in Tamil Film

Proposal

I’m Tamil. India has a huge problem with colorism, which disproportionately impacts women. As a result, I’ve seen many ridiculous Tamil movies where the female lead has the same skin tone as a Northern European while the male lead looks like someone who is actually from a region with 90-degree weather.

Machine

I wanted to craft a machine to show this trend clearly. I wanted a typology of actors and actresses and their skin tones. To do that, I first needed to construct a machine to somehow extract skin tone from film.

Method

This was broken down into multiple smaller steps:

  1. track down a bunch of Tamil movies
  2. pirate them
  3. downsample the footage by extracting frames with ffmpeg
    1. for i in *.mp4; do mkdir "${i/%.mp4/}"; ffmpeg -i "$i" -vf "select=not(mod(n\,30))" -vsync vfr -q:v 2 "${i/%.mp4//img_%03d.jpg}"; done
  4. use Python to extract faces from each frame (sketched below)
    1. use the face_recognition module to crop faces out automatically with a CNN and then extract a 128-dimensional embedding vector
    2. use UMAP to group the faces into people based on their embedding vectors
  5. extract skin tone from each face and average across each person grouping
  6. construct a data visualization to go through the people from lightest to darkest average skin tone
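
Step 4 looked roughly like the sketch below; the directory layout and the UMAP parameters are assumptions rather than my exact script.

import glob
import face_recognition
import numpy as np
import umap

embeddings, crops = [], []
for path in glob.glob("movie_frames/*.jpg"):
    img = face_recognition.load_image_file(path)
    boxes = face_recognition.face_locations(img, model="cnn")
    for (top, right, bottom, left), enc in zip(
        boxes, face_recognition.face_encodings(img, boxes)
    ):
        crops.append(img[top:bottom, left:right])  # the face crop itself
        embeddings.append(enc)                     # its 128-d embedding

# project the embeddings to 2d; nearby points should be the same person,
# and the xy coordinates can be used to place each crop on a big canvas
xy = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(np.array(embeddings))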

Result

I failed at steps 5 and 6. It ended up being more complicated than I anticipated to reconstruct the average skin tone of each person and then build a compelling interactive out of it. As a result, I have a sort of intermediate result instead. In the Google Drive below is essentially a collection of per-movie images. Each image is the clustered faces of the people in that film, organized by a reduced version of their embedding vectors. The distance between faces in this representation is not directly related to skin tone; rather, it is whatever the computer decides best represents the faces in 2d. That could partly be skin tone, but it’s less obvious than that.

Reflection

Reflecting on the end result: though the intermediate product lacks the conceptual backing of my original proposal, I think it’s a compelling series of images in its own right. I find it pleasurable to sit and scroll across the images and observe some of the dumb faces the actors and actresses make.


I’ve linked downloadable images here but I’ve also inserted them into the post. For optimal viewing experience I highly recommend downloading the images and opening them up so you can zoom in and scroll around.  https://drive.google.com/drive/folders/16FyGw2GLY4Svj_8U7fXJdTEZF_yQlH86?usp=sharing


Updated Result:

10/3: I’m updating this post as I get better visualizations of skin tone. I’ve included one mega image of all the people across multiple films, organized by their average skin tone. Below that are the intermediate per-movie images. In the mega image I do see that the darker faces at the top are primarily male leads, while at the bottom they are primarily female leads.

10/4: I’ve added “Better Mega Image”, so please ignore “Mega Image”. I’ve also included a video below it showing me scrolling from top to bottom while zoomed into the grid. I think it demonstrates some of the trends in gender and skin tone I wanted to show. I highly recommend watching the scrolling video to understand how I want the viewer to zoom in and traverse the landscape of the grid.


Mega image

Better Mega Image

Scrolling the Better Mega Image

Per movie images


Typology proposal – miniverse

I’m going to graph colorism across the top 100 grossing Tamil films by getting a skin tone match for both the male and female leads.

My method

    1. find the top 100 grossing films
      1. most are on Amazon Prime
    2. run through those films quickly at 3x speed
    3. screenshot the first scene where the male and female leads are in frame together
      1. this will normalize the data across each male/female pair
    4. get a color card manually for each pair
    5. and then present the data and the original screenshots

miniverse – reading01

I think it’s interesting how dubious the scientific community was of photographs early on, especially astronomers, considering there was little venue for the general public to observe what they were seeing without a camera. There’s such a high overlap between the lens tech used for cameras and telescopes that I would think astronomers would be the most enthusiastic about using photographs. In some ways, the modern astronomer is also a photographer.

The practice of applying honey and other sticky substances to old-school plates, each photographer with their own special development recipe, breathes some artistry back into the medium. I’m too used to the click of a phone camera. I liked the photos of Venus and other planetary observations from multiple artists, each using their own development recipe and photographic style. In some ways it’s more an objective documentation of each photographer’s quirks than an accurate record of the planet.

I’m most excited about the use of the camera to scale time: sped-up videos of crowds, slowed-down videos of objects crushed in a hydraulic press. It lets me experience different dimensions of a given object or experience. I also wrestle with its ability to warp distance and space. Whenever I pull out my iPhone and the FOV, focal length, etc. make the camera’s view different from what I see with my eyes, it’s aggravating. But I like that surprise, especially when I use the zoom on a camera and comb over the details in a scene.

miniverse – looking outwards

There are hundreds of thousands of tourist pictures of Notre Dame, each from a slightly different angle and location on the ground. CV researchers at UW had the idea to use these free photos as photogrammetry inputs and estimate a 3d model of Notre Dame. This requires very deep knowledge of CV, because each tourist photo needs to be mapped to a physical location around the building and then used to build up the 3d model. It’s like a much more complicated version of epipolar reconstruction. I think it’s interesting how this could be applied to virtually any location with enough footage or images. I took a class in CV, but I still don’t really understand how enough 2d information plus a lot of math can construct a 3d model. Now that part of Notre Dame has burned down, processes like this allow retroactive 3d models of a location that can no longer be captured.

https://www.washington.edu/news/2007/11/01/vacation-photos-create-3d-models-of-world-landmarks/
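
For a sense of how 2d points from two photos become 3d points, the classic two-view version of the epipolar reconstruction mentioned above fits in a few lines of OpenCV; the UW project essentially scales this idea up to thousands of photos with matching and bundle adjustment. The camera intrinsics and image names here are placeholders.

import cv2
import numpy as np

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])  # assumed intrinsics

# match SIFT features between the two photos
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# essential matrix -> relative camera pose -> triangulated 3d points
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
pts3d = (pts4d[:3] / pts4d[3]).T  # sparse 3d point cloud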