I got face tracking working with Zoom!
It requires at least two computers: one with a camera that streams video through Zoom to the second, which processes the video, runs the face tracking software, and then sends the result back to the first.
Here’s what happens on the computer running the capture:
- The section of the screen showing the speaker’s video from Zoom is cropped using OBS.
- OBS creates a virtual camera from this section of the screen.
- openFrameworks uses the virtual camera stream from OBS as its input and runs a face tracking library to detect faces and draw over them.
- A second OBS instance captures the openFrameworks window and creates a second virtual camera, which is used as the input for Zoom.
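One practical detail in the steps above is that the openFrameworks program has to open the OBS virtual camera rather than a physical webcam. Here is a minimal sketch of that selection step, written as plain C++ so it runs on its own. The helper `findDeviceIndex` and the device names in the usage note are hypothetical; in an openFrameworks app the names would presumably come from `ofVideoGrabber::listDevices()`, and the actual name of the virtual camera depends on the OBS plugin and platform.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lower-case copy so the match ignores capitalization differences.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns the index of the first device whose name contains `needle`,
// or -1 if none matches. Hypothetical helper, not part of openFrameworks.
int findDeviceIndex(const std::vector<std::string>& deviceNames,
                    const std::string& needle) {
    const std::string target = toLower(needle);
    for (std::size_t i = 0; i < deviceNames.size(); ++i) {
        if (toLower(deviceNames[i]).find(target) != std::string::npos) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

For example, given the made-up device list `{"Built-in Webcam", "OBS Virtual Camera"}`, calling `findDeviceIndex(names, "obs")` picks the second entry, and the resulting index could then be passed to the grabber before it is set up.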
This is what results for the person on the laptop:
There’s definitely some latency in the system, but most of it appears to come from Zoom and is unavoidable. The virtual cameras and the openFrameworks program add almost no lag.
Multiple inputs are a bit trickier. I tried running the face detection program on Zoom’s grid view of participants, but it struggled to find more than one face at a time. The issue didn’t appear to be related to the size of the videos, since enlarging the capture area had no effect. I think it has something to do with the multiple “windows” separated by black bars; the classifier likely wasn’t trained on video input in this format.
The workaround I found was to create multiple OBS instances and virtual cameras, so that each one records just the section of the screen with one participant’s video. Then in openFrameworks I run the face tracker on each of these video streams individually, creating multiple outputs. The limitation of this method is the number of virtual cameras that can be created: the OBS plugin currently supports only four, which means the game will be four players max.
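To make the per-participant capture concrete, here is a rough sketch of how the crop rectangle for each OBS instance could be worked out from the grid view. It assumes the grid is made of equal-sized tiles with no gaps, which Zoom's real layout may not match exactly, so treat the numbers as a starting point to adjust by hand. `CropRect`, `participantCrops`, and `kMaxVirtualCams` are names made up for this example; the cap of four comes from the plugin limit mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Pixel rectangle for one participant's tile.
struct CropRect {
    int x, y, width, height;
};

// Cap taken from the post: the OBS virtual camera plugin supports four cameras.
constexpr int kMaxVirtualCams = 4;

// Splits a capture region into a cols x rows grid and returns one rectangle
// per participant, row by row, capped at kMaxVirtualCams.
// Assumes equal-sized tiles with no gaps between them.
std::vector<CropRect> participantCrops(int regionX, int regionY,
                                       int regionW, int regionH,
                                       int cols, int rows, int participants) {
    assert(cols > 0 && rows > 0);
    const int tileW = regionW / cols;
    const int tileH = regionH / rows;
    const int count =
        std::max(0, std::min({participants, cols * rows, kMaxVirtualCams}));
    std::vector<CropRect> crops;
    crops.reserve(count);
    for (int i = 0; i < count; ++i) {
        crops.push_back({regionX + (i % cols) * tileW,   // column offset
                         regionY + (i / cols) * tileH,   // row offset
                         tileW, tileH});
    }
    return crops;
}
```

With a 1280x720 capture region split 2x2, `participantCrops(0, 0, 1280, 720, 2, 2, 6)` returns four 640x360 rectangles even though six participants were requested, since only four virtual cameras are available. Each rectangle would be entered as the crop for one OBS instance.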