Exercise: Machine Vision¶
New for 2025.
It has become clear that most of our ideas for interactive projects involve significant sensing of human activity. One of the richest but trickiest sensors we have is machine vision. This exercise will provide some individual experience working with a machine vision system to solve project tasks.
We currently have two major approaches available:
high-level interpretation using commercial machine learning models: slow, abstract, programmed using text prompting
low-level analysis using OpenCV: fast, numeric, programmed in Python using OpenCV matrix operations
In light of the widely varied programming experience among the students, either approach is acceptable for this exercise, although I’d like to see some samples of each.
Clarification: you are welcome to seek out other libraries and sample code to implement the machine vision solution as long as you document your use. Please identify your sources and document the changes you made to adapt them for your problem.
Learning Objectives¶
After this exercise, you should be able to:
Identify a vision-based sensing application relevant to interactive robotics
Configure a measurement environment conducive to repeatable analysis
Collect reference images or video simulating the interaction conditions
Iteratively develop a machine vision analysis program producing meaningful assessments
Process¶
Success with computer vision has much in common with success in photography. The field of view, lighting, and background must be chosen deliberately to produce consistent images which capture the human behaviors of interest.
Please choose and articulate an interaction scenario, either based on your project proof-of-concept or any other interaction situation we’ve discussed.
Please consider the physical layout of the scenario and arrange lighting and camera viewpoint with candidate interaction behaviors in mind.
Please capture a selection of still images and video representing all possible interaction states, including null and empty scenes. You may wish to recruit a subject.
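If it helps, here is a minimal sketch of a still-capture utility using OpenCV; it assumes the default camera at index 0, and the window name and file naming scheme are only illustrative:

```python
# Sketch: save reference stills from a live camera preview.
# Assumes the default camera at index 0; file names are illustrative.
import cv2

camera = cv2.VideoCapture(0)
count = 0
while True:
    ok, frame = camera.read()
    if not ok:
        break
    cv2.imshow("preview", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord(' '):       # space bar saves the current frame
        cv2.imwrite(f"reference_{count:03d}.png", frame)
        count += 1
    elif key == ord('q'):     # 'q' quits
        break
camera.release()
cv2.destroyAllWindows()
```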
Please choose between the two toolkits and develop a script which, for an input image, generates metric output that could provide useful feedback for the interaction system.
Some possible outputs include:
binary predicate (yes/no) on a particular human presence or pose
discrete categorization of scene state
numeric measure of human or object location
reference set of key scene colors
Please note that general text description (e.g. from Ollama) is unlikely to constitute useful feedback.
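For example, one of the simplest useful outputs is a binary presence predicate computed by differencing a scene against an empty reference image. A minimal sketch follows; the thresholds and file names are placeholders to tune against your own reference set, and it assumes both images share the same dimensions:

```python
# Sketch: binary presence predicate by differencing against an empty
# reference scene. Thresholds and file names are placeholders to tune.
import cv2
import numpy as np

def scene_occupied(image_path, empty_path="empty_scene.png",
                   pixel_threshold=40, area_fraction=0.02):
    scene = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    empty = cv2.imread(empty_path, cv2.IMREAD_GRAYSCALE)
    diff = cv2.absdiff(scene, empty)                    # per-pixel change
    changed = np.count_nonzero(diff > pixel_threshold)  # count changed pixels
    return changed > area_fraction * diff.size          # enough of the frame changed?

print(scene_occupied("reference_000.png"))  # prints True or False
```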
Please carefully test your script on the reference set and document the overall error rates.
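One possible shape for this test is a small harness which loops over hand-labeled reference images and reports both the error rate and per-image timing. In this sketch the labels, file names, and the detect module (holding the scene_occupied() predicate from the previous sketch) are all illustrative:

```python
# Sketch: measure error rate and per-image timing over a labeled
# reference set. Labels, file names, and module name are illustrative.
import time
from detect import scene_occupied   # the predicate sketched above, saved as detect.py

labels = {"reference_000.png": True,    # hand-labeled ground truth
          "reference_001.png": False}

errors, elapsed = 0, 0.0
for path, expected in labels.items():
    start = time.perf_counter()
    result = scene_occupied(path)
    elapsed += time.perf_counter() - start
    if result != expected:
        errors += 1

print(f"error rate: {errors / len(labels):.2%}")
print(f"mean time per image: {1000 * elapsed / len(labels):.1f} ms")
```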
Ollama¶
Please see the Ollama Generative LLM page for an introduction to using a local image analysis machine learning model.
Please note that any system we deploy will need to use models running locally to preserve privacy. However, you are free to use online systems during the exploration phase if you wish, but you will then need to port your prompts to the local system and test them for real-world performance.
The final result should be a Python script which can run a local model for a sample image. This can later be connected to live camera data.
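As a starting point, here is a minimal sketch using the ollama Python package; it assumes the Ollama server is running locally, that a vision-capable model such as llava has already been pulled, and that the prompt and file name are only examples:

```python
# Sketch: ask a local multimodal model a yes/no question about an image.
# Assumes `ollama serve` is running and `ollama pull llava` has been done.
import ollama

response = ollama.generate(
    model="llava",
    prompt="Is there a person in this image? Answer only YES or NO.",
    images=["reference_000.png"],
)
answer = response["response"].strip().upper()
print("person present:", answer.startswith("YES"))
```

Note how the prompt constrains the model to a machine-readable answer; free-form descriptions are much harder to turn into the metric outputs listed above.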
OpenCV¶
Please see the OpenCV Machine Vision page for an introduction to using OpenCV for real-time low-level image processing.
There is a wealth of online tutorials and resources for OpenCV, and more may be published to the course site.
The final result should be a Python script which can process a sample image. This can later be connected to live camera data.
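As a starting point, here is a minimal sketch of a low-level pipeline which reports an object location by color segmentation; the HSV bounds and file names are placeholders to adapt to your own scene and target:

```python
# Sketch: locate a colored target and report its centroid in pixels.
# The HSV bounds below are placeholders for your target color.
import cv2
import numpy as np

image = cv2.imread("reference_000.png")
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Keep only pixels within an example hue range, then clean up the mask.
mask = cv2.inRange(hsv, np.array([100, 80, 80]), np.array([130, 255, 255]))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] > 0:
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
        print(f"target centroid: ({cx:.0f}, {cy:.0f})")   # the metric output
        cv2.circle(image, (int(cx), int(cy)), 10, (0, 0, 255), 2)
cv2.imwrite("processed_output.png", image)   # annotated image for the report
```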
Deliverables¶
Please post a brief report to the appropriate shared folder. Please include the following:
brief synopsis of the interaction scenario
selection of reference images and/or video
the final Python script
table of computed outputs, including error estimates and timing results. Please note that by computed outputs I mean the specific Python data which would be used to drive the system behavior.
processed output image and/or video, as appropriate
Note: please be mindful of the Generative Artificial Intelligence policy; you are welcome to use such tools but you are obligated to detail your use.