MediaPipe Machine Vision¶
New for 2025.
MediaPipe is an open-source package from Google providing high-level solutions for machine vision using machine learning. The library works in conjunction with OpenCV. The models were developed with mobile phones in mind, so they offer good real-time performance.
These preliminary notes are more of a placeholder than a full tutorial.
The sample code included here may be downloaded as mediapipe_examples.zip or browsed in Python/mediapipe_examples.
Installation Notes¶
We will only use the Python API for MediaPipe. Current setup instructions can be found at https://ai.google.dev/edge/mediapipe/solutions/setup_python
The simplest way to install it is using pip:
pip install mediapipe
On most platforms this will install efficient pre-compiled binaries along with all required package dependencies. If this does not work, the package may not be available for your particular Python version.
Note: As of September 2025, the newest supported Python version is 3.12, not the current 3.13, as shown on the PyPI page: https://pypi.org/project/mediapipe/
The specifics of this process may vary with your installation. Note that the recommended practice is to set up a virtual environment so the package and its dependencies can be kept locally, but that is outside the scope of these instructions.
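A minimal sanity check after installation is to import the package and print its version string (the exact version reported will depend on your install):

import mediapipe as mp
print(mp.__version__)

If the import succeeds and a version number prints, the binaries and dependencies are in place.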
MediaPipe Documentation¶
The top-level link for all the documentation is currently https://ai.google.dev/edge/mediapipe/solutions/guide
Some specific entry points for our purposes can be found under Vision tasks: object detection, gesture recognition, hand landmark detection, face detection, pose landmark detection, and more.
The code repository is at https://github.com/google-ai-edge/mediapipe
Example: Face Detection¶
The following sample is adapted from the documentation at https://ai.google.dev/edge/mediapipe/solutions/vision/face_detector/python
It reads an image file, detects any faces, marks key points, and saves an annotated image.
# mediapipe_examples/face_detection.py
# adapted from https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/face_detector/python/face_detector.ipynb
# model file: https://storage.googleapis.com/mediapipe-models/face_detector/blaze_face_short_range/float16/1/blaze_face_short_range.tflite

import math
import argparse

import numpy as np
import cv2 as cv

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Demo MediaPipe face detection on a static image.")
    parser.add_argument('--verbose', action='store_true', help="enable even more detailed console output")
    parser.add_argument('-i', '--input', default="face.png", type=str, help="input image file name (default: %(default)s)")
    parser.add_argument('out', default="detected.png", type=str, nargs='?', help="output image file name (default: %(default)s)")
    args = parser.parse_args()

    # Create a FaceDetector object.
    base_options = python.BaseOptions(model_asset_path='blaze_face_short_range.tflite')
    options = vision.FaceDetectorOptions(base_options=base_options)
    detector = vision.FaceDetector.create_from_options(options)

    # Load the input image.
    image = mp.Image.create_from_file(args.input)

    # Detect faces in the input image.
    detection_result = detector.detect(image)

    if args.verbose:
        print(f"detection_result: {detection_result}")

    # Process the detection result into an annotated image.  MediaPipe
    # returns RGB pixels; convert to BGR for OpenCV's imwrite.
    annotated = np.copy(image.numpy_view())
    annotated = cv.cvtColor(annotated, cv.COLOR_RGB2BGR)
    height, width, channels = annotated.shape

    for detection in detection_result.detections:
        for pt in detection.keypoints:
            if args.verbose:
                print("Processing keypoint:", pt)
            # Keypoints use normalized [0, 1] coordinates; scale to pixels.
            keypoint_px = math.floor(pt.x * width), math.floor(pt.y * height)
            color, thickness, radius = (0, 255, 0), 2, 2
            # cv.circle takes (image, center, radius, color, thickness).
            cv.circle(annotated, keypoint_px, radius, color, thickness)

    cv.imwrite(args.out, annotated)
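To try the example, first download the model file named in the header comment (blaze_face_short_range.tflite) into the working directory, then run the script; the input and output file names are optional and default to face.png and detected.png:

python face_detection.py -i face.png detected.png

The other vision tasks listed earlier follow the same Tasks API pattern: point a BaseOptions at a task-specific model file, wrap it in the task's options class, create the task object, and call detect(). As a rough sketch only (the hand_landmarker.task model file name and the hand.png input are assumptions; see the hand landmark documentation for the current model download), hand landmark detection looks like this:

# Sketch: hand landmark detection using the same Tasks API pattern.
# 'hand_landmarker.task' and 'hand.png' are placeholder names; download
# the current hand landmarker model from the MediaPipe documentation.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

base_options = python.BaseOptions(model_asset_path='hand_landmarker.task')
options = vision.HandLandmarkerOptions(base_options=base_options, num_hands=2)
landmarker = vision.HandLandmarker.create_from_options(options)

image = mp.Image.create_from_file("hand.png")
result = landmarker.detect(image)

# result.hand_landmarks holds one list of landmarks per detected hand;
# each landmark has normalized x, y, z fields, like the face keypoints.
for hand in result.hand_landmarks:
    for lm in hand:
        print(lm.x, lm.y, lm.z)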