MediaPipe Machine Vision

New for 2025.

MediaPipe is an open-source package from Google that provides high-level machine-vision solutions built on machine learning. The library works in conjunction with OpenCV. The models were developed with phones in mind, so they offer good real-time performance.

These preliminary notes are more of a placeholder than a full tutorial.

The sample code included here may be downloaded as mediapipe_examples.zip or browsed in Python/mediapipe_examples.

Installation Notes

We will only use the Python API for MediaPipe. Current setup instructions can be found at https://ai.google.dev/edge/mediapipe/solutions/setup_python

The simplest way to install it is using pip:

pip install mediapipe

On most platforms this will install efficient pre-compiled binaries along with all required package dependencies. If this does not work, the package may not be available for your particular Python version.
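To confirm that the install worked, a quick check from the command line should import the package and print its version:

python -c "import mediapipe as mp; print(mp.__version__)"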

Note: As of September 2025, the newest supported Python version is 3.12, not the current 3.13, as shown on the PyPI page: https://pypi.org/project/mediapipe/

The specifics of this process may vary with your installation. Note that the recommended practice is to set up a virtual environment so the package and its dependencies can be kept locally, but that is outside the scope of these instructions.
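For reference, a minimal setup on macOS or Linux follows the standard pattern below; mp-env is an arbitrary directory name, and on Windows the activation script is mp-env\Scripts\activate instead.

python3 -m venv mp-env
source mp-env/bin/activate
pip install mediapipe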

MediaPipe Documentation

The top-level link for all the documentation is currently https://ai.google.dev/edge/mediapipe/solutions/guide

Some specific entry points for our purposes can be found under Vision tasks: object detection, gesture recognition, hand landmark detection, face detection, pose landmark detection, and more.
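All of these tasks share the same calling pattern in the Python Tasks API: wrap a downloaded model file in BaseOptions, build the task-specific options, and create the task object from them. A minimal sketch for hand landmark detection, assuming the hand_landmarker.task model file has already been downloaded from the MediaPipe model pages:

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Build a hand landmark detector from a locally downloaded model file.
base_options = python.BaseOptions(model_asset_path='hand_landmarker.task')
options = vision.HandLandmarkerOptions(base_options=base_options)
landmarker = vision.HandLandmarker.create_from_options(options)

# Run it on a still image (the file name here is illustrative).
result = landmarker.detect(mp.Image.create_from_file('hand.png'))

The face detection example below follows exactly this pattern.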

The code repository is at https://github.com/google-ai-edge/mediapipe

Example: Face Detection

The following sample is adapted from the documentation at https://ai.google.dev/edge/mediapipe/solutions/vision/face_detector/python

It reads an image file, detects any faces, marks key points, and saves an annotated image.

# mediapipe_examples/face_detection.py
# adapted from https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/face_detector/python/face_detector.ipynb
# model file: https://storage.googleapis.com/mediapipe-models/face_detector/blaze_face_short_range/float16/1/blaze_face_short_range.tflite

import math
import argparse

import numpy as np
import cv2 as cv

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Demo MediaPipe face detection on a static image.")
    parser.add_argument('--verbose', action='store_true', help="enable detailed console output")
    parser.add_argument('-i', '--input', default="face.png", type=str, help="input image file name (default: %(default)s)")
    parser.add_argument('out', default="detected.png", type=str, nargs='?', help="output image file name (default: %(default)s)")
    args = parser.parse_args()

    # Create a FaceDetector object.
    base_options = python.BaseOptions(model_asset_path='blaze_face_short_range.tflite')
    options = vision.FaceDetectorOptions(base_options=base_options)
    detector = vision.FaceDetector.create_from_options(options)

    # Load the input image.
    image = mp.Image.create_from_file(args.input)

    # Detect faces in the input image.
    detection_result = detector.detect(image)

    if args.verbose:
        print(f"detection_result: {detection_result}")

    # Process the detection result into an annotated image.
    # MediaPipe images are RGB; convert to BGR for OpenCV drawing and saving.
    annotated = np.copy(image.numpy_view())
    annotated = cv.cvtColor(annotated, cv.COLOR_RGB2BGR)
    height, width, channels = annotated.shape

    for detection in detection_result.detections:
        for pt in detection.keypoints:
            if args.verbose:
                print("Processing keypoint:", pt)
            # Keypoint coordinates are normalized; scale to pixel units.
            keypoint_px = math.floor(pt.x * width), math.floor(pt.y * height)
            color, thickness, radius = (0, 255, 0), 2, 2
            # cv.circle expects (image, center, radius, color, thickness).
            cv.circle(annotated, keypoint_px, radius, color, thickness)

    cv.imwrite(args.out, annotated)
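To try the example, download the model file listed in the header comment into the working directory, then run the script from the command line; the file names shown here are the script's defaults:

python face_detection.py --verbose -i face.png detected.png

Each detection also carries a pixel-space bounding box alongside its keypoints. As a minimal sketch, the loop above could be extended before the imwrite call to outline each face, using the bounding_box fields from the detection result:

# Sketch: outline each detected face using its bounding box.
for detection in detection_result.detections:
    bbox = detection.bounding_box
    top_left = bbox.origin_x, bbox.origin_y
    bottom_right = bbox.origin_x + bbox.width, bbox.origin_y + bbox.height
    cv.rectangle(annotated, top_left, bottom_right, (255, 0, 0), 2)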