MediaPipe Machine Vision¶
New for 2025.
MediaPipe is an open-source package from Google providing high-level solutions for machine vision using machine learning. The library works in conjunction with OpenCV. The models were developed with mobile phones in mind, so they offer good real-time performance.
These preliminary notes are more of a placeholder than a full tutorial.
The sample code included here may be downloaded as mediapipe_examples.zip or browsed in Python/mediapipe_examples.
Installation Notes¶
We will only use the Python API for MediaPipe. Current setup instructions can be found at https://ai.google.dev/edge/mediapipe/solutions/setup_python
The simplest way to install it is using pip:
pip install mediapipe
On most platforms this will install efficient pre-compiled binaries along with all required package dependencies. If this does not work, the package may not be available for your particular Python version.
Note: As of September 2025, the newest supported Python version is 3.12, not the current 3.13, as shown on the PyPI page: https://pypi.org/project/mediapipe/
The specifics of this process may vary with your installation. Note that the recommended practice is to set up a virtual environment so the package and its dependencies can be kept locally, but that is outside the scope of these instructions.
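A minimal sanity check after installation is to import the package and print its version string (the exact version reported will depend on your install):

import mediapipe as mp
print(mp.__version__)

If the import succeeds and a version number prints, the binaries and dependencies are in place.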
MediaPipe Documentation¶
The top-level link for all the documentation is currently https://ai.google.dev/edge/mediapipe/solutions/guide
Some specific entry points for our purposes can be found under Vision tasks: object detection, gesture recognition, hand landmark detection, face detection, pose landmark detection, and more.
The code repository is at https://github.com/google-ai-edge/mediapipe
Example: Face Detection¶
The following sample is adapted from the documentation at https://ai.google.dev/edge/mediapipe/solutions/vision/face_detector/python
It reads an image file, detects any faces, marks key points, and saves an annotated image.
# mediapipe_examples/face_detection.py
# adapted from https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/face_detector/python/face_detector.ipynb
# model file: https://storage.googleapis.com/mediapipe-models/face_detector/blaze_face_short_range/float16/1/blaze_face_short_range.tflite

import math
import argparse

import numpy as np
import cv2 as cv

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Demo MediaPipe face detection on a static image.")
    parser.add_argument('--verbose', action='store_true', help="enable even more detailed console output")
    parser.add_argument('-i', '--input', default="face.png", type=str, help="input image file name (default: %(default)s)")
    parser.add_argument('out', default="detected.png", type=str, nargs='?', help="output image file name (default: %(default)s)")
    args = parser.parse_args()

    # Create a FaceDetector object.
    base_options = python.BaseOptions(model_asset_path='blaze_face_short_range.tflite')
    options = vision.FaceDetectorOptions(base_options=base_options)
    detector = vision.FaceDetector.create_from_options(options)

    # Load the input image.
    image = mp.Image.create_from_file(args.input)

    # Detect faces in the input image.
    detection_result = detector.detect(image)

    if args.verbose:
        print(f"detection_result: {detection_result}")

    # Process the detection result into an annotated image.  MediaPipe
    # returns RGB pixels; convert to BGR for OpenCV's imwrite.
    annotated = np.copy(image.numpy_view())
    annotated = cv.cvtColor(annotated, cv.COLOR_RGB2BGR)
    height, width, channels = annotated.shape

    for detection in detection_result.detections:
        for pt in detection.keypoints:
            if args.verbose:
                print("Processing keypoint:", pt)
            # Keypoints use normalized [0, 1] coordinates; scale to pixels.
            keypoint_px = math.floor(pt.x * width), math.floor(pt.y * height)
            color, thickness, radius = (0, 255, 0), 2, 2
            # cv.circle takes (image, center, radius, color, thickness).
            cv.circle(annotated, keypoint_px, radius, color, thickness)

    cv.imwrite(args.out, annotated)
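To try the example, first download the model file named in the header comment (blaze_face_short_range.tflite) into the working directory, then run the script; the input and output file names are optional and default to face.png and detected.png:

python face_detection.py -i face.png detected.png

The other vision tasks listed earlier follow the same Tasks API pattern: point a BaseOptions at a task-specific model file, wrap it in the task's options class, create the task object, and call detect(). As a rough sketch only (the hand_landmarker.task model file name and the hand.png input are assumptions; see the hand landmark documentation for the current model download), hand landmark detection looks like this:

# Sketch: hand landmark detection using the same Tasks API pattern.
# 'hand_landmarker.task' and 'hand.png' are placeholder names; download
# the current hand landmarker model from the MediaPipe documentation.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

base_options = python.BaseOptions(model_asset_path='hand_landmarker.task')
options = vision.HandLandmarkerOptions(base_options=base_options, num_hands=2)
landmarker = vision.HandLandmarker.create_from_options(options)

image = mp.Image.create_from_file("hand.png")
result = landmarker.detect(image)

# result.hand_landmarks holds one list of landmarks per detected hand;
# each landmark has normalized x, y, z fields, like the face keypoints.
for hand in result.hand_landmarks:
    for lm in hand:
        print(lm.x, lm.y, lm.z)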