# Snapback — On-device Vision Prototype (OpenCV + MediaPipe + VLM)

A small prototype that runs a real-time desk camera loop and overlays user state.
- Capture/overlay: OpenCV
- On-device inference: MediaPipe FaceLandmarker (Tasks API)
- Coaching layer (optional): Ollama + `qwen2.5vl:3b` (VLM)
This repo is used as a concrete “on-device vision pipeline” artifact (capture → inference → overlay → latency measurement).
## What it does
- Grabs webcam frames
- Runs face landmark detection (and derives simple features like the eye aspect ratio, EAR)
- Tracks a coarse state (focused/distracted/etc.)
- (Optional) calls a VLM periodically and shows a short coaching message
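The EAR feature mentioned above is the standard eye-aspect-ratio blink/closure signal (Soukupova & Cech, 2016). A minimal sketch of the computation — the landmark coordinates below are illustrative, not the repo's actual MediaPipe index mapping:

```python
import math

def ear(eye):
    """Eye aspect ratio from six eye landmarks p1..p6 (each an (x, y) pair):
    EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|).
    Open eyes give a higher ratio; it drops toward 0 as the eye closes."""
    p1, p2, p3, p4, p5, p6 = eye
    return (math.dist(p2, p6) + math.dist(p3, p5)) / (2.0 * math.dist(p1, p4))

# Illustrative landmark positions (not MediaPipe's real indices/coordinates):
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
```

Thresholding a smoothed EAR over a few frames is a common way to derive a coarse "eyes closed / drowsy" signal.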
## Setup
This project uses uv (recommended).
```shell
# inside the repo
uv sync
```

If you don’t use uv, you can still install from `pyproject.toml` using your preferred tool.
Note: On first run, `detector.py` downloads `face_landmarker.task` from the official MediaPipe model bucket.
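The download step can be sketched roughly as follows — this is not the repo's actual `detector.py` code, and the exact bucket path shown is an assumption:

```python
import os
import urllib.request

# Assumed official MediaPipe model bucket path (verify against detector.py).
MODEL_URL = (
    "https://storage.googleapis.com/mediapipe-models/"
    "face_landmarker/face_landmarker/float16/1/face_landmarker.task"
)

def ensure_model(path="face_landmarker.task"):
    """Download the model file only if it is not already cached locally."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(MODEL_URL, path)
    return path
```

Caching the model next to the script keeps subsequent runs offline-friendly.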
## Run

Start the main loop (e.g. `uv run python main.py`). Press `q` to quit.
## Benchmark (latency)
A quick, roughly end-to-end benchmark of the FaceLandmarker inference step:
```shell
uv run python bench_mediapipe.py --frames 200
```
It prints average / p50 / p95 milliseconds per frame.
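Those summary stats can be computed like this — a sketch using nearest-rank percentiles, not necessarily the method `bench_mediapipe.py` actually uses:

```python
import math

def latency_stats(samples_ms):
    """Summarize per-frame latencies (milliseconds) as average, p50, and p95."""
    xs = sorted(samples_ms)

    def pct(p):
        # Nearest-rank percentile: value at index ceil(p/100 * n) - 1.
        return xs[max(0, math.ceil(p / 100 * len(xs)) - 1)]

    return {"avg": sum(xs) / len(xs), "p50": pct(50), "p95": pct(95)}
```

Reporting p95 alongside the average matters for a real-time loop: a good mean can hide periodic stalls that break the frame budget.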
## Files

- `main.py` — OpenCV loop + overlay + glue
- `detector.py` — MediaPipe FaceLandmarker (Tasks API) + EAR/head signals
- `bench_mediapipe.py` — simple latency benchmark runner
- `vlm_coach.py`, `vlm.py` — VLM layer (Ollama + qwen2.5vl)
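The VLM layer talks to a local Ollama server. A hedged sketch of building a request body for Ollama's `/api/generate` endpoint, which accepts base64-encoded images for vision models — the prompt text and function name here are illustrative, not from the repo:

```python
import base64

def build_coach_request(jpeg_bytes: bytes, model: str = "qwen2.5vl:3b") -> dict:
    """Build a JSON-serializable body for POST http://localhost:11434/api/generate.
    Ollama expects images as base64 strings in the "images" field."""
    return {
        "model": model,
        "prompt": "In one short sentence, coach the user on posture and focus.",
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        "stream": False,  # one complete response instead of a token stream
    }
```

Calling the VLM only every few seconds (rather than per frame) keeps the coaching layer off the real-time capture/overlay path.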
## Notes
- This is a prototype. It is OK to describe it as a prototype / exploration on a resume.
- If you need a sharable artifact, add:
  - README run steps (already present)
  - one screenshot under `assets/`
  - benchmark output (this repo includes a runner)