GitHub - wilmoore/OmniTranscripts: YouTube video transcription API with Go, yt-dlp, FFmpeg, and whisper.cpp. Features async job processing, live dashboard, and Encore framework integration.

OmniTranscripts is a self-hostable transcription engine that turns any local or remote audio/video into clean, timestamped transcripts via a Go library or HTTP API. OmniTranscripts exists because most transcription tools are either SaaS-only, YouTube-only, or not designed to fit real automation workflows.

Supported inputs

Local audio/video files (.mp4, .mp3, .wav)
Multipart file uploads
Direct media URLs
YouTube (including Shorts)
TikTok videos
Instagram Reels (public)
1000+ additional platforms via yt-dlp

Built in Go. Powered by FFmpeg and Whisper, with a single, deterministic pipeline. Designed for production pipelines.

Multi-Source Ingestion: Local files, file uploads, direct URLs, and 1000+ platforms
Single Pipeline: Same FFmpeg → Whisper flow regardless of source
Go Library + HTTP API: Embed or deploy
Sync + Async Processing: Short jobs return immediately, long jobs queue
Multiple Outputs: TXT, SRT, VTT, JSON, TSV
Production-Oriented: Size limits, validation, structured errors, webhooks
Self-Hostable: Docker, Encore.dev, any cloud

Documentation

Quick Start

Get up and running in minutes.

Read guide

API Docs

Complete reference with request & response examples.

Explore

Architecture

Deep dive into the system design and patterns.

Understand

Deployment

Production-ready deployment playbooks.

Deploy

Development

Local setup, workflows, and contributor tooling.

Build

Troubleshooting

Quick fixes for common pitfalls and errors.

Fix

Contributing

Guidelines for issues, pull requests, and reviews.

Join

Changelog

Track version history and notable updates.

Review

Quick Start

Docker (Recommended)

# Run with Docker (with local media mount)
docker run -d \
  --name omnitranscripts \
  -p 3000:3000 \
  -e API_KEY=your-api-key-here \
  -v $(pwd)/media:/media \
  wilmoore/omnitranscripts:latest

# Transcribe a URL
curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/shorts/VIDEO_ID"}'

# Upload a local file
curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer your-api-key-here" \
  -F "file=@./media/video.mp4"

Local Development

# 1. Install dependencies
# macOS
brew install go ffmpeg
pip install openai-whisper

# Ubuntu/Debian
sudo apt install golang-go ffmpeg python3-pip
pip install openai-whisper

# 2. Clone and run
git clone https://github.com/wilmoore/omnitranscripts.git
cd omnitranscripts
cp .env.example .env  # Edit with your settings
make dev

Encore.dev (Production)

# Deploy to production in one command
curl -L https://encore.dev/install.sh | bash
encore deploy --env production

Usage

As a Go Library

import "omnitranscripts/engine"

// Local file
result, err := engine.Transcribe(
    "/path/to/recording.mp4",
    "job-local-001",
    engine.DefaultOptions(),
)

// URL (YouTube, Vimeo, SoundCloud, direct media URLs, etc.)
result, err := engine.Transcribe(
    "https://example.com/audio.mp3",
    "job-url-002",
    engine.DefaultOptions(),
)

// Handle errors
if err != nil {
    var tErr *engine.TranscriptionError
    if errors.As(err, &tErr) {
        fmt.Printf("Failed at stage %s: %s\n", tErr.Stage, tErr.Message)
    }
    return err
}

fmt.Println(result.Transcript)
for _, seg := range result.Segments {
    fmt.Printf("[%0.1fs - %0.1fs] %s\n", seg.Start, seg.End, seg.Text)
}

Local files bypass the download stage and go directly through FFmpeg → Whisper.

As an HTTP API

`POST /transcribe`

Transcribe media from any supported URL.

Request:

{
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}

Response (Short Media):

{
  "transcript": "Never gonna give you up, never gonna let you down...",
  "segments": [
    {
      "start": 0.0,
      "end": 3.5,
      "text": "Never gonna give you up"
    }
  ]
}

Response (Long Media):

{
  "job_id": "123e4567-e89b-12d3-a456-426614174000"
}

`GET /transcribe/{job_id}`

Get the status and result of a transcription job.

Response:

{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "complete",
  "transcript": "Never gonna give you up...",
  "segments": [...],
  "created_at": "2024-01-01T12:00:00Z",
  "completed_at": "2024-01-01T12:02:30Z"
}

Possible statuses: pending, running, complete, error

`GET /health`

Health check endpoint (no authentication required).

Complete API documentation: docs/api.md

Usage Examples

For real-world usage patterns (short-form video, Docker, async jobs), see the examples/ directory.

cURL

# Transcribe from URL
curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'

# Transcribe a local file (file upload)
curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/recording.mp4"

# Transcribe a podcast URL
curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/podcast-episode.mp3"}'

# Check job status
curl -X GET http://localhost:3000/transcribe/YOUR_JOB_ID \
  -H "Authorization: Bearer YOUR_API_KEY"

JavaScript

const response = await fetch('http://localhost:3000/transcribe', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
  })
});

const result = await response.json();

Short-Form Video

OmniTranscripts works with short-form video platforms out of the box.

YouTube Shorts

curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/shorts/VIDEO_ID"}'

TikTok

curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.tiktok.com/@username/video/VIDEO_ID"}'

Instagram Reels (public)

curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.instagram.com/reel/REEL_ID/"}'

Note: Only public content is supported. Private or authentication-gated content requires additional configuration not covered here.

More Examples

See the examples/ directory for real-world usage:

local-files/ - Local file transcription (Go library + file upload)
short-form/ - YouTube Shorts, TikTok, Instagram Reels
docker/ - Docker Compose and container workflows
production/ - Async job polling and webhooks

Full API reference: docs/api.md

Platform Limitations

These are upstream platform constraints, not OmniTranscripts-specific limitations.

Limitation	Details
Private content	Friends-only Instagram Reels, private TikToks, unlisted YouTube videos requiring auth
Authentication	No built-in support for authenticated sessions (cookies, login)
Region locks	Some content is geographically restricted
Rate limits	Platforms may throttle requests; add delays for batch processing
Platform changes	yt-dlp extractors may break when platforms update; keep yt-dlp updated

For most public content, OmniTranscripts works reliably. Edge cases should be tested before production use.

Development Setup

Prerequisites

Go 1.23+ - Install Go
FFmpeg - brew install ffmpeg (macOS) or sudo apt install ffmpeg (Linux)
OpenAI Whisper - pip install openai-whisper

Quick Setup

# Clone and setup
git clone https://github.com/wilmoore/omnitranscripts.git
cd omnitranscripts
cp .env.example .env  # Edit with your settings
make dev

Detailed development guide: docs/development.md

Make Commands

make help          # Show all available commands
make dev           # Development server with hot reload
make test          # Run all tests
make build         # Build for current platform
make check         # Run quality checks (fmt + lint + vet + test)

Complete command reference: docs/development.md#using-the-makefile

Production Deployment

Docker (Recommended)

Docker is the recommended way to run OmniTranscripts in production.

# Build and run
docker build -t omnitranscripts .
docker run -d \
  -p 3000:3000 \
  --env-file .env \
  -v $(pwd)/media:/media \
  omnitranscripts

# Or use Docker Compose (see examples/docker/)
docker-compose -f examples/docker/docker-compose.yml up -d

Encore.dev (Zero-Config)

encore deploy --env production

Cloud Platforms

AWS ECS/Fargate - Container-based deployment
Google Cloud Run - Serverless containers
Azure Container Instances - Managed containers
Kubernetes - Full orchestration

Complete deployment guides: docs/deployment.md

Architecture

Three-Stage Pipeline:

Download - Extract audio from any URL (yt-dlp, supports 1000+ platforms)
Normalize - Convert to 16kHz mono WAV (FFmpeg)
Transcribe - Generate timestamped transcripts (whisper.cpp)

Smart Processing:

Media <=2min: Synchronous (immediate results)
Media >2min: Asynchronous (job queue with status tracking)

Dual Consumption Model:

engine/ package: Direct Go library usage
HTTP API: Thin adapter over the engine

Detailed architecture: docs/architecture.md

API Documentation

Full OpenAPI/Swagger documentation available at docs/swagger.yaml.

Troubleshooting

Common issues and solutions: docs/troubleshooting.md

Contributing

We welcome contributions! Please see our contributing guide for details on:

Setting up your development environment
Coding standards and best practices
Submitting issues and pull requests
Community guidelines

License

This project is licensed under the MIT License - see the LICENSE file for details.