ShayneP/offline-ai-transcriber: An agent that transcribes everything it hears, then processes the result with AI, without needing the internet.

Project: Long-Term Voice Recorder with AI Processing

This project is an open-source, modular system designed to continuously capture audio, transcribe it using a local speech-to-text engine, store the transcripts, and provide a web interface for browsing and future AI-driven analysis.

System Architecture

The system consists of five core components, plus supporting scripts, that work together:

```mermaid
graph TD
    A[Microphone] --> LA[LiveKit Agent]
    LA -- Audio Stream --> VB[VoxBox Transcription API]
    VB -- Transcribed Text --> LA
    LA -- Stores Transcript --> DB[SQLite Database]
    FS[Flask Web Server] -- Reads Transcripts --> DB
    FS -- Serves UI --> UserBrowser[User's Browser]
    FS -- Summarization --> OLLAMA[Ollama]

    subgraph Audio_Input_Processing
        A
        LA
        VB
    end

    subgraph Data_Storage_Access
        DB
        FS
    end

    subgraph User_Interface
        UserBrowser
    end

    subgraph AI_Analysis
        OLLAMA
    end

    style LA fill:#f9f,stroke:#333,stroke-width:2px
    style VB fill:#ccf,stroke:#333,stroke-width:2px
    style DB fill:#cfc,stroke:#333,stroke-width:2px
    style FS fill:#ffc,stroke:#333,stroke-width:2px
    style OLLAMA fill:#fcc,stroke:#333,stroke-width:2px
```

1. LiveKit Agent (Audio Input + Transcript Management)

  • File: agent.py
  • Functionality:
    • Captures live audio from the environment using a LiveKit Agent.
    • Utilizes Silero VAD for voice activity detection.
    • Streams audio to the local VoxBox service for transcription.
    • Receives transcribed text from VoxBox.
    • Saves finalized transcripts to the SQLite database.
    • Manages session information in the database to keep transcripts separate.

2. VoxBox (Local Transcription API)

  • Directory: vox-box/
  • Functionality:
    • A separate Python service (installed via pip install vox-box).
    • Hosts a local instance of the Systran/faster-whisper-small model (via Hugging Face).
    • Exposes an OpenAI-compatible API endpoint (typically http://localhost:5002/v1) that the LiveKit Agent sends audio to.
    • Returns transcribed text to the agent.
    • Requires its own virtual environment due to dependency conflicts.

3. SQLite Database (Persistent Data Store)

  • Default File: voice_transcripts.db
  • Schema & ORM: app/database/models.py (SQLAlchemy models: User, Session, Transcript)
  • Functionality:
    • Stores all user information, recording sessions, and transcription results.
    • Uses UUIDs as primary keys for all tables.
    • Includes support for metadata such as confidence scores, timestamps, language, and custom JSON fields.
    • Managed via SQLAlchemy ORM, with helper functions in app/database/helpers.py and app/database/__init__.py.
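A simplified sketch of one such model (field names here are illustrative; the real definitions live in app/database/models.py):

```python
import uuid
from sqlalchemy import create_engine, Column, String, Float, JSON, Text
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

def new_uuid() -> str:
    # UUID string primary keys, as used across all tables
    return str(uuid.uuid4())

class Transcript(Base):
    __tablename__ = "transcripts"
    id = Column(String(36), primary_key=True, default=new_uuid)
    session_id = Column(String(36))  # foreign key to sessions.id in the real schema
    text = Column(Text, nullable=False)
    confidence = Column(Float)       # per-transcript confidence score
    extra = Column(JSON)             # custom JSON metadata fields

# In-memory SQLite for demonstration; the app uses voice_transcripts.db.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
SessionLocal = sessionmaker(bind=engine)

db = SessionLocal()
db.add(Transcript(text="hello world", confidence=0.93, extra={"speaker": "A"}))
db.commit()
row = db.query(Transcript).first()
```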

4. Flask Web Server (UI & API)

  • File: app/web_app.py (run via run_webapp.py or directly)
  • Functionality:
    • Provides a web interface (HTML templates in app/templates/) to:
      • Browse all saved recording sessions (/).
      • View detailed transcripts for a specific session (/session/<session_id>).
      • Copy/paste transcript text.
    • Includes an analysis page (/analyze/<session_id>) that:
      • Retrieves the combined transcript text for a session.
      • Checks if a summary already exists for the session in the Session.summary database field.
      • If a cached summary exists, it's displayed directly.
      • If no cached summary is found, it uses a local Ollama instance (via the openai library) to generate a new summary. This new summary is then saved to the Session.summary field in the database for future requests and displayed.
    • Exposes an API endpoint (/api/transcripts/<session_id>) to fetch transcripts for a session in JSON format.
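The JSON endpoint might look like the following sketch (the in-memory dict stands in for the real SQLAlchemy query in app/web_app.py):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Illustrative stand-in for the database; keys are session IDs.
FAKE_DB = {"abc-123": [{"text": "hello", "confidence": 0.9}]}

@app.route("/api/transcripts/<session_id>")
def get_transcripts(session_id):
    """Return all transcripts for a session as JSON, or 404 if unknown."""
    transcripts = FAKE_DB.get(session_id)
    if transcripts is None:
        return jsonify({"error": "session not found"}), 404
    return jsonify({"session_id": session_id, "transcripts": transcripts})

# To serve locally: app.run(port=5000)
```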

5. Local LLM (Ollama Integration)

  • Service: Uses a locally running Ollama instance.
  • Connection: The Flask Web Server connects to Ollama using the openai Python client.
    • Default Ollama API Base URL: http://localhost:11434/v1 (configurable via OLLAMA_BASE_URL).
    • API Key: Required by the openai client, can be any string for local Ollama (configurable via OLLAMA_API_KEY, defaults to "ollama").
  • Functionality:
    • Provides text summarization for transcripts on the /analyze page.
    • Caching: Generated summaries are stored in the Session.summary database field. When a summary is requested, the cached version is reused if available; otherwise Ollama generates a new one, which is then stored for subsequent requests. This avoids redundant LLM processing.
    • Default Model: gemma3:4b (configurable via OLLAMA_MODEL).
    • The system is designed to be extensible for other AI-driven insights in the future.

6. Scripts & Management

  • scripts/manage_db.py:
    • A command-line utility for database operations:
      • create: Initializes the database schema (uses app/database/create_tables.py).
      • reset: Drops all existing tables and recreates the schema.
      • seed: Populates the database with sample data for testing.
  • app/database/create_tables.py:
    • Contains the core logic to create database tables based on SQLAlchemy models.
  • Environment Configuration:
    • Uses a .env file (see .env.example) for settings like DATABASE_URL, OLLAMA_BASE_URL, OLLAMA_API_KEY, and OLLAMA_MODEL.
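The command dispatch in scripts/manage_db.py can be sketched as follows (the return strings are illustrative stand-ins for the real create/reset/seed helpers):

```python
import argparse

def main(argv=None) -> str:
    """Parse a management command and dispatch to the matching operation."""
    parser = argparse.ArgumentParser(description="Database management utility")
    parser.add_argument("command", choices=["create", "reset", "seed"])
    args = parser.parse_args(argv)
    if args.command == "create":
        return "creating schema"                 # would call create_tables()
    if args.command == "reset":
        return "dropping and recreating schema"  # drop_all() then create_all()
    return "seeding sample data"                 # insert sample rows

# CLI usage: python scripts/manage_db.py create
```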

Setup and Installation

Prerequisites

  • Python 3.10+
  • SQLite (comes with Python)
  • Ollama installed and running (see Ollama website). Ensure the desired model (e.g., gemma3:4b) is pulled: ollama pull gemma3:4b.
  • Separate virtual environments for the main application and VoxBox are recommended.

1. Main Application Setup

```shell
# Clone the repository (if you haven't already)
# git clone <repository-url>
# cd <repository-directory>

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies (includes 'openai' for Ollama)
pip install -r requirements.txt

# Copy and configure environment variables
cp .env.example .env
# Edit .env to set DATABASE_URL (default is sqlite:///voice_transcripts.db)
# and Ollama settings if needed:
# OLLAMA_BASE_URL=http://localhost:11434/v1
# OLLAMA_API_KEY=ollama
# OLLAMA_MODEL=gemma3:4b

# Create database tables
python scripts/manage_db.py create

# Optionally seed the database
python scripts/manage_db.py seed
```

2. VoxBox (Transcription Service) Setup

```shell
cd vox-box

# Create and activate a separate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install VoxBox
pip install -r requirements.txt  # installs the 'vox-box' package

cd ..  # return to project root
```

Running the Application

  1. Start Ollama: Ensure your Ollama service is running. If you installed Ollama as a desktop application, it may start automatically; otherwise, start it from the command line with ollama serve.

    Confirm that the model you intend to use (e.g., gemma3:4b) is available with ollama list; if it isn't, pull it with ollama pull gemma3:4b. Keep this service running.

  2. Start VoxBox: Open a new terminal, navigate to the vox-box directory, activate its virtual environment, and run:

    ```shell
    # In vox-box/ directory, with vox-box venv active
    vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir ./cache/data-dir --host 0.0.0.0 --port 5002
    ```

    Keep this service running.

  3. Start the LiveKit Agent: In another terminal (at the project root, with the main application's venv active):
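    Assuming agent.py uses the standard LiveKit Agents CLI entry point (the subcommand names below come from that CLI, not from this repository's docs):

    ```shell
    # At the project root, with the main venv active.
    # `dev` runs the worker with auto-reload; use `start` for production.
    python agent.py dev
    ```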

  4. Start the Flask Web Server: In yet another terminal (at the project root, with the main application's venv active):
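    Using the wrapper script noted in the component overview:

    ```shell
    # At the project root, with the main venv active
    python run_webapp.py
    ```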

    Or for development mode:

    ```shell
    flask --app app.web_app run --debug --port 5000
    ```

    Access the UI at http://localhost:5000.


Notes & Key Directories

  • agent.py: Core logic for audio capture and initial transcript handling.
  • vox-box/: Contains setup for the local transcription server.
  • app/database/: SQLAlchemy models, database initialization, and helper functions.
  • app/web_app.py: Flask application, routes, and UI logic.
  • app/templates/: HTML templates for the web interface.
  • scripts/: Database management scripts.