GitHub - raniphore/telegram-bot: Async Telegram bot combining RAG (local embeddings + SQLite vector search) and GPT-4o-mini vision — supports Q&A from a knowledge base, image captioning, per-user history, query caching, and conversation summarisation.

Hybrid Telegram Bot — RAG + Vision

A lightweight GenAI Telegram bot that answers questions from a local knowledge base (RAG) and describes uploaded images (Vision), both powered by OpenAI GPT-4o-mini.

Features

Command Description
/ask <query> Retrieves relevant chunks from knowledge base and answers using GPT-4o-mini
/summarize Summarises your last 3 interactions with the bot
/help Shows usage instructions
Send a photo Generates a caption and 3 keyword tags using GPT-4o-mini vision

System Design

User (Telegram)
    │
    ▼
Bot Handlers (bot/handlers.py)
    ├── /ask ──► Embedder (sentence-transformers)
    │                │
    │                ▼
    │          SQLite + cosine similarity (data/vectors.db)
    │                │  top-k chunks
    │                ▼
    │          OpenAI GPT-4o-mini ──► Reply with answer + sources
    │
    ├── /summarize ──► SQLite (data/bot_state.db)
    │                       │  last 3 interactions
    │                       ▼
    │                 OpenAI GPT-4o-mini ──► Summary reply
    │
    └── Photo ──► OpenAI GPT-4o-mini (vision) ──► Caption + Tags

Tech Stack

Layer Tool
Bot framework python-telegram-bot v21 (async)
Embeddings sentence-transformersall-MiniLM-L6-v2 (local, CPU)
Vector store sqlite-vec (no external server)
LLM + Vision OpenAI gpt-4o-mini

Setup

1. Clone and create virtual environment

git clone <repo-url>
cd telegram-bot

python -m venv venv
source venv/bin/activate      # macOS/Linux
# venv\Scripts\activate       # Windows

pip install -r requirements.txt

2. Configure environment

cp .env.example .env
# Edit .env and fill in TELEGRAM_BOT_TOKEN and OPENAI_API_KEY

3. Ingest knowledge base

This reads all .md / .txt files from knowledge_base/, splits them into chunks, embeds them locally, and stores them in data/vectors.db.

4. Run the bot

Knowledge Base

Place .md or .txt documents in the knowledge_base/ directory. Re-run python -m rag.ingest after adding new documents.

Current documents:

  • faq.md — Bot FAQ
  • company_policy.md — HR & company policies
  • tech_tips.md — Tech troubleshooting tips
  • recipes.md — Quick recipes

Models Used

  • Embeddings: all-MiniLM-L6-v2 — 80MB, runs on CPU, fast inference
  • LLM + Vision: gpt-4o-mini — cost-effective, supports text and image input