
# Python Local RAG System

**100% Local • Zero Cost • Complete Privacy**

A pure Python implementation of Retrieval-Augmented Generation (RAG) that runs entirely on your machine. No API keys, no cloud services, no data leaving your computer.

## 🚀 Quick Start

### Prerequisites

- Python 3.9+
- 4GB+ RAM (8GB recommended)
- Windows, macOS, or Linux

### One-Line Setup

**Windows (PowerShell):**

```powershell
.\scripts\windows\setup_complete.ps1
```

**macOS/Linux:**

```bash
./scripts/unix/setup_complete.sh
```

This will:

  1. Install Ollama (local LLM server)
  2. Download TinyLlama model (1.1B parameters, runs on any machine)
  3. Download Nomic embedding model
  4. Set up Python environment
  5. Install all dependencies

### Manual Setup

1. **Install Ollama:**
   - Download from [ollama.ai](https://ollama.ai)
   - Or use the provided script: `.\scripts\windows\install_ollama_windows.ps1`

2. **Pull models:**

   ```bash
   ollama pull tinyllama
   ollama pull nomic-embed-text
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

## 💻 Usage

### Interactive CLI

Start an interactive session with `python src/cli.py`.

### Python Script

```python
from src.rag_pipeline_local import LocalRAGPipeline

# Initialize
rag = LocalRAGPipeline()

# Add documents
rag.add_documents([
    "Python is a versatile programming language.",
    "RAG combines retrieval and generation."
])

# Query
response = rag.query("What is Python?")
print(response['answer'])
```
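Internally, `query()` follows the standard RAG flow: embed the question, retrieve the most similar stored chunks, and hand them to the LLM as context. A minimal pure-Python sketch of just the retrieval step using cosine similarity (illustrative only; the actual pipeline uses Ollama embeddings and LanceDB):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, doc_vecs, docs, k=2):
    # Rank documents by similarity to the query vector and return the top-k
    scored = sorted(zip(docs, doc_vecs),
                    key=lambda pair: cosine(query_vec, pair[1]),
                    reverse=True)
    return [doc for doc, _ in scored[:k]]

# Toy 2-d "embeddings" for illustration
docs = ["Python is a versatile programming language.",
        "RAG combines retrieval and generation."]
vecs = [[1.0, 0.1], [0.1, 1.0]]
print(retrieve([0.9, 0.2], vecs, docs, k=1))
# ['Python is a versatile programming language.']
```

Real embedding vectors have hundreds of dimensions, but the ranking logic is the same.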

### Jupyter Notebook

```bash
jupyter notebook rag_example.ipynb
```

๐Ÿ“ Project Structure

python_example/
โ”œโ”€โ”€ src/                    # Core RAG implementation
โ”‚   โ”œโ”€โ”€ rag_pipeline_local.py  # Main pipeline
โ”‚   โ”œโ”€โ”€ llm_local.py           # Ollama LLM wrapper
โ”‚   โ”œโ”€โ”€ embeddings_local.py    # Local embeddings
โ”‚   โ”œโ”€โ”€ vector_store_lancedb.py # Vector storage
โ”‚   โ”œโ”€โ”€ chunking.py            # Text chunking
โ”‚   โ””โ”€โ”€ cli.py                 # Interactive CLI
โ”œโ”€โ”€ scripts/               # Setup & utility scripts
โ”‚   โ”œโ”€โ”€ windows/          # Windows scripts
โ”‚   โ””โ”€โ”€ unix/            # macOS/Linux scripts
โ”œโ”€โ”€ tests/                # Test suite
โ”œโ”€โ”€ docs/                 # Detailed documentation
โ”œโ”€โ”€ config/              # Configuration files
โ””โ”€โ”€ requirements.txt     # Python dependencies

## 🎯 Features

- **100% Local:** Everything runs on your machine
- **Zero Cost:** No API fees, ever
- **Private:** Your data never leaves your computer
- **Fast:** Optimized for local inference
- **Simple:** Clean API, easy to understand
- **Extensible:** Modular design for customization

## 🔧 Configuration

Create a `.env` file (optional):

```
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=tinyllama:latest
OLLAMA_EMBEDDING_MODEL=nomic-embed-text:latest
CHUNK_SIZE=500
CHUNK_OVERLAP=50
```
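The two chunking settings work together as a sliding window: each chunk is `CHUNK_SIZE` characters long, and consecutive chunks share `CHUNK_OVERLAP` characters so that sentences split at a boundary still appear whole in one chunk. A rough sketch of that behavior (the actual implementation lives in `src/chunking.py` and may differ in detail):

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    # Slide a window of chunk_size characters across the text,
    # advancing by (chunk_size - chunk_overlap) each step
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Small values so the overlap is easy to see
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Larger overlaps improve recall at chunk boundaries but increase the number of chunks to embed and store.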

## 📊 Model Options

| RAM   | Model      | Quality | Speed    |
|-------|------------|---------|----------|
| 4GB   | tinyllama  | Good    | Fast     |
| 8GB   | mistral    | Better  | Good     |
| 16GB  | llama2:13b | Great   | Moderate |
| 32GB+ | mixtral    | Best    | Slower   |
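If you script your setup, the table above can be encoded as a small helper that picks a model for the RAM you have available (`pick_model` is a hypothetical convenience function, not part of this project's API; the model names are the ones from the table):

```python
def pick_model(ram_gb):
    # Map available RAM (GB) to the recommended model from the table above
    if ram_gb >= 32:
        return "mixtral"
    if ram_gb >= 16:
        return "llama2:13b"
    if ram_gb >= 8:
        return "mistral"
    return "tinyllama"

print(pick_model(8))  # mistral
```

Set the result as `OLLAMA_LLM_MODEL` in your `.env`, then `ollama pull` that model.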

## 🧪 Testing

Run the test suite:

```bash
python tests/test_local_rag.py
```

## 📚 Documentation

Detailed documentation lives in the `docs/` directory.

๐Ÿ› ๏ธ Development

Using UV (Recommended)

UV is a fast Python package manager:

# Install uv
.\scripts\windows\setup_uv.ps1  # Windows
./scripts/unix/setup_uv.sh      # Unix

# Run with uv
uv run python src/cli.py

### Code Quality

```bash
# Format code
black src/

# Type checking
mypy src/

# Linting
pylint src/
```

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

## 📄 License

MIT License - Use freely in your projects!

## 🆘 Troubleshooting

**Ollama not running?**

```bash
# Start Ollama
ollama serve

# Check if running
curl http://localhost:11434/api/tags
```
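The same health check works from Python using only the standard library, which is handy inside scripts (a small sketch assuming Ollama's default port, 11434):

```python
import json
import urllib.error
import urllib.request

def ollama_models(base_url="http://localhost:11434"):
    # Return the list of installed models, or None if the server is unreachable
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            return json.load(resp).get("models", [])
    except (urllib.error.URLError, OSError):
        return None

models = ollama_models()
print("Ollama is up" if models is not None else "Ollama is not running")
```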

**Model not found?**

```bash
# List models
ollama list

# Pull missing model
ollama pull tinyllama
```

**Out of memory?**

- Use a smaller model (tinyllama instead of llama2)
- Reduce `CHUNK_SIZE` in the config
- Close other applications

## 💡 Why Local?

- **Privacy First:** Your documents, your queries, your hardware
- **No Vendor Lock-in:** Not dependent on any cloud service
- **Cost Effective:** One-time setup, unlimited usage
- **Fast Iteration:** No network latency
- **Full Control:** Customize everything

Built with ❤️ for the local-first community