# Python Local RAG System

**100% Local • Zero Cost • Complete Privacy**

A pure-Python implementation of Retrieval-Augmented Generation (RAG) that runs entirely on your machine: no API keys, no cloud services, and no data leaving your computer.
## Quick Start

### Prerequisites
- Python 3.9+
- 4GB+ RAM (8GB recommended)
- Windows, macOS, or Linux
### One-Line Setup

**Windows (PowerShell):**

```powershell
.\scripts\windows\setup_complete.ps1
```

**macOS/Linux:**

```bash
./scripts/unix/setup_complete.sh
```
This will:
- Install Ollama (local LLM server)
- Download TinyLlama model (1.1B parameters, runs on any machine)
- Download Nomic embedding model
- Set up Python environment
- Install all dependencies
### Manual Setup

1. **Install Ollama:**
   - Download from ollama.ai
   - Or use the provided script:

   ```powershell
   .\scripts\windows\install_ollama_windows.ps1
   ```

2. **Pull Models:**

   ```bash
   ollama pull tinyllama
   ollama pull nomic-embed-text
   ```

3. **Install Dependencies:**

   ```bash
   pip install -r requirements.txt
   ```
## Usage
### Interactive CLI

```bash
python src/cli.py
```
### Python Script

```python
from src.rag_pipeline_local import LocalRAGPipeline

# Initialize
rag = LocalRAGPipeline()

# Add documents
rag.add_documents([
    "Python is a versatile programming language.",
    "RAG combines retrieval and generation.",
])

# Query
response = rag.query("What is Python?")
print(response['answer'])
```
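In practice you will usually load documents from files rather than hard-coding strings. A minimal sketch of that step (the `load_documents` helper is illustrative, not part of the package):

```python
from pathlib import Path


def load_documents(folder: str, pattern: str = "*.txt") -> list[str]:
    """Read every matching text file in `folder` into a list of strings."""
    return [
        path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob(pattern))
    ]


# docs = load_documents("my_notes")
# rag.add_documents(docs)  # same pipeline object as above
```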
### Jupyter Notebook

```bash
jupyter notebook rag_example.ipynb
```
## Project Structure

```
python_example/
├── src/                        # Core RAG implementation
│   ├── rag_pipeline_local.py   # Main pipeline
│   ├── llm_local.py            # Ollama LLM wrapper
│   ├── embeddings_local.py     # Local embeddings
│   ├── vector_store_lancedb.py # Vector storage
│   ├── chunking.py             # Text chunking
│   └── cli.py                  # Interactive CLI
├── scripts/                    # Setup & utility scripts
│   ├── windows/                # Windows scripts
│   └── unix/                   # macOS/Linux scripts
├── tests/                      # Test suite
├── docs/                       # Detailed documentation
├── config/                     # Configuration files
└── requirements.txt            # Python dependencies
```
## Features
- **100% Local**: Everything runs on your machine
- **Zero Cost**: No API fees, ever
- **Private**: Your data never leaves your computer
- **Fast**: Optimized for local inference
- **Simple**: Clean API, easy to understand
- **Extensible**: Modular design for customization
## Configuration

Create a `.env` file (optional):

```env
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=tinyllama:latest
OLLAMA_EMBEDDING_MODEL=nomic-embed-text:latest
CHUNK_SIZE=500
CHUNK_OVERLAP=50
```
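`CHUNK_SIZE` and `CHUNK_OVERLAP` control how documents are split before embedding: chunks share a small overlap so context is not lost at the boundaries. The sliding-window idea can be sketched as follows (a simplified stand-in; the actual `src/chunking.py` implementation may differ):

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split `text` into windows of at most `chunk_size` characters,
    where consecutive windows share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # advance by 450 chars with the defaults
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# A 1200-character document yields windows starting at 0, 450, and 900.
chunks = chunk_text("x" * 1200)
```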
## Model Options
| RAM | Model | Quality | Speed |
|---|---|---|---|
| 4GB | tinyllama | Good | Fast |
| 8GB | mistral | Better | Good |
| 16GB | llama2:13b | Great | Moderate |
| 32GB+ | mixtral | Best | Slower |
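The table above can be folded into a small helper that suggests a model for the RAM you have available (thresholds taken from the table; the function itself is illustrative, not part of the package):

```python
def suggest_model(ram_gb: float) -> str:
    """Pick the largest model from the table that fits in `ram_gb` of RAM."""
    tiers = [              # (minimum RAM in GB, model tag)
        (32, "mixtral"),
        (16, "llama2:13b"),
        (8, "mistral"),
        (4, "tinyllama"),
    ]
    for min_ram, model in tiers:
        if ram_gb >= min_ram:
            return model
    raise ValueError("at least 4 GB of RAM is required")


print(suggest_model(8))   # mistral
```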
## Testing
Run the test suite:
```bash
python tests/test_local_rag.py
```
## Documentation
- Architecture - System design and components
- Setup Guide - Detailed installation instructions
- Learning RAG - Understanding RAG concepts
- Migration Guide - Upgrading from older versions
## Development

### Using UV (Recommended)
UV is a fast Python package manager:
```bash
# Install uv
.\scripts\windows\setup_uv.ps1   # Windows
./scripts/unix/setup_uv.sh       # Unix

# Run with uv
uv run python src/cli.py
```
### Code Quality

```bash
# Format code
black src/

# Type checking
mypy src/

# Linting
pylint src/
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request
## License
MIT License - Use freely in your projects!
## Troubleshooting
**Ollama not running?**

```bash
# Start Ollama
ollama serve

# Check if running
curl http://localhost:11434/api/tags
```
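The same check can be done from Python with only the standard library, by querying Ollama's `/api/tags` endpoint (the helper name is ours; the endpoint is Ollama's model-listing route):

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at `base_url`."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: server not reachable.
        return False


if not ollama_is_running():
    print("Ollama is not reachable; run `ollama serve` first.")
```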
**Model not found?**

```bash
# List models
ollama list

# Pull the missing model
ollama pull tinyllama
```
**Out of memory?**

- Use a smaller model (e.g. tinyllama instead of llama2)
- Reduce `CHUNK_SIZE` in the config
- Close other applications
## Why Local?

- **Privacy First**: Your documents, your queries, your hardware
- **No Vendor Lock-in**: Not dependent on any cloud service
- **Cost Effective**: One-time setup, unlimited usage
- **Fast Iteration**: No network latency
- **Full Control**: Customize everything
*Built with ❤️ for the local-first community*