# Python Local RAG System

**100% Local • Zero Cost • Complete Privacy**

A pure-Python implementation of Retrieval-Augmented Generation (RAG) that runs entirely on your machine: no API keys, no cloud services, and no data leaving your computer.
## Quick Start

### Prerequisites
- Python 3.9+
- 4GB+ RAM (8GB recommended)
- Windows, macOS, or Linux
### One-Line Setup

**Windows (PowerShell):**

```powershell
.\scripts\windows\setup_complete.ps1
```

**macOS/Linux:**

```bash
./scripts/unix/setup_complete.sh
```
This will:
- Install Ollama (local LLM server)
- Download TinyLlama model (1.1B parameters, runs on any machine)
- Download Nomic embedding model
- Set up Python environment
- Install all dependencies
### Manual Setup

1. **Install Ollama:**
   - Download from ollama.ai
   - Or use the provided script:

   ```powershell
   .\scripts\windows\install_ollama_windows.ps1
   ```

2. **Pull Models:**

   ```bash
   ollama pull tinyllama
   ollama pull nomic-embed-text
   ```

3. **Install Dependencies:**

   ```bash
   pip install -r requirements.txt
   ```
## Usage
### Interactive CLI

```bash
python src/cli.py
```
### Python Script

```python
from src.rag_pipeline_local import LocalRAGPipeline

# Initialize
rag = LocalRAGPipeline()

# Add documents
rag.add_documents([
    "Python is a versatile programming language.",
    "RAG combines retrieval and generation.",
])

# Query
response = rag.query("What is Python?")
print(response['answer'])
```
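In practice you will usually load documents from files rather than hard-coding strings. A minimal sketch of that step (the `load_documents` helper is illustrative, not part of the package):

```python
from pathlib import Path


def load_documents(folder: str, pattern: str = "*.txt") -> list[str]:
    """Read every matching text file in `folder` into a list of strings."""
    return [
        path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob(pattern))
    ]


# docs = load_documents("my_notes")
# rag.add_documents(docs)  # same pipeline object as above
```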
### Jupyter Notebook

```bash
jupyter notebook rag_example.ipynb
```
## Project Structure

```
python_example/
├── src/                        # Core RAG implementation
│   ├── rag_pipeline_local.py   # Main pipeline
│   ├── llm_local.py            # Ollama LLM wrapper
│   ├── embeddings_local.py     # Local embeddings
│   ├── vector_store_lancedb.py # Vector storage
│   ├── chunking.py             # Text chunking
│   └── cli.py                  # Interactive CLI
├── scripts/                    # Setup & utility scripts
│   ├── windows/                # Windows scripts
│   └── unix/                   # macOS/Linux scripts
├── tests/                      # Test suite
├── docs/                       # Detailed documentation
├── config/                     # Configuration files
└── requirements.txt            # Python dependencies
```
## Features
- **100% Local**: Everything runs on your machine
- **Zero Cost**: No API fees, ever
- **Private**: Your data never leaves your computer
- **Fast**: Optimized for local inference
- **Simple**: Clean API, easy to understand
- **Extensible**: Modular design for customization
## Configuration

Create a `.env` file (optional):

```env
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=tinyllama:latest
OLLAMA_EMBEDDING_MODEL=nomic-embed-text:latest
CHUNK_SIZE=500
CHUNK_OVERLAP=50
```
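`CHUNK_SIZE` and `CHUNK_OVERLAP` control how documents are split before embedding: chunks share a small overlap so context is not lost at the boundaries. The sliding-window idea can be sketched as follows (a simplified stand-in; the actual `src/chunking.py` implementation may differ):

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split `text` into windows of at most `chunk_size` characters,
    where consecutive windows share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # advance by 450 chars with the defaults
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# A 1200-character document yields windows starting at 0, 450, and 900.
chunks = chunk_text("x" * 1200)
```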
## Model Options
| RAM | Model | Quality | Speed |
|---|---|---|---|
| 4GB | tinyllama | Good | Fast |
| 8GB | mistral | Better | Good |
| 16GB | llama2:13b | Great | Moderate |
| 32GB+ | mixtral | Best | Slower |
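The table above can be folded into a small helper that suggests a model for the RAM you have available (thresholds taken from the table; the function itself is illustrative, not part of the package):

```python
def suggest_model(ram_gb: float) -> str:
    """Pick the largest model from the table that fits in `ram_gb` of RAM."""
    tiers = [              # (minimum RAM in GB, model tag)
        (32, "mixtral"),
        (16, "llama2:13b"),
        (8, "mistral"),
        (4, "tinyllama"),
    ]
    for min_ram, model in tiers:
        if ram_gb >= min_ram:
            return model
    raise ValueError("at least 4 GB of RAM is required")


print(suggest_model(8))   # mistral
```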
## Testing
Run the test suite:
```bash
python tests/test_local_rag.py
```
## Documentation
- Architecture - System design and components
- Setup Guide - Detailed installation instructions
- Learning RAG - Understanding RAG concepts
- Migration Guide - Upgrading from older versions
## Development

### Using UV (Recommended)
UV is a fast Python package manager:
```bash
# Install uv
.\scripts\windows\setup_uv.ps1   # Windows
./scripts/unix/setup_uv.sh       # Unix

# Run with uv
uv run python src/cli.py
```
### Code Quality

```bash
# Format code
black src/

# Type checking
mypy src/

# Linting
pylint src/
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request
## License
MIT License - Use freely in your projects!
## Troubleshooting
**Ollama not running?**

```bash
# Start Ollama
ollama serve

# Check if running
curl http://localhost:11434/api/tags
```
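The same check can be done from Python with only the standard library, by querying Ollama's `/api/tags` endpoint (the helper name is ours; the endpoint is Ollama's model-listing route):

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at `base_url`."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: server not reachable.
        return False


if not ollama_is_running():
    print("Ollama is not reachable; run `ollama serve` first.")
```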
**Model not found?**

```bash
# List models
ollama list

# Pull the missing model
ollama pull tinyllama
```
**Out of memory?**

- Use a smaller model (e.g. tinyllama instead of llama2)
- Reduce `CHUNK_SIZE` in the config
- Close other applications
## Why Local?

- **Privacy First**: Your documents, your queries, your hardware
- **No Vendor Lock-in**: Not dependent on any cloud service
- **Cost Effective**: One-time setup, unlimited usage
- **Fast Iteration**: No network latency
- **Full Control**: Customize everything
*Built with ❤️ for the local-first community*