An intelligent research paper analysis system that enables natural language interaction with a knowledge base of academic papers. Built with LangGraph, the system uses a multi-agent architecture to intelligently route queries to the most appropriate processing pipeline.
## Features
- **Intelligent Query Routing** - Automatically determines whether to use RAG retrieval, database query tools, or a direct LLM response, based on the question type
- **PDF Ingestion Pipeline** - Extracts text and metadata from research papers using PyMuPDF and LLM-powered metadata extraction
- **Semantic Search** - Vector-based retrieval using LlamaIndex and HuggingFace embeddings for finding relevant paper content
- **Structured Data Queries** - SQL-based metadata lookups for questions about authors, paper counts, and other structured data
- **Citation-Aware Responses** - Grounded answers with references to the source papers
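Semantic search follows a standard embed-and-rank pattern: embed each chunk, embed the query, and rank chunks by cosine similarity. The sketch below shows only that ranking step in plain Python with tiny hypothetical 3-dimensional vectors; the actual system uses LlamaIndex with BAAI/bge-small-en-v1.5 embeddings, so treat this as an illustration of the principle rather than the project's code:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], chunks: list[tuple[list[float], str]], k: int = 2) -> list[str]:
    """Rank (vector, text) chunk pairs by similarity to the query vector."""
    scored = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[0]), reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical 3-dimensional embeddings, for illustration only.
chunks = [
    ([0.9, 0.1, 0.0], "Attention is all you need."),
    ([0.1, 0.9, 0.0], "Gradient descent converges under convexity."),
    ([0.8, 0.2, 0.1], "Transformers use multi-head attention."),
]
print(top_k([1.0, 0.0, 0.0], chunks, k=2))
```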
## Architecture
The system uses a LangGraph-based agentic workflow with four specialized agents:
```
User Query
     │
     ▼
┌─────────────────┐
│  Router Agent   │ ─── Classifies query type
└────────┬────────┘
         │
    ┌────┴────┬──────────┐
    ▼         ▼          ▼
┌───────┐ ┌───────┐ ┌─────────┐
│  RAG  │ │ Query │ │ Answer  │
│ Agent │ │ Agent │ │  Agent  │
└───┬───┘ └───┬───┘ └────┬────┘
    │         │          │
    └─────────┴──────────┘
              │
              ▼
       Final Response
```
The Router Agent classifies each incoming query into one of three routes:

| Route | Use Case |
|---|---|
| RAG | Content questions about paper topics, concepts, or findings |
| DATABASE | Metadata queries (paper counts, author lookups, etc.) |
| ANSWER | General knowledge questions unrelated to the paper collection |
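As a rough illustration of the decision the Router Agent makes, here is a keyword-based stand-in in plain Python. The real router is LLM-based, so these rules are a hypothetical simplification, not the project's actual classification logic:

```python
def route_query(query: str) -> str:
    """Toy stand-in for the LLM router: map a query to a route name."""
    q = query.lower()
    # Metadata-style questions go to the SQL tools.
    if any(kw in q for kw in ("how many", "count", "author", "list papers")):
        return "DATABASE"
    # Content questions about the paper collection go to RAG retrieval.
    if "paper" in q:
        return "RAG"
    # Everything else is answered directly by the LLM.
    return "ANSWER"

print(route_query("What papers discuss transformer architectures?"))  # RAG
print(route_query("How many papers are in the database?"))            # DATABASE
print(route_query("What is gradient descent?"))                       # ANSWER
```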
## Tech Stack
| Category | Technologies |
|---|---|
| Orchestration | LangGraph |
| LLM Framework | LangChain |
| LLM Provider | Anthropic Claude |
| RAG/Indexing | LlamaIndex |
| Embeddings | HuggingFace (BAAI/bge-small-en-v1.5) |
| Database | SQLite + SQLModel |
| PDF Processing | PyMuPDF |
| Testing | pytest + DeepEval |
## Installation
Prerequisites: Python 3.13+
```bash
# Clone the repository
git clone https://github.com/stephenhermes/quickmaths.git
cd quickmaths

# Install dependencies
pip install -e .

# Or with dev dependencies
pip install -e ".[dev]"
```
## Configuration
Create a `.env` file in the project root:
```bash
# Required
PDF_DIRECTORY='/path/to/your/pdf/papers'
ANTHROPIC_API_KEY='your-api-key'

# Optional
LLM_MODEL='claude-sonnet-4-20250514'
EMBED_MODEL='BAAI/bge-small-en-v1.5'
LOG_LEVEL='INFO'
```
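Assuming the application resolves these settings at startup, the optional values would typically fall back to the defaults shown above while required keys fail fast when missing. A minimal sketch with `os.getenv` (the `require` helper is a hypothetical illustration, not the project's actual config code):

```python
import os

# Optional settings fall back to the documented defaults.
LLM_MODEL = os.getenv("LLM_MODEL", "claude-sonnet-4-20250514")
EMBED_MODEL = os.getenv("EMBED_MODEL", "BAAI/bge-small-en-v1.5")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")

def require(name: str) -> str:
    """Fail fast if a required environment variable is missing."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

print(EMBED_MODEL, LOG_LEVEL)
```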
## Usage
### 1. Ingest Papers
First, ingest your PDF papers into the system:
```bash
quickmaths-ingest ${PDF_DIRECTORY}

# Options:
#   --rebuild-index     Rebuild the vector index from scratch
#   --rebuild-db        Rebuild the metadata database
#   --num-files-limit   Limit number of files to process
```
The ingestion pipeline:
- Converts PDFs to markdown text
- Extracts metadata (title, authors) using an LLM
- Chunks documents with sentence-aware splitting
- Embeds and indexes chunks for semantic search
- Stores metadata in SQLite for structured queries
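The sentence-aware splitting step above can be sketched in plain Python: pack whole sentences into chunks up to a size limit, never splitting a sentence mid-way. The real pipeline uses LlamaIndex's splitter; the regex and size limit below are illustrative assumptions:

```python
import re

def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks no longer than max_chars.

    A sentence longer than max_chars becomes its own chunk, so no
    sentence is ever split mid-way (the "sentence-aware" property).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks

text = "Transformers rely on attention. Attention weighs token pairs. Training uses gradient descent."
print(chunk_sentences(text, max_chars=60))
```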
### 2. Interactive Chat
Start the interactive Q&A interface and ask questions in natural language. Example queries:
- "What papers discuss transformer architectures?" → RAG retrieval
- "How many papers are in the database?" → Database query
- "What is gradient descent?" → Direct LLM answer
## Testing
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=quickmaths

# Run specific test categories
pytest tests/unit
pytest tests/integration
```
The test suite includes DeepEval integration for LLM evaluation metrics.