GitHub - sharpninja/graphrag: A modular graph-based Retrieval-Augmented Generation (RAG) system

👉 Microsoft Research Blog Post
👉 Read the docs
👉 GraphRAG Arxiv


Overview

The GraphRAG project is a data pipeline and transformation suite designed to extract meaningful, structured data from unstructured text using LLMs. GraphRAG is available in both Python and .NET 10 C#, with identical functionality and configuration format across both implementations.

To learn more about GraphRAG and how it can be used to enhance your LLM's ability to reason about your private data, please visit the Microsoft Research Blog Post.

Implementations

| | Python | .NET |
|---|---|---|
| Runtime | Python 3.10–3.12 | .NET 10 (C#) |
| Packages | 8 packages on PyPI | 8 core libraries + 15 strategy plugin assemblies |
| Architecture | Factory pattern with ABC interfaces | Strategy pattern with runtime assembly discovery |
| Config Format | settings.yaml | settings.yaml (shared format) |
| CLI | graphrag (via pip) | dotnet run --project src/GraphRag |
| Tests | pytest (unit, integration, smoke) | xUnit (200 tests, unit + integration) |
| Getting Started | Python Quickstart | .NET Getting Started |
| Search UI | unified-search-app/ (Streamlit) | GraphRag.SearchApp (Blazor), see Getting Started |
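
The Architecture row describes the Python side as a factory pattern with ABC interfaces. As a rough illustration of that general idea only, the sketch below defines an abstract interface, one backend, and a name-based factory. The names `Storage`, `MemoryStorage`, `register_storage`, and `create_storage` are hypothetical and are not taken from the GraphRAG codebase.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class Storage(ABC):
    """Hypothetical interface for illustration; not GraphRAG's actual API."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        """Return the stored value, or None if the key is absent."""

    @abstractmethod
    def set(self, key: str, value: str) -> None:
        """Store a value under the given key."""


class MemoryStorage(Storage):
    """Simple in-memory backend implementing the interface."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


# Registry mapping a type name (as it might appear in config) to a builder.
_BUILDERS: Dict[str, Callable[[], Storage]] = {}


def register_storage(name: str, builder: Callable[[], Storage]) -> None:
    """Make a backend available under the given name."""
    _BUILDERS[name] = builder


def create_storage(name: str) -> Storage:
    """Factory: build the backend registered under `name`."""
    builder = _BUILDERS.get(name)
    if builder is None:
        raise ValueError(f"Unknown storage type: {name!r}")
    return builder()


register_storage("memory", MemoryStorage)
```

With this shape, callers request a backend by name, for example `create_storage("memory")`, and a new backend can be added by registering it without changing any call sites.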

Quickstart

Python

To get started with the Python implementation, we recommend trying the command line quickstart.

pip install graphrag
graphrag init
# Add your documents to ./input/
graphrag index
graphrag query "What are the top themes in this story?"
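
The steps above could also be scripted. The following is a minimal sketch, not part of GraphRAG itself: `stage_documents` and `run_quickstart` are hypothetical helper names, the restriction to `.txt` files is an assumption made for the example, and the CLI calls are exactly the commands shown above, run from the current directory.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def stage_documents(source_dir: str, project_root: str = ".") -> List[Path]:
    """Copy .txt files from source_dir into <project_root>/input/.

    Only .txt files are copied; that filter is an assumption for this sketch.
    """
    input_dir = Path(project_root) / "input"
    input_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for doc in sorted(Path(source_dir).glob("*.txt")):
        copied.append(Path(shutil.copy2(doc, input_dir / doc.name)))
    return copied


def run_quickstart(source_dir: str, query: str) -> None:
    """Run the quickstart commands in order, stopping at the first failure."""
    subprocess.run(["graphrag", "init"], check=True)
    stage_documents(source_dir)
    subprocess.run(["graphrag", "index"], check=True)
    subprocess.run(["graphrag", "query", query], check=True)
```

Because `check=True` raises on a non-zero exit code, a failed `init` or `index` step prevents the later, potentially costly steps from running.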

.NET

For the .NET implementation, see the full .NET Getting Started Guide.

git clone https://github.com/microsoft/graphrag.git
cd graphrag/dotnet
dotnet build
dotnet run --project src/GraphRag -- init --root ./my-project
# Add your documents to ./my-project/input/
dotnet run --project src/GraphRag -- index --root ./my-project
dotnet run --project src/GraphRag -- query --root ./my-project --method local --query "What are the top themes?"

Repository Structure

graphrag/
├── packages/               ← Python monorepo (8 packages)
│   ├── graphrag/           ← Main Python package (CLI, indexing, query)
│   ├── graphrag-common/    ← Shared utilities
│   ├── graphrag-storage/   ← Storage backends
│   ├── graphrag-cache/     ← Caching layer
│   ├── graphrag-chunking/  ← Text chunking
│   ├── graphrag-input/     ← Document ingestion
│   ├── graphrag-llm/       ← LLM abstraction
│   └── graphrag-vectors/   ← Vector stores
├── dotnet/                 ← .NET 10 implementation
│   ├── src/                ← 8 core libraries + 15 strategy plugins + SearchApp
│   ├── tests/              ← Unit + integration tests
│   └── docs/               ← .NET-specific documentation
├── docs/                   ← MkDocs documentation site (Python-focused)
├── tests/                  ← Python test suite
└── scripts/                ← Build & CI scripts

Repository Guidance

This repository presents a methodology for using knowledge graph memory structures to enhance LLM outputs. Please note that the provided code serves as a demonstration and is not an officially supported Microsoft offering.

⚠️ Warning: GraphRAG indexing can be an expensive operation, please read all of the documentation to understand the process and costs involved, and start small.

Diving Deeper

Prompt Tuning

Using GraphRAG with your data out of the box may not yield the best possible results. We strongly recommend fine-tuning your prompts by following the Prompt Tuning Guide in our documentation.

Versioning

Please see the breaking changes document for notes on our approach to versioning the project.

Python: Always run graphrag init --root [path] --force between minor version bumps to ensure you have the latest config format. Note that this will overwrite your configuration and prompts, so back them up first if necessary. Run the provided migration notebook between major version bumps if you want to avoid re-indexing prior datasets.
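
Since re-initializing overwrites configuration and prompts, it can help to copy them aside first. The sketch below is one possible way to do that with the standard library. It assumes the project root holds a `settings.yaml` file and a `prompts/` directory (adjust the names to your layout), and `backup_project_config` is a hypothetical helper name, not a GraphRAG command.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Iterable


def backup_project_config(
    root: str,
    names: Iterable[str] = ("settings.yaml", "prompts"),
) -> Path:
    """Copy config files and directories into <root>/backup-<timestamp>/.

    The default names are an assumption about the project layout.
    Entries that do not exist are skipped silently.
    """
    root_path = Path(root)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_dir = root_path / f"backup-{stamp}"
    # Fail rather than mix two backups taken within the same second.
    backup_dir.mkdir(parents=True, exist_ok=False)
    for name in names:
        src = root_path / name
        if src.is_dir():
            shutil.copytree(src, backup_dir / name)
        elif src.is_file():
            shutil.copy2(src, backup_dir / name)
    return backup_dir
```

Calling `backup_project_config("./my-project")` before running the init command with `--force` leaves a timestamped copy that can be compared against the regenerated files.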

.NET: The .NET implementation shares the same settings.yaml configuration format. See dotnet/docs/getting-started.md for version-specific details.

Responsible AI FAQ

See RAI_TRANSPARENCY.md

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Privacy

Microsoft Privacy Statement