Supercharge Your LLM Application Evaluations
Objective metrics, intelligent test generation, and data-driven insights for LLM apps
Ragas is your ultimate toolkit for evaluating and optimizing Large Language Model (LLM) applications. Say goodbye to time-consuming, subjective assessments and hello to data-driven, efficient evaluation workflows. Don't have a test dataset ready? Ragas can also generate production-aligned test sets for you.
Key Features
- Objective Metrics: Evaluate your LLM applications with precision using both LLM-based and traditional metrics.
- Test Data Generation: Automatically create comprehensive test datasets covering a wide range of scenarios.
- Seamless Integrations: Works flawlessly with popular LLM frameworks like LangChain and major observability tools.
- Build feedback loops: Leverage production data to continually improve your LLM applications.
Installation
PyPI:
    pip install ragas
Alternatively, from source:
    pip install git+https://github.com/vibrantlabsai/ragas
Quickstart
Clone a Complete Example Project
The fastest way to get started is to use the ragas quickstart command:
    # List available templates
    ragas quickstart

    # Create a RAG evaluation project
    ragas quickstart rag_eval

    # Specify where you want to create it.
    ragas quickstart rag_eval -o ./my-project
Available templates:
- `rag_eval` - Evaluate RAG systems
Coming Soon:
- `agent_evals` - Evaluate AI agents
- `benchmark_llm` - Benchmark and compare LLMs
- `prompt_evals` - Evaluate prompt variations
- `workflow_eval` - Evaluate complex workflows
Evaluate your LLM App
ragas comes with pre-built metrics for common evaluation tasks. For example, Aspect Critique evaluates any aspect of your output using DiscreteMetric:
    import asyncio
    from openai import AsyncOpenAI
    from ragas.metrics import DiscreteMetric
    from ragas.llms import llm_factory

    # Setup your LLM
    client = AsyncOpenAI()
    llm = llm_factory("gpt-4o", client=client)

    # Create a custom aspect evaluator
    metric = DiscreteMetric(
        name="summary_accuracy",
        allowed_values=["accurate", "inaccurate"],
        prompt="""Evaluate if the summary is accurate and captures key information.

    Response: {response}

    Answer with only 'accurate' or 'inaccurate'."""
    )

    # Score your application's output
    async def main():
        score = await metric.ascore(
            llm=llm,
            response="The summary of the text is..."
        )
        print(f"Score: {score.value}")  # 'accurate' or 'inaccurate'
        print(f"Reason: {score.reason}")

    if __name__ == "__main__":
        asyncio.run(main())
Note: Make sure your `OPENAI_API_KEY` environment variable is set.
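Once you score more than one output, you will usually want a summary across the whole set. The sketch below is plain Python and does not use any Ragas API: it assumes you have collected the `score.value` labels from several `ascore` calls into a list. The `labels` data and the `summarize` helper are hypothetical names used for illustration only.

```python
from collections import Counter

# Hypothetical labels, e.g. collected from score.value across several calls.
labels = ["accurate", "inaccurate", "accurate", "accurate"]


def summarize(labels, allowed_values=("accurate", "inaccurate")):
    """Return the share of each allowed label among all labels."""
    # Reject anything outside the metric's allowed values.
    unexpected = set(labels) - set(allowed_values)
    if unexpected:
        raise ValueError(f"Unexpected labels: {sorted(unexpected)}")
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {value: counts[value] / total for value in allowed_values}


summary = summarize(labels)
print(summary)  # {'accurate': 0.75, 'inaccurate': 0.25}
```

Rejecting unexpected labels early keeps a stray value from silently skewing the rates.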
Find the complete Quickstart Guide
Want help in improving your AI application using evals?
In the past 2 years, we have seen and helped improve many AI applications using evals. If you want help improving and scaling up your AI application with evals, book a slot or drop us a line: founders@vibrantlabs.com.
Community
If you want to get more involved with Ragas, check out our Discord server. It's a fun community where we geek out about LLMs, retrieval, production issues, and more.
Contributors
    +----------------------------------------------------------------------------+
    |     +----------------------------------------------------------------+     |
    |     | Developers: Those who built with `ragas`.                      |     |
    |     | (You have `import ragas` somewhere in your project)            |     |
    |     |     +----------------------------------------------------+     |     |
    |     |     | Contributors: Those who make `ragas` better.       |     |     |
    |     |     | (You make PR to this repo)                         |     |     |
    |     |     +----------------------------------------------------+     |     |
    |     +----------------------------------------------------------------+     |
    +----------------------------------------------------------------------------+
We welcome contributions from the community! Whether it's bug fixes, feature additions, or documentation improvements, your input is valuable.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Open Analytics
At Ragas, we believe in transparency. We collect minimal, anonymized usage data to improve our product and guide our development efforts.
- No personal or company-identifying information
- Open-source data collection code
- Publicly available aggregated data
To opt out, set the `RAGAS_DO_NOT_TRACK` environment variable to `true`.
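If you prefer to opt out from inside a script rather than from your shell, a minimal sketch looks like the following. It assumes the variable must be in place before Ragas reads it, so it is set before the `import ragas` line; exactly when Ragas reads the variable is not stated here, and setting it first is simply the cautious choice.

```python
import os

# Set the opt-out flag before importing ragas.
# Assumption: setting it first guarantees it is visible whenever the
# library checks it.
os.environ["RAGAS_DO_NOT_TRACK"] = "true"

# import ragas  # import only after the variable is set

print(os.environ["RAGAS_DO_NOT_TRACK"])  # true
```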
Cite Us
@misc{ragas2024,
author = {VibrantLabs},
title = {Ragas: Supercharge Your LLM Application Evaluations},
year = {2024},
howpublished = {\url{https://github.com/vibrantlabsai/ragas}},
}
