GitHub - Fluxloop-AI/fluxloop: Open-source toolkit for running reproducible, offline-first simulations of AI agents against dynamic scenarios

FluxLoop OSS

Agentic Testing for AI Agents

"Hey, test my agent for order cancellation with angry customers"
→ FluxLoop handles the rest: setup, CLI install, synthesis, execution, and analysis.

🤖 Agent-First Workflow

Your coding agent (Claude Code) orchestrates the entire testing flow. Just describe what you want to test—FluxLoop does the heavy lifting.

🎯 Simulate at Scale

Run thousands of realistic multi-turn scenarios in parallel. Find edge cases before production.

📊 Align to Your Standards

Capture your implicit decision criteria. Turn intuition into automated evaluation.

Getting Started

⭐ Claude Code Plugin (Recommended)

Install the plugin, then just talk.

/plugin install Fluxloop-AI/fluxloop-claude-plugin

That's it. Now say:

"test my agent for refund scenarios"

The Agent Test Skill handles everything:

✅ Installs FluxLoop CLI (if needed)
✅ Logs you in
✅ Creates project/scenario
✅ Synthesizes test inputs
✅ Runs simulations
✅ Analyzes results and suggests fixes

No commands to memorize. No manual setup. Just ask.

Example Conversation

User: "Test my chatbot for refund scenarios with frustrated customers"

Agent: Let me set up FluxLoop and run tests...

       ✓ FluxLoop CLI installed
       ✓ Logged in
       ✓ Project created
       ✓ 10 test inputs synthesized (40% hard cases)
       ✓ Running simulation...
       
       📊 Results: 8/10 passed (80%)
       ⚠️ Failed on edge case: customer requesting partial refund
       
       💡 Suggested fix: Add handling for partial refund requests
       
       Would you like me to analyze the failures in detail?

📖 Documentation: docs.fluxloop.ai/claude-code

📦 Packages

1. Claude Code Plugin ⭐

The primary way to use FluxLoop. Your coding agent orchestrates the entire testing workflow through natural conversation.

Feature	Description
Agent Test Skill	Auto-activates on "test my agent", handles everything
Zero Config	Skill installs CLI, logs in, creates projects automatically
Context-Aware	Knows your setup state, guides you through missing steps

📖 Location: packages/fluxloop-plugin/
📖 Docs: docs.fluxloop.ai/claude-code

2. CLI

For power users and CI/CD pipelines. Direct command-line control when you need it.

pip install fluxloop-cli
fluxloop test --scenario my-test
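
In a CI job, the command above can be wrapped in a small Python script that returns the CLI's exit code to the pipeline. This is a minimal sketch, assuming the CLI exits non-zero when a run fails (not confirmed here). `build_command` and `run_scenario` are hypothetical helper names; only `fluxloop test --scenario` comes from the example above.

```python
import shutil
import subprocess
import sys
from typing import List


def build_command(scenario: str) -> List[str]:
    """Build the argument list for the `fluxloop test` command shown above."""
    return ["fluxloop", "test", "--scenario", scenario]


def run_scenario(scenario: str) -> int:
    """Run one scenario and return the exit code (127 if the CLI is missing)."""
    if shutil.which("fluxloop") is None:
        print("fluxloop CLI not found; try `pip install fluxloop-cli`", file=sys.stderr)
        return 127
    # Assumption: a non-zero exit code signals a failed run.
    return subprocess.run(build_command(scenario), check=False).returncode
```

A pipeline step could then end with `sys.exit(run_scenario("my-test"))` so that a failing run fails the build.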

📖 Docs: docs.fluxloop.ai/cli
📦 PyPI: fluxloop-cli

3. SDK (Python 3.11+)

Core instrumentation library. Add the @fluxloop.agent() decorator to trace agent execution.

import fluxloop

@fluxloop.agent()
def my_agent(input: str) -> str:
    # Your agent logic goes here; this placeholder just echoes the input
    response = f"Echo: {input}"
    return response
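
To make the idea of tracing concrete, here is a conceptual sketch of what a decorator of this kind could record on each call. It is not FluxLoop's implementation: `traced_agent`, `TRACES`, and `echo_agent` are hypothetical names, and the real SDK may capture different fields.

```python
import functools
import time
from typing import Any, Dict, List, Optional

# Hypothetical in-memory store; a real SDK would persist or export traces.
TRACES: List[Dict[str, Any]] = []


def traced_agent(name: Optional[str] = None):
    """Return a decorator that records each call's input, output, and duration."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            output = func(*args, **kwargs)
            TRACES.append({
                "agent": name or func.__name__,
                "args": args,
                "kwargs": kwargs,
                "output": output,
                "seconds": time.perf_counter() - start,
            })
            return output
        return wrapper
    return decorator


@traced_agent()
def echo_agent(message: str) -> str:
    # Stand-in agent logic for the sketch.
    return f"You said: {message}"
```

After calling `echo_agent("hi")`, `TRACES` holds one record with the agent name, the arguments, the output, and the elapsed time.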

📖 Docs: docs.fluxloop.ai/sdk
📦 PyPI: fluxloop

Key Features

🤖 Agentic Testing with Claude Code

Just talk naturally:

"Test my order-bot for cancellation scenarios"
"Generate edge cases for payment failures"
"Why did the last test fail?"

The skill understands context and adapts to your state.

🎯 Simple Instrumentation

Works with any Python agent framework:

import fluxloop

@fluxloop.agent()
def my_agent(input: str) -> str:
    # Call LangChain, LlamaIndex, or custom code here; this placeholder echoes the input
    response = f"Echo: {input}"
    return response

📊 Evaluation-First Testing

Define criteria, run reproducible experiments, get actionable insights.
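
One way to picture "define criteria" is as a set of named checks applied to each agent response. The sketch below is illustrative only: `Criterion`, `evaluate`, and the two example checks are hypothetical and do not reflect FluxLoop's actual evaluation format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Criterion:
    """A named pass/fail check over a single agent response."""
    name: str
    check: Callable[[str], bool]


def evaluate(response: str, criteria: List[Criterion]) -> Dict[str, bool]:
    """Apply every criterion to the response and report each result by name."""
    return {c.name: c.check(response) for c in criteria}


# Hypothetical criteria for a refund scenario.
CRITERIA = [
    Criterion("mentions_refund", lambda r: "refund" in r.lower()),
    Criterion("under_200_chars", lambda r: len(r) < 200),
]

results = evaluate("Your refund has been issued.", CRITERIA)
passed = all(results.values())
```

Keeping each criterion as a small named function makes a failure easy to attribute: the result dictionary shows which check a response missed.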

🧪 Offline-First Simulation

Run experiments locally with full control. No cloud dependency for testing.
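
As a rough illustration of a local, repeatable batch run, the sketch below sends a fixed list of inputs to a thread pool and reports a pass rate. Everything in it is an assumption made for illustration: `toy_agent`, `run_batch`, and the pass rule are hypothetical, and this is not how FluxLoop schedules simulations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def toy_agent(message: str) -> str:
    """Stand-in agent: handles full refunds but not partial ones."""
    if "partial" in message:
        return "Sorry, I can't help with that."
    return "Your refund has been issued."


def run_batch(inputs: List[str], workers: int = 4) -> List[bool]:
    """Run the agent on every input in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        responses = list(pool.map(toy_agent, inputs))
    # Hypothetical pass rule: the response must confirm a refund.
    return ["refund" in r.lower() for r in responses]


inputs = ["I want a refund"] * 8 + ["I want a partial refund"] * 2
outcomes = run_batch(inputs)
pass_rate = sum(outcomes) / len(outcomes)
print(f"{sum(outcomes)}/{len(outcomes)} passed ({pass_rate:.0%})")  # prints "8/10 passed (80%)"
```

Because the input list is fixed and `pool.map` preserves input order, repeated runs of this sketch produce the same outcomes in the same positions.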

☁️ Seamless Web Integration

FluxLoop combines local execution with cloud intelligence for a powerful testing workflow.

1. Cloud-Powered Synthesis

When you say "generate edge cases", FluxLoop Web synthesizes realistic, diverse test data using advanced LLMs. This data is instantly synced to your local environment for testing.

2. Deep Evaluation & Analysis

Test results are automatically uploaded to alpha.app.fluxloop.ai for deep inspection:

🕵️ Trace Analysis: Step-by-step debugging of agent conversations
📈 Performance Metrics: Success rates, latency, token usage trends
⚖️ Comparison: Side-by-side view of how recent changes affected behavior

3. The Perfect Loop

You: "Test my agent" (Claude Code)
Web: Generates test scenarios (Cloud)
CLI: Runs tests locally (Local)
Web: Analyzes results (Cloud)
You: Review summary in IDE & detailed report on Web

What You Can Do

Capability	How
🤖 Conversational Testing	"test my agent with angry customers"
🎯 Instrument Agents	@fluxloop.agent() decorator
📝 Synthesize Inputs	Skill generates realistic test data
🧪 Run Simulations	Batch experiments with parallel execution
💬 Multi-Turn Conversations	Auto-extend into dialogues
📊 Analyze Results	Get insights and fix suggestions

Links

Resource	URL
FluxLoop Web	alpha.app.fluxloop.ai
Documentation	docs.fluxloop.ai
Claude Code Plugin	docs.fluxloop.ai/claude-code
CLI Docs	docs.fluxloop.ai/cli
SDK Docs	docs.fluxloop.ai/sdk

🤝 Why Contribute?

We're building the future of AI agent testing—where your coding agent tests your AI agents.

Improve agentic workflows: Make the Claude Code skill smarter
Build framework adapters: LangChain, LlamaIndex, CrewAI
Enhance synthesis: Better intent-to-input generation
Develop evaluation methods: Novel agent performance metrics

Check out our contribution guide and open issues.

🚨 Community & Support

Issues: GitHub Issues
Docs: docs.fluxloop.ai

📄 License

FluxLoop is licensed under the Apache License 2.0.