coder/balatrobench: Benchmark LLMs' strategic performance in Balatro 📊

Benchmark LLMs playing Balatro

BalatroBench is a benchmark analysis tool and leaderboard for BalatroLLM runs. It processes game data and generates interactive leaderboards that compare how different LLMs and strategies perform at Balatro.

Note

You can download all the run and benchmark data from Kaggle.

🚀 Related Projects

- BalatroBot: API for developing Balatro bots
- BalatroLLM: Play Balatro with LLMs
- BalatroBench: Benchmark LLMs playing Balatro

📚 Documentation

Important

This is the documentation for analyzing the run artifacts produced by BalatroLLM. This project parses that data and presents it as a website.

Requirements

- uv - Python package manager (installation steps below)
- Node.js - JavaScript runtime (includes npm), required only for the Playwright tests

Installation

Follow these steps to set up BalatroBench:

Install uv

Install the uv Python package manager:

```
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

See the uv installation docs for more options.

Clone the repository

```
git clone https://github.com/coder/balatrobench.git
cd balatrobench
```

Configure environment variables

Copy the example environment file and fill in your values:

Edit .envrc and set the following variables (required for uploading benchmarks to the CDN):
- BUNNY_BASE_URL - BunnyCDN base URL
- BUNNY_STORAGE_ZONE - Storage zone name
- BUNNY_API_KEY - API key for authentication
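
Before uploading, it can help to confirm that all three variables are actually visible to your shell (for example, after direnv has loaded .envrc). The snippet below is a minimal, hypothetical helper that is not part of BalatroBench; the function name missing_bunny_vars is an assumption made for illustration, and it only checks that each variable is present and non-empty, not that the values are valid.

```python
import os
from typing import List, Mapping, Optional

# The three variables listed above; names taken from this README.
REQUIRED_VARS = ("BUNNY_BASE_URL", "BUNNY_STORAGE_ZONE", "BUNNY_API_KEY")


def missing_bunny_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: return the required BunnyCDN variables that are unset or empty.

    Defaults to the current process environment (os.environ).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For example, calling missing_bunny_vars() after direnv allow should return an empty list if .envrc is filled in; any names it returns still need values.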

Install dependencies

This runs uv sync for the Python packages and npm install for the Playwright tests.

Activate the environment

Alternatively, use direnv to automatically load the environment:
```
# Install direnv, then allow the directory
direnv allow
```

Install browser binaries (first time only)
```
npx playwright install chromium
```

Generating Benchmarks

Generate benchmark data from BalatroLLM runs:

```
# Analyze runs from a specific directory
balatrobench --input-dir /path/to/runs/v1.0.0

# Custom output directory
balatrobench --input-dir /path/to/runs/v1.0.0 --output-dir /path/to/output

# Enable WebP conversion for screenshots
balatrobench --input-dir /path/to/runs/v1.0.0 --webp
```
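
If you script these invocations (for instance, to process several run directories in a loop), a small helper can assemble the argument list from the three flags shown above. This is a hypothetical sketch, not part of BalatroBench: the function name build_balatrobench_command is made up, and it assumes that --output-dir and --webp can be combined in a single call.

```python
from pathlib import Path
from typing import List, Optional


def build_balatrobench_command(
    input_dir: Path,
    output_dir: Optional[Path] = None,
    webp: bool = False,
) -> List[str]:
    """Hypothetical helper: build argv for the balatrobench CLI.

    Uses only the flags documented in this README (--input-dir,
    --output-dir, --webp); combining them is an assumption.
    """
    cmd = ["balatrobench", "--input-dir", str(input_dir)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if webp:
        cmd.append("--webp")
    return cmd
```

You could then pass the result to subprocess.run(cmd, check=True) from an environment where balatrobench is on the PATH (for example, the activated uv environment).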

Starting the Website

Serve the site locally by running make serve.

This starts a local server at http://localhost:8000 and opens it in your browser.

The site detects its environment automatically: it uses development mode when served from localhost and production mode otherwise. To override this, append the query parameter ?env=development or ?env=production to the URL.
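
The detection rule above can be restated as a small function. This is a hypothetical Python sketch of the rule as described here, not the site's own code. It assumes that only the hostname localhost (not 127.0.0.1) counts as development and that an unrecognized ?env= value is ignored in favor of hostname detection.

```python
from urllib.parse import parse_qs, urlparse

VALID_ENVS = {"development", "production"}


def detect_env(url: str) -> str:
    """Hypothetical restatement of the README's environment rule.

    1. A valid ?env= query parameter wins.
    2. Otherwise, hostname "localhost" means development.
    3. Anything else means production.
    """
    parsed = urlparse(url)
    override = parse_qs(parsed.query).get("env", [])
    if override and override[0] in VALID_ENVS:
        return override[0]
    # Assumption: invalid overrides fall through to hostname detection.
    return "development" if parsed.hostname == "localhost" else "production"
```

For example, detect_env("http://localhost:8000/?env=production") returns "production", because the query parameter takes precedence over the hostname.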

Running Tests

The end-to-end tests use Playwright. Run them, along with the balatrobench tests, with make test.

Note

Although playwright.config.js includes a webServer configuration, the server may not start automatically in every case. If the tests fail to connect, start the server manually first:

```
make serve  # In a separate terminal
make test   # Run tests
```
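
When starting the server by hand, make test can still race the server's startup. One option is to wait until http://localhost:8000 answers before launching the tests. The helper below is a hypothetical, standard-library-only sketch and is not part of the repository's tooling. It treats any HTTP response, including an error status, as "server is up".

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:8000/", timeout: float = 30.0,
                    interval: float = 0.5) -> bool:
    """Hypothetical helper: poll url until a server answers or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # The server responded, just with an error status.
        except OSError:
            pass  # Connection refused or timed out (URLError subclasses OSError).
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A typical use is a short script that calls wait_for_server() and exits non-zero if it returns False, run between make serve and make test.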