BalatroBench is a benchmark analysis tool and leaderboard for BalatroLLM runs. It processes game data and generates interactive leaderboards comparing LLM models and strategies playing Balatro.
Note
You can download all the run and benchmark data from Kaggle.
🚀 Related Projects
- BalatroBot: API for developing Balatro bots
- BalatroLLM: Play Balatro with LLMs
- BalatroBench: Benchmark LLMs playing Balatro
📚 Documentation
Important
This is the documentation for analyzing run artifacts produced by BalatroLLM. This project parses that data and displays it as a website.
Requirements
- uv - Python package manager (installation steps below)
- Node.js - JavaScript runtime (includes npm), required only for the Playwright tests
Installation
Follow these steps to set up BalatroBench:
- Install uv
Install the uv Python package manager:
    # macOS/Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    # Windows
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
See the uv installation docs for more options.
- Clone the repository
    git clone https://github.com/coder/balatrobench.git
    cd balatrobench
- Configure environment variables
Copy the example environment file and fill in your values:
Edit .envrc and set the following variables (required for uploading benchmarks to the CDN):
- BUNNY_BASE_URL - BunnyCDN base URL
- BUNNY_STORAGE_ZONE - Storage zone name
- BUNNY_API_KEY - API key for authentication
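Before uploading, it can help to confirm that all three variables are actually visible to the process. The following is a minimal sketch, not part of BalatroBench itself: the helper name missing_bunny_vars is hypothetical, and only the three variable names come from the list above.

```python
import os

# Variable names taken from the list above; everything else is illustrative.
REQUIRED_VARS = ("BUNNY_BASE_URL", "BUNNY_STORAGE_ZONE", "BUNNY_API_KEY")


def missing_bunny_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the current process environment; passing a plain
    dict makes the check easy to exercise without touching real settings.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_bunny_vars() with no arguments inspects the current environment and returns an empty list when everything is set.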
- Install dependencies
This step runs uv sync for Python packages and npm install for the Playwright tests.
- Activate the environment
Alternatively, use direnv to automatically load the environment:
    # Install direnv, then allow the directory
    direnv allow
- Install browser binaries (first time only)
    npx playwright install chromium
Generating Benchmarks
Generate benchmark data from BalatroLLM runs:
    # Analyze runs from a specific directory
    balatrobench --input-dir /path/to/runs/v1.0.0
    # Custom output directory
    balatrobench --input-dir /path/to/runs/v1.0.0 --output-dir /path/to/output
    # Enable WebP conversion for screenshots
    balatrobench --input-dir /path/to/runs/v1.0.0 --webp
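If several run directories need processing, the documented flags can be assembled programmatically. This is a rough sketch under stated assumptions: it uses only the --input-dir, --output-dir, and --webp flags shown above, and it assumes (hypothetically) that a parent directory holds one subdirectory per version, such as v1.0.0, and that each version should get its own output directory. The function names build_command and run_all are illustrative.

```python
import subprocess
from pathlib import Path


def build_command(input_dir, output_dir=None, webp=False):
    """Build a balatrobench argument list from the documented flags only."""
    cmd = ["balatrobench", "--input-dir", str(input_dir)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if webp:
        cmd.append("--webp")
    return cmd


def run_all(runs_root, output_root, webp=False):
    """Run balatrobench once per subdirectory of runs_root.

    Assumption: each subdirectory of runs_root is one version of runs,
    and its results go to output_root/<same directory name>.
    """
    for version_dir in sorted(Path(runs_root).iterdir()):
        if version_dir.is_dir():
            cmd = build_command(version_dir, Path(output_root) / version_dir.name, webp)
            # check=True stops the batch at the first failing run.
            subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it possible to print or inspect the commands before anything runs.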
Starting the Website
Serve the site locally with make serve.
This will start a local server at http://localhost:8000 and automatically open it in your browser.
The environment is automatically detected (localhost = development, otherwise = production).
To override, use the query parameter: ?env=development or ?env=production.
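The detection rule can be sketched as a small function. This is an illustration written in Python, not the site's actual code: the function name detect_env is hypothetical, and two details are assumptions, namely that only the literal hostname localhost counts as development and that an unrecognized ?env= value is ignored.

```python
from urllib.parse import parse_qs, urlparse

VALID_ENVS = ("development", "production")


def detect_env(url):
    """Return "development" or "production" for a page URL."""
    parsed = urlparse(url)
    # An explicit ?env=... query parameter takes priority over the hostname.
    override = parse_qs(parsed.query).get("env", [None])[0]
    if override in VALID_ENVS:
        return override
    # Assumption: only the hostname "localhost" is treated as development.
    return "development" if parsed.hostname == "localhost" else "production"
```

For example, detect_env("http://localhost:8000/?env=production") selects production even though the page is served locally.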
Running Tests
End-to-end tests use Playwright and the balatrobench tests. Run them with make test.
Note
Although playwright.config.js includes webServer configuration, the server may not auto-start reliably. If tests fail to connect, manually start the server first:
    make serve  # In a separate terminal
    make test   # Run tests
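When the server is started manually, a script can wait until the port accepts connections before launching the tests. The following is a minimal sketch using only the Python standard library; the helper name wait_for_port and its default timeout and interval are assumptions, while the host and port follow the http://localhost:8000 address mentioned earlier.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not ready yet (refused or timed out); retry after a short pause.
            time.sleep(interval)
    return False
```

One possible workflow is to run make serve in one terminal, call wait_for_port() from a script in another, and start make test only if it returns True. Note that this checks only that the port is open, not that the site has finished loading.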