A command-line interface for interacting with the SWE-bench API. Use this tool to submit predictions, manage runs, and retrieve evaluation reports.
Read the full documentation here. For submission guidelines, see here.
Installation
Authentication
Before using the CLI, you'll need to get an API key:
- Generate an API key:
sb-cli gen-api-key your.email@example.com
- Set your API key as an environment variable - and store it somewhere safe!
export SWEBENCH_API_KEY=your_api_key # or add this line to your shell's rc file (e.g. ~/.bashrc or ~/.zshrc) to persist it
- You'll receive an email with a verification code. Verify your API key:
sb-cli verify-api-key YOUR_VERIFICATION_CODE
Subsets and Splits
SWE-bench has different subsets and splits available:
Subsets
- swe-bench-m: The SWE-bench Multimodal dataset
- swe-bench_verified: 500 verified problems from SWE-bench
- swe-bench_lite: A subset of the original SWE-bench for testing
Splits
- dev: Development/validation split
- test: Test split (currently only available for swe-bench_lite and swe-bench_verified)
You'll need to specify both a subset and split for most commands.
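When scripting around the CLI, it can help to validate the subset and split names before shelling out. A minimal sketch — the name sets mirror the lists above, but the lookup itself is illustrative and not part of sb-cli:

```python
# Illustrative lookup of the subset and split names listed above.
# For local validation only; this table is not part of sb-cli.
SUBSETS = {"swe-bench-m", "swe-bench_verified", "swe-bench_lite"}
SPLITS = {"dev", "test"}

def check_target(subset: str, split: str) -> None:
    """Raise ValueError if subset or split is not one of the names above."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")

check_target("swe-bench-m", "dev")  # passes silently
```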
Usage
Submit Predictions
Submit your model's predictions to SWE-bench:
sb-cli submit swe-bench-m test \
--predictions_path predictions.json \
    --run_id my_run_id

Options:
- --run_id: ID of the run to submit predictions for (optional, defaults to the name of the parent directory of the predictions file)
- --instance_ids: Comma-separated list of specific instance IDs to submit (optional)
- --output_dir: Directory to save report files (default: sb-cli-reports)
- --overwrite: Overwrite existing report (default: 0)
- --gen_report: Generate a report after evaluation is complete (default: 1)
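The flags above compose into a single argv. A sketch that assembles the invocation programmatically — the command and flag names come from the usage above, while build_submit_cmd is a hypothetical helper, not part of sb-cli:

```python
from typing import Optional, Sequence

def build_submit_cmd(
    subset: str,
    split: str,
    predictions_path: str,
    run_id: Optional[str] = None,                  # optional; sb-cli derives it from the parent dir otherwise
    instance_ids: Optional[Sequence[str]] = None,  # joined with commas, per --instance_ids
) -> list:
    """Assemble an sb-cli submit command as an argv list (hypothetical helper)."""
    cmd = ["sb-cli", "submit", subset, split, "--predictions_path", predictions_path]
    if run_id:
        cmd += ["--run_id", run_id]
    if instance_ids:
        cmd += ["--instance_ids", ",".join(instance_ids)]
    return cmd
```

The resulting list can be passed to subprocess.run without shell quoting concerns.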
Get Report
Retrieve evaluation results for a specific run:
sb-cli get-report swe-bench-m dev my_run_id -o ./reports
List Runs
View all your existing run IDs for a specific subset and split:
sb-cli list-runs swe-bench-m dev
Predictions File Format
Your predictions file should be a JSON file in one of these formats:
{
"instance_id_1": {
"model_patch": "...",
"model_name_or_path": "..."
},
"instance_id_2": {
"model_patch": "...",
"model_name_or_path": "..."
}
}

Or as a list:
[
{
"instance_id": "instance_id_1",
"model_patch": "...",
"model_name_or_path": "..."
},
{
"instance_id": "instance_id_2",
"model_patch": "...",
"model_name_or_path": "..."
}
]

Submitting to the Multimodal Leaderboard
To submit your system to the SWE-bench Multimodal leaderboard:
- Submit your predictions for the swe-bench-m/test split using the CLI
- Fork the experiments repository
- Add your submission files under experiments/multimodal/YOUR_MODEL_NAME/
- Create a PR with your submission
See the detailed guide in our submission documentation.
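Before opening a PR, it can help to sanity-check your predictions file. A sketch that accepts both formats shown earlier and normalizes to the dict form — to_dict_format is a hypothetical helper, not part of sb-cli:

```python
REQUIRED_KEYS = {"model_patch", "model_name_or_path"}

def to_dict_format(preds):
    """Normalize either predictions format (dict keyed by instance_id, or list of
    records with an 'instance_id' field) into the dict form, checking required keys."""
    if isinstance(preds, list):
        preds = {p["instance_id"]: {k: v for k, v in p.items() if k != "instance_id"}
                 for p in preds}
    for instance_id, pred in preds.items():
        missing = REQUIRED_KEYS - pred.keys()
        if missing:
            raise ValueError(f"{instance_id}: missing {sorted(missing)}")
    return preds
```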
Note: Check your test split quota using sb-cli quota swe-bench-m test before submitting.
