Open Browser
AI-powered autonomous web browsing framework for TypeScript.
Give an AI agent a browser. It clicks, types, navigates, and extracts data — autonomously completing tasks on any website. Built on Playwright with first-class support for OpenAI, Anthropic, and Google models.
Production-ready since v1.0. Contributions welcome.
Why Open Browser?
- Autonomous agents: Describe a task in natural language, and an AI agent navigates the web to complete it — clicking, typing, scrolling, and extracting data without manual scripting
- Multi-model support: Works with OpenAI, Anthropic, and Google out of the box via the Vercel AI SDK — swap models with a single flag
- Interactive REPL: Drop into a live browser session and issue commands interactively — great for debugging, prototyping, and exploration
- Sandboxed execution: Run agents in resource-limited environments with CPU/memory monitoring, timeouts, and domain restrictions
- Production-ready: Stall detection, cost tracking, session management, replay recording, and comprehensive error handling
- Open source: MIT licensed, fully extensible, bring your own API keys
Quick Start
# Install dependencies
bun install
# Set up your API keys
cp .env.example .env
# Edit .env with your API keys
# Run an agent
bun run open-browser run "Find the top story on Hacker News and summarize it"
# Or open a browser interactively
bun run open-browser interactive
Architecture
Open Browser is a monorepo with three packages:
| Package | Description |
|---|---|
| open-browser | Core library: agent logic, browser control, DOM analysis, LLM integration |
| @open-browser/cli | Command-line interface for running agents and browser commands |
| @open-browser/sandbox | Sandboxed execution with resource limits and monitoring |
CLI Commands
Run an AI Agent
open-browser run <task> [options]
Describe what you want done. The agent figures out the rest.
# Search and extract information
open-browser run "Find the price of the MacBook Pro on apple.com"
# Fill out forms
open-browser run "Sign up for the newsletter on example.com with test@email.com"
# Multi-step workflows
open-browser run "Go to GitHub, find the open-browser repo, and star it"
| Option | Description |
|---|---|
| -m, --model <model> | Model to use (default: gpt-4o) |
| -p, --provider <provider> | Provider: openai, anthropic, google |
| --headless / --no-headless | Hide (--headless) or show (--no-headless) the browser window |
| --max-steps <n> | Max agent steps (default: 25) |
| -v, --verbose | Show detailed step info |
| --no-cost | Hide cost tracking |
Browser Commands
open-browser open <url>               # Open a URL
open-browser click <selector>         # Click an element
open-browser type <selector> <text>   # Type into an input
open-browser screenshot [output]      # Capture a screenshot
open-browser eval <expression>        # Run JavaScript on the page
open-browser extract <goal>           # Extract content as markdown
open-browser state                    # Show current URL, title, and tabs
open-browser sessions                 # List active browser sessions
Interactive REPL
Drop into a live browser> prompt with full control:
browser> open https://news.ycombinator.com
browser> extract "top 5 stories with titles and points"
browser> click .morelink
browser> screenshot front-page.png
browser> help
Using as a Library
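import { Agent, createViewport, createModel } from 'open-browser'
// Launch a headless browser viewport
const viewport = await createViewport({ headless: true })
// Select the LLM provider and model
const model = createModel('openai', 'gpt-4o')
const agent = new Agent({
  viewport,
  model,
  task: 'Go to example.com and extract the main heading',
  settings: {
    stepLimit: 50, // stop after at most 50 agent iterations
    enableScreenshots: true, // include page screenshots in the agent context
  },
})
// Run the agent and print whatever it returns
const result = await agent.run()
console.log(result)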
Sandboxed Execution
Run agents with resource limits and monitoring:
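import { createModel } from 'open-browser'
import { Sandbox } from '@open-browser/sandbox'
// Assumption: the model is created the same way as in the library example above
const languageModel = createModel('openai', 'gpt-4o')
const sandbox = new Sandbox({
  timeout: 300_000, // 5-minute timeout, in milliseconds
  maxMemoryMB: 512, // memory limit
  allowedDomains: ['example.com'], // domain restriction
  stepLimit: 100,
  captureOutput: true,
})
const result = await sandbox.run({
  task: 'Complete the checkout form',
  model: languageModel,
})
console.log(result.metrics) // steps, URLs visited, CPU time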
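To make the idea of a domain restriction concrete, here is a minimal sketch of how an allow-list check could work. The helper `isDomainAllowed` is hypothetical and not part of the documented API, and the matching rules (exact host or subdomain, empty list denies everything) are assumptions rather than the sandbox's actual behavior.

```typescript
// Hypothetical helper: NOT part of @open-browser/sandbox.
// Assumption: a URL is allowed when its hostname equals an allowed
// domain or is a subdomain of one. An empty list allows nothing.
function isDomainAllowed(url: string, allowedDomains: string[]): boolean {
  let hostname: string
  try {
    hostname = new URL(url).hostname.toLowerCase()
  } catch {
    return false // not a parseable absolute URL
  }
  return allowedDomains.some((domain) => {
    const d = domain.toLowerCase()
    // The leading dot prevents "notexample.com" from matching "example.com"
    return hostname === d || hostname.endsWith(`.${d}`)
  })
}

const allowed = ['example.com']
const checkoutOk = isDomainAllowed('https://example.com/checkout', allowed)
const subdomainOk = isDomainAllowed('https://shop.example.com/', allowed)
const lookalikeOk = isDomainAllowed('https://example.com.evil.test/', allowed)
```

Comparing parsed hostnames rather than raw strings avoids look-alike URLs that merely contain an allowed domain somewhere in the address.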
Configuration
Environment Variables
# LLM Provider Keys (at least one required)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENERATIVE_AI_API_KEY=...
# Browser
BROWSER_HEADLESS=true
BROWSER_DISABLE_SECURITY=false
# Recording & Debugging
OPEN_BROWSER_TRACE_PATH=./traces
OPEN_BROWSER_SAVE_RECORDING_PATH=./recordings
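Since at least one provider key is required, a script that wraps Open Browser may want to fail early when none is set. The sketch below shows one way to do that. `configuredProviders` and `requireProvider` are hypothetical helpers, not part of the library; only the environment variable names come from the list above.

```typescript
// Hypothetical helpers: NOT part of Open Browser's documented API.
type Provider = 'openai' | 'anthropic' | 'google'

// Environment variable names as listed in this README.
const PROVIDER_KEYS: Record<Provider, string> = {
  openai: 'OPENAI_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  google: 'GOOGLE_GENERATIVE_AI_API_KEY',
}

type Env = Record<string, string | undefined>

// Returns the providers whose API key is present and non-empty.
function configuredProviders(env: Env): Provider[] {
  const providers = Object.keys(PROVIDER_KEYS) as Provider[]
  return providers.filter((provider) => {
    const value = env[PROVIDER_KEYS[provider]]
    return value !== undefined && value.trim() !== ''
  })
}

// Throws a descriptive error when no provider key is configured.
function requireProvider(env: Env): Provider[] {
  const found = configuredProviders(env)
  if (found.length === 0) {
    const names = Object.values(PROVIDER_KEYS).join(', ')
    throw new Error(`Set at least one of: ${names}`)
  }
  return found
}
```

In a Bun or Node script this could be called as `requireProvider(process.env)` before starting an agent.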
Agent Configuration
| Setting | Default | Description |
|---|---|---|
| stepLimit | 100 | Maximum agent iterations |
| commandsPerStep | 10 | Actions per agent step |
| failureThreshold | 5 | Consecutive failures before stopping |
| enableScreenshots | true | Include page screenshots in agent context |
| contextWindowSize | 128000 | Token budget for conversation |
| allowedUrls | [] | Restrict navigation to specific URLs |
| blockedUrls | [] | Block navigation to specific URLs |
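The defaults above can be pictured as a base object that caller-supplied settings override. The following is a sketch under that assumption: the `AgentSettings` type and `resolveSettings` helper are hypothetical and may not match the library's internals, but the field names and default values are taken from the table.

```typescript
// Hypothetical type and helper: NOT Open Browser's actual implementation.
// Field names and defaults are copied from the settings table.
interface AgentSettings {
  stepLimit: number
  commandsPerStep: number
  failureThreshold: number
  enableScreenshots: boolean
  contextWindowSize: number
  allowedUrls: string[]
  blockedUrls: string[]
}

const DEFAULT_SETTINGS: AgentSettings = {
  stepLimit: 100,
  commandsPerStep: 10,
  failureThreshold: 5,
  enableScreenshots: true,
  contextWindowSize: 128000,
  allowedUrls: [],
  blockedUrls: [],
}

// Merges overrides onto the defaults and rejects non-positive limits.
function resolveSettings(overrides: Partial<AgentSettings> = {}): AgentSettings {
  const merged: AgentSettings = { ...DEFAULT_SETTINGS, ...overrides }
  const numericKeys = ['stepLimit', 'commandsPerStep', 'failureThreshold', 'contextWindowSize'] as const
  for (const key of numericKeys) {
    if (!Number.isInteger(merged[key]) || merged[key] <= 0) {
      throw new RangeError(`${key} must be a positive integer`)
    }
  }
  // Copy the arrays so callers cannot mutate the shared defaults
  return {
    ...merged,
    allowedUrls: [...merged.allowedUrls],
    blockedUrls: [...merged.blockedUrls],
  }
}

// Only the overridden field changes; everything else keeps its default
const settings = resolveSettings({ stepLimit: 50 })
```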
Viewport Configuration
| Setting | Default | Description |
|---|---|---|
| headless | true | Run browser without visible window |
| width / height | 1280 / 1100 | Browser window dimensions |
| relaxedSecurity | false | Disable browser security features |
| proxy | (none) | Proxy server configuration |
| cookieFile | (none) | Path to cookie file for persistent sessions |
How It Works
                 ┌─────────────┐
"Book a flight"  │             │
───────────────► │    Agent    │ ◄── LLM (OpenAI / Anthropic / Google)
                 │             │
                 └──────┬──────┘
                        │
                 ┌──────▼──────┐
                 │  Commands   │ click, type, scroll, extract, navigate...
                 └──────┬──────┘
                        │
                 ┌──────▼──────┐
                 │  Viewport   │ Playwright browser instance
                 └──────┬──────┘
                        │
                 ┌──────▼──────┐
                 │ DOM / Page  │ Snapshot, interactive elements, content
                 └─────────────┘
- You describe a task in natural language
- The Agent sends the current page state + task to an LLM
- The LLM decides what commands to execute (click, type, navigate, extract...)
- Commands execute against the Viewport (Playwright browser)
- The agent observes the result, detects stalls, and loops until the task is complete
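The steps above can be sketched as a small observe, decide, execute loop. Everything in this example is hypothetical: the types, the `runLoop` function, and the stall heuristic (the same page state and command repeated several times) are assumptions for illustration, not Open Browser's implementation. The loop is kept synchronous so the sketch is easy to run; a real decide step would await an LLM call.

```typescript
// Hypothetical sketch: NOT Open Browser's actual agent loop.
interface PageState { url: string; summary: string }

type Command =
  | { kind: 'click'; selector: string }
  | { kind: 'navigate'; url: string }
  | { kind: 'done'; result: string }

interface LoopDeps {
  observe: () => PageState // snapshot of the current page
  decide: (task: string, state: PageState) => Command // stands in for the LLM
  execute: (command: Command) => void // runs the command in the browser
}

type LoopResult =
  | { status: 'done'; result: string; steps: number }
  | { status: 'stalled' | 'step-limit'; steps: number }

function runLoop(task: string, deps: LoopDeps, stepLimit = 25, stallAfter = 3): LoopResult {
  let lastFingerprint = ''
  let repeats = 0
  for (let step = 1; step <= stepLimit; step++) {
    const state = deps.observe()
    const command = deps.decide(task, state)
    if (command.kind === 'done') {
      return { status: 'done', result: command.result, steps: step }
    }
    // Assumed stall heuristic: identical state and command seen repeatedly
    const fingerprint = JSON.stringify([state, command])
    repeats = fingerprint === lastFingerprint ? repeats + 1 : 1
    lastFingerprint = fingerprint
    if (repeats >= stallAfter) {
      return { status: 'stalled', steps: step }
    }
    deps.execute(command)
  }
  return { status: 'step-limit', steps: stepLimit }
}

// Usage with a fake page and a scripted stand-in for the LLM
let currentUrl = 'about:blank'
const outcome = runLoop('Open example.com', {
  observe: () => ({ url: currentUrl, summary: '' }),
  decide: (_task, state) =>
    state.url === 'https://example.com/'
      ? { kind: 'done', result: 'Arrived' }
      : { kind: 'navigate', url: 'https://example.com/' },
  execute: (command) => {
    if (command.kind === 'navigate') currentUrl = command.url
  },
})
```

Passing the observe, decide, and execute steps in as functions keeps the loop testable without a browser or an API key.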
Model Support
| Provider | Example Models | Flag |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, o1 | -p openai |
| Anthropic | claude-sonnet-4-5-20250929, claude-opus-4-6 | -p anthropic |
| Google | gemini-2.0-flash, gemini-2.5-pro | -p google |
Project Structure
packages/
├── core/                 # Core library (open-browser)
│   └── src/
│       ├── agent/        # Agent logic, conversation, stall detection
│       ├── commands/     # Action schemas and executor (25+ commands)
│       ├── viewport/     # Browser control, events, guards
│       ├── page/         # DOM analysis, content extraction
│       ├── model/        # LLM adapter and message formatting
│       ├── metering/     # Cost tracking
│       ├── bridge/       # IPC server/client
│       └── config/       # Configuration types
├── cli/                  # CLI (@open-browser/cli)
│   └── src/
│       ├── commands/     # CLI command implementations
│       └── index.ts      # Entry point
└── sandbox/              # Sandbox (@open-browser/sandbox)
    └── src/
        └── sandbox.ts    # Resource-limited execution
Development
# Install dependencies
bun install
# Type check
bun run build
# Run tests
bun run test
# Lint
bun run lint
# Format
bun run format
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
