Web search and content extraction for AI models via Model Context Protocol (MCP)
Quick Start
Docker (Recommended)
```bash
# Run with Docker (no setup required)
docker run -p 8000:8000 tmfrisinger/webcat:latest

# With Serper API key for premium search
docker run -p 8000:8000 -e SERPER_API_KEY=your_key tmfrisinger/webcat:latest

# With authentication enabled
docker run -p 8000:8000 -e WEBCAT_API_KEY=your_token tmfrisinger/webcat:latest
```
Supports: linux/amd64, linux/arm64 (Intel/AMD, Apple Silicon, AWS Graviton)
Local Development
```bash
cd docker
python -m pip install -e ".[dev]"

# Start MCP server with auto-reload
make dev

# Or run directly
python mcp_server.py
```
What is WebCat?
WebCat is an MCP (Model Context Protocol) server that provides AI models with:
- 🔍 Web Search - Serper API (premium) or DuckDuckGo (free fallback)
- 📄 Content Extraction - Serper scrape API (premium) or Trafilatura (free fallback)
- 🌐 Modern HTTP Transport - Streamable HTTP with JSON-RPC 2.0
- 🐳 Multi-Platform Docker - Works on Intel, ARM, and Apple Silicon
- 🎯 Composite Tool - Single SERPER_API_KEY enables both search + scraping
Built with FastMCP, Serper.dev, and Trafilatura for seamless AI integration.
Features
- ✅ Optional Authentication - Bearer token auth when needed, or run without (v2.3.1)
- ✅ Composite Search Tool - Single Serper API key enables both search + scraping
- ✅ Automatic Fallback - Search: Serper → DuckDuckGo | Scraping: Serper → Trafilatura
- ✅ Premium Scraping - Serper's optimized infrastructure for fast, clean content extraction
- ✅ Smart Content Extraction - Returns markdown with preserved document structure
- ✅ MCP Compliant - Works with Claude Desktop, LiteLLM, and other MCP clients
- ✅ Parallel Processing - Fast concurrent scraping
- ✅ Multi-Platform Docker - Linux (amd64/arm64) support
Installation & Usage
Docker Deployment
```bash
# Quick start - no configuration needed
docker run -p 8000:8000 tmfrisinger/webcat:latest

# With environment variables
docker run -p 8000:8000 \
  -e SERPER_API_KEY=your_key \
  -e WEBCAT_API_KEY=your_token \
  tmfrisinger/webcat:latest

# Using docker-compose
cd docker
docker-compose up
```
Local Development
```bash
cd docker
python -m pip install -e ".[dev]"

# Configure environment (optional)
echo "SERPER_API_KEY=your_key" > .env

# Development mode with auto-reload
make dev   # Start MCP server with auto-reload

# Production mode
make mcp   # Start MCP server
```
Available Endpoints
| Endpoint | Description |
|---|---|
| `http://localhost:8000/health` | 💗 Health check |
| `http://localhost:8000/status` | 📊 Server status |
| `http://localhost:8000/mcp` | 🛠️ MCP protocol endpoint (Streamable HTTP with JSON-RPC 2.0) |
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `SERPER_API_KEY` | (none) | Serper API key for premium search (optional; falls back to DuckDuckGo if not set) |
| `PERPLEXITY_API_KEY` | (none) | Perplexity API key for the deep research tool (optional; get one at https://www.perplexity.ai/settings/api) |
| `WEBCAT_API_KEY` | (none) | Bearer token for authentication (optional; if set, all requests must include `Authorization: Bearer <token>`) |
| `PORT` | `8000` | Server port |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `LOG_DIR` | `/tmp` | Log file directory |
| `MAX_CONTENT_LENGTH` | `1000000` | Maximum characters to return per scraped article |
Get API Keys
Serper API (for web search + scraping):
- Visit serper.dev
- Sign up for the free tier (2,500 searches/month + scraping)
- Copy your API key
- Add it to your `.env` file: `SERPER_API_KEY=your_key`

Note: One API key enables both search AND content scraping!
Perplexity API (for deep research):
- Visit perplexity.ai/settings/api
- Sign up and get your API key
- Copy your API key
- Add it to your `.env` file: `PERPLEXITY_API_KEY=your_key`
Enable Authentication (Optional)
To require bearer token authentication for all MCP tool calls:
- Generate a secure random token: `openssl rand -hex 32`
- Add it to your `.env` file: `WEBCAT_API_KEY=your_token`
- Include it in all requests: `Authorization: Bearer your_token`
Note: If WEBCAT_API_KEY is not set, no authentication is required.
MCP Tools
WebCat exposes these tools via MCP:
| Tool | Description | Parameters |
|---|---|---|
| `search` | Search the web and extract content | `query: str`, `max_results: int` |
| `scrape_url` | Scrape a specific URL | `url: str` |
| `health_check` | Check server health | (none) |
| `get_server_info` | Get server capabilities | (none) |
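Tools are invoked over the `/mcp` endpoint inside a JSON-RPC 2.0 envelope. A sketch of what a `search` call might look like on the wire; MCP client libraries build this envelope for you, so treat the exact shape as illustrative:

```python
import json

def build_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

payload = build_tool_call("search", {"query": "model context protocol", "max_results": 5})
```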
Architecture
MCP Client (Claude, LiteLLM)
↓
FastMCP Server (Streamable HTTP with JSON-RPC 2.0)
↓
Authentication (optional bearer token)
↓
Search Decision
├─ Serper API (premium) → Serper Scrape API (premium)
└─ DuckDuckGo (free) → Trafilatura (free)
↓
Markdown Response
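The fallback behavior in the diagram amounts to "try the premium backend, fall back to the free one on failure." A simplified sketch of that control flow; the function names are illustrative, not WebCat's actual internals:

```python
from typing import Callable

def search_with_fallback(
    query: str,
    serper_search: Callable[[str], list],
    duckduckgo_search: Callable[[str], list],
    serper_enabled: bool,
) -> list:
    """Use Serper when an API key is configured; otherwise, or on error, use DuckDuckGo."""
    if serper_enabled:
        try:
            return serper_search(query)
        except Exception:
            pass  # premium backend failed; fall through to the free path
    return duckduckgo_search(query)
```

The same pattern applies to scraping (Serper scrape API with Trafilatura as the fallback).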
Tech Stack:
- FastMCP - MCP protocol implementation with modern HTTP transport
- JSON-RPC 2.0 - Standard protocol for client-server communication
- Serper API - Google-powered search + optimized web scraping
- Trafilatura - Fallback content extraction (removes navigation/ads)
- DuckDuckGo - Free search fallback
Testing
```bash
cd docker

# Run all unit tests
make test
# OR
python -m pytest tests/unit -v

# With coverage report
make test-coverage
# OR
python -m pytest tests/unit --cov=. --cov-report=term --cov-report=html

# CI-safe tests (no external dependencies)
python -m pytest -v -m "not integration"

# Run specific test file
python -m pytest tests/unit/services/test_content_scraper.py -v
```
Current test coverage: 70%+ across all modules (enforced in CI)
Development
```bash
# First-time setup
make setup-dev       # Install all dependencies + pre-commit hooks

# Development workflow
make dev             # Start server with auto-reload
make format          # Auto-format code (Black + isort)
make lint            # Check code quality (flake8)
make test            # Run unit tests

# Before committing
make ci-fast         # Quick validation (~30 seconds)
# OR
make ci              # Full validation with security checks (~2-3 minutes)

# Code quality tools
make format-check    # Check formatting without changes
make security        # Run bandit security scanner
make audit           # Check dependency vulnerabilities
```
Pre-commit Hooks:
Hooks run automatically on `git commit` to ensure code quality. Install them with `make setup-dev`.
Project Structure
```
docker/
├── mcp_server.py                # Main MCP server (FastMCP)
├── cli.py                       # CLI interface for server modes
├── health.py                    # Health check endpoint
├── api_tools.py                 # API tooling utilities
├── clients/                     # External API clients
│   ├── serper_client.py         # Serper API (search + scrape)
│   └── duckduckgo_client.py     # DuckDuckGo fallback
├── services/                    # Core business logic
│   ├── search_service.py        # Search orchestration
│   └── content_scraper.py       # Serper scrape → Trafilatura fallback
├── tools/                       # MCP tool implementations
│   └── search_tool.py           # Search tool with auth
├── models/                      # Pydantic data models
│   ├── domain/                  # Domain entities (SearchResult, etc.)
│   └── responses/               # API response models
├── utils/                       # Shared utilities
│   └── auth.py                  # Bearer token authentication
├── endpoints/                   # FastAPI endpoints
├── tests/                       # Comprehensive test suite
│   ├── unit/                    # Unit tests (mocked dependencies)
│   └── integration/             # Integration tests (external deps)
└── pyproject.toml               # Project config + dependencies
```
Search Quality Comparison
| Feature | Serper API | DuckDuckGo |
|---|---|---|
| Cost | Paid (free tier available) | Free |
| Quality | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐⭐ Good |
| Coverage | Comprehensive (Google-powered) | Standard |
| Speed | Fast | Fast |
| Rate Limits | 2,500/month (free tier) | None |
Docker Multi-Platform Support
WebCat supports multiple architectures for broad deployment compatibility:
```bash
# Build locally for multiple platforms
cd docker
./build.sh   # Builds for linux/amd64 and linux/arm64

# Manual multi-platform build and push
docker buildx build --platform linux/amd64,linux/arm64 \
  -t tmfrisinger/webcat:2.3.2 \
  -t tmfrisinger/webcat:latest \
  -f Dockerfile --push .

# Verify multi-platform support
docker buildx imagetools inspect tmfrisinger/webcat:latest
```
Automated Releases: Push a version tag to trigger automated multi-platform builds via GitHub Actions:
```bash
git tag v2.3.2
git push origin v2.3.2
```
Limitations
- Text-focused: Optimized for article content, not multimedia
- No JavaScript: Cannot scrape dynamic JS-rendered content (uses static HTML)
- PDF support: Detection only, not full extraction
- Python 3.11 required: Not compatible with 3.10 or 3.12
- External API limits: Subject to Serper API rate limits (2,500/month free tier)
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure `make ci` passes
- Submit a Pull Request
See CLAUDE.md for development guidelines and architecture standards.
License
MIT License - see LICENSE file for details.
Links
- GitHub: github.com/Kode-Rex/webcat
- MCP Spec: modelcontextprotocol.io
- Serper API: serper.dev
Version 2.3.2 | Built with FastMCP, FastAPI, Serper, and Trafilatura