Web search and content extraction for AI models via Model Context Protocol (MCP)
Quick Start
Docker (Recommended)
```bash
# Run with Docker (no setup required)
docker run -p 8000:8000 tmfrisinger/webcat:latest

# With Serper API key for premium search
docker run -p 8000:8000 -e SERPER_API_KEY=your_key tmfrisinger/webcat:latest

# With authentication enabled
docker run -p 8000:8000 -e WEBCAT_API_KEY=your_token tmfrisinger/webcat:latest
```
Supports: linux/amd64, linux/arm64 (Intel/AMD, Apple Silicon, AWS Graviton)
Local Development
```bash
cd docker
python -m pip install -e ".[dev]"

# Start MCP server with auto-reload
make dev

# Or run directly
python mcp_server.py
```
What is WebCat?
WebCat is an MCP (Model Context Protocol) server that provides AI models with:
- 🔍 Web Search - Serper API (premium) or DuckDuckGo (free fallback)
- 📄 Content Extraction - Serper scrape API (premium) or Trafilatura (free fallback)
- 🌐 Modern HTTP Transport - Streamable HTTP with JSON-RPC 2.0
- 🐳 Multi-Platform Docker - Works on Intel, ARM, and Apple Silicon
- 🎯 Composite Tool - Single SERPER_API_KEY enables both search + scraping
Built with FastMCP, Serper.dev, and Trafilatura for seamless AI integration.
Features
- ✅ Optional Authentication - Bearer token auth when needed, or run without (v2.3.1)
- ✅ Composite Search Tool - Single Serper API key enables both search + scraping
- ✅ Automatic Fallback - Search: Serper → DuckDuckGo | Scraping: Serper → Trafilatura
- ✅ Premium Scraping - Serper's optimized infrastructure for fast, clean content extraction
- ✅ Smart Content Extraction - Returns markdown with preserved document structure
- ✅ MCP Compliant - Works with Claude Desktop, LiteLLM, and other MCP clients
- ✅ Parallel Processing - Fast concurrent scraping
- ✅ Multi-Platform Docker - Linux (amd64/arm64) support
Installation & Usage
Docker Deployment
```bash
# Quick start - no configuration needed
docker run -p 8000:8000 tmfrisinger/webcat:latest

# With environment variables
docker run -p 8000:8000 \
  -e SERPER_API_KEY=your_key \
  -e WEBCAT_API_KEY=your_token \
  tmfrisinger/webcat:latest

# Using docker-compose
cd docker
docker-compose up
```
Local Development
```bash
cd docker
python -m pip install -e ".[dev]"

# Configure environment (optional)
echo "SERPER_API_KEY=your_key" > .env

# Development mode with auto-reload
make dev   # Start MCP server with auto-reload

# Production mode
make mcp   # Start MCP server
```
Available Endpoints
| Endpoint | Description |
|---|---|
| `http://localhost:8000/health` | 💗 Health check |
| `http://localhost:8000/status` | 📊 Server status |
| `http://localhost:8000/mcp` | 🛠️ MCP protocol endpoint (Streamable HTTP with JSON-RPC 2.0) |
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `SERPER_API_KEY` | (none) | Serper API key for premium search (optional; falls back to DuckDuckGo if not set) |
| `PERPLEXITY_API_KEY` | (none) | Perplexity API key for the deep research tool (optional; get one at https://www.perplexity.ai/settings/api) |
| `WEBCAT_API_KEY` | (none) | Bearer token for authentication (optional; if set, all requests must include `Authorization: Bearer <token>`) |
| `PORT` | `8000` | Server port |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `LOG_DIR` | `/tmp` | Log file directory |
| `MAX_CONTENT_LENGTH` | `1000000` | Maximum characters to return per scraped article |
Get API Keys
Serper API (for web search + scraping):
- Visit serper.dev
- Sign up for the free tier (2,500 searches/month + scraping)
- Copy your API key
- Add it to your `.env` file: `SERPER_API_KEY=your_key`

Note: One API key enables both search AND content scraping!
Perplexity API (for deep research):
- Visit perplexity.ai/settings/api
- Sign up and get your API key
- Copy your API key
- Add it to your `.env` file: `PERPLEXITY_API_KEY=your_key`
Enable Authentication (Optional)
To require bearer token authentication for all MCP tool calls:
- Generate a secure random token: `openssl rand -hex 32`
- Add it to your `.env` file: `WEBCAT_API_KEY=your_token`
- Include it in all requests: `Authorization: Bearer your_token`
Note: If WEBCAT_API_KEY is not set, no authentication is required.
MCP Tools
WebCat exposes these tools via MCP:
| Tool | Description | Parameters |
|---|---|---|
| `search` | Search the web and extract content | `query: str`, `max_results: int` |
| `scrape_url` | Scrape a specific URL | `url: str` |
| `health_check` | Check server health | (none) |
| `get_server_info` | Get server capabilities | (none) |
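Tools are invoked over the `/mcp` endpoint inside a JSON-RPC 2.0 envelope. A sketch of what a `search` call might look like on the wire; MCP client libraries build this envelope for you, so treat the exact shape as illustrative:

```python
import json

def build_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

payload = build_tool_call("search", {"query": "model context protocol", "max_results": 5})
```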
Architecture
MCP Client (Claude, LiteLLM)
↓
FastMCP Server (Streamable HTTP with JSON-RPC 2.0)
↓
Authentication (optional bearer token)
↓
Search Decision
├─ Serper API (premium) → Serper Scrape API (premium)
└─ DuckDuckGo (free) → Trafilatura (free)
↓
Markdown Response
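The fallback behavior in the diagram amounts to "try the premium backend, fall back to the free one on failure." A simplified sketch of that control flow; the function names are illustrative, not WebCat's actual internals:

```python
from typing import Callable

def search_with_fallback(
    query: str,
    serper_search: Callable[[str], list],
    duckduckgo_search: Callable[[str], list],
    serper_enabled: bool,
) -> list:
    """Use Serper when an API key is configured; otherwise, or on error, use DuckDuckGo."""
    if serper_enabled:
        try:
            return serper_search(query)
        except Exception:
            pass  # premium backend failed; fall through to the free path
    return duckduckgo_search(query)
```

The same pattern applies to scraping (Serper scrape API with Trafilatura as the fallback).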
Tech Stack:
- FastMCP - MCP protocol implementation with modern HTTP transport
- JSON-RPC 2.0 - Standard protocol for client-server communication
- Serper API - Google-powered search + optimized web scraping
- Trafilatura - Fallback content extraction (removes navigation/ads)
- DuckDuckGo - Free search fallback
Testing
```bash
cd docker

# Run all unit tests
make test
# OR
python -m pytest tests/unit -v

# With coverage report
make test-coverage
# OR
python -m pytest tests/unit --cov=. --cov-report=term --cov-report=html

# CI-safe tests (no external dependencies)
python -m pytest -v -m "not integration"

# Run specific test file
python -m pytest tests/unit/services/test_content_scraper.py -v
```
Current test coverage: 70%+ across all modules (enforced in CI)
Development
```bash
# First-time setup
make setup-dev       # Install all dependencies + pre-commit hooks

# Development workflow
make dev             # Start server with auto-reload
make format          # Auto-format code (Black + isort)
make lint            # Check code quality (flake8)
make test            # Run unit tests

# Before committing
make ci-fast         # Quick validation (~30 seconds)
# OR
make ci              # Full validation with security checks (~2-3 minutes)

# Code quality tools
make format-check    # Check formatting without changes
make security        # Run bandit security scanner
make audit           # Check dependency vulnerabilities
```
Pre-commit Hooks:
Hooks run automatically on `git commit` to ensure code quality. Install them with `make setup-dev`.
Project Structure
```
docker/
├── mcp_server.py                # Main MCP server (FastMCP)
├── cli.py                       # CLI interface for server modes
├── health.py                    # Health check endpoint
├── api_tools.py                 # API tooling utilities
├── clients/                     # External API clients
│   ├── serper_client.py         # Serper API (search + scrape)
│   └── duckduckgo_client.py     # DuckDuckGo fallback
├── services/                    # Core business logic
│   ├── search_service.py        # Search orchestration
│   └── content_scraper.py       # Serper scrape → Trafilatura fallback
├── tools/                       # MCP tool implementations
│   └── search_tool.py           # Search tool with auth
├── models/                      # Pydantic data models
│   ├── domain/                  # Domain entities (SearchResult, etc.)
│   └── responses/               # API response models
├── utils/                       # Shared utilities
│   └── auth.py                  # Bearer token authentication
├── endpoints/                   # FastAPI endpoints
├── tests/                       # Comprehensive test suite
│   ├── unit/                    # Unit tests (mocked dependencies)
│   └── integration/             # Integration tests (external deps)
└── pyproject.toml               # Project config + dependencies
```
Search Quality Comparison
| Feature | Serper API | DuckDuckGo |
|---|---|---|
| Cost | Paid (free tier available) | Free |
| Quality | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐⭐ Good |
| Coverage | Comprehensive (Google-powered) | Standard |
| Speed | Fast | Fast |
| Rate Limits | 2,500/month (free tier) | None |
Docker Multi-Platform Support
WebCat supports multiple architectures for broad deployment compatibility:
```bash
# Build locally for multiple platforms
cd docker
./build.sh   # Builds for linux/amd64 and linux/arm64

# Manual multi-platform build and push
docker buildx build --platform linux/amd64,linux/arm64 \
  -t tmfrisinger/webcat:2.3.2 \
  -t tmfrisinger/webcat:latest \
  -f Dockerfile --push .

# Verify multi-platform support
docker buildx imagetools inspect tmfrisinger/webcat:latest
```
Automated Releases: Push a version tag to trigger automated multi-platform builds via GitHub Actions:
```bash
git tag v2.3.2
git push origin v2.3.2
```
Limitations
- Text-focused: Optimized for article content, not multimedia
- No JavaScript: Cannot scrape dynamic JS-rendered content (uses static HTML)
- PDF support: Detection only, not full extraction
- Python 3.11 required: Not compatible with 3.10 or 3.12
- External API limits: Subject to Serper API rate limits (2,500/month free tier)
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure `make ci` passes
- Submit a Pull Request
See CLAUDE.md for development guidelines and architecture standards.
License
MIT License - see LICENSE file for details.
Links
- GitHub: github.com/Kode-Rex/webcat
- MCP Spec: modelcontextprotocol.io
- Serper API: serper.dev
Version 2.3.2 | Built with FastMCP, FastAPI, Serper, and Trafilatura