A vector-embedding-based semantic code search tool with an MCP server and multi-model integration. It can also be used as a pure CLI tool. Ollama support enables fully local embedding and reranking, allowing complete offline operation and keeping your code repository private.
## Semantic code search - Find code by meaning, not just keywords

```
╭─ ~/workspace/autodev-codebase
╰─❯ codebase search "user manage" --demo
Found 20 results in 5 files for: "user manage"
==================================================
File: "hello.js"
==================================================
< class UserManager > (L7-20)
class UserManager {
  constructor() {
    this.users = [];
  }

  addUser(user) {
    this.users.push(user);
    console.log('User added:', user.name);
  }

  getUsers() {
    return this.users;
  }
}
……
```

## Call graph analysis - Trace function call relationships and execution paths

```
╭─ ~/workspace/autodev-codebase
╰─❯ codebase call --demo --query="app,addUser"
Connections between app, addUser:

Found 2 matching node(s):
  - demo/app:L1-29
  - demo/hello.UserManager.addUser:L12-15

Direct connections:
  - demo/app:L1-29 → demo/hello.UserManager.addUser:L12-15

Chains found:
  - demo/app:L1-29 → demo/hello.UserManager.addUser:L12-15
```

## Code outline with AI summaries - Understand code structure at a glance

```
╭─ ~/workspace/autodev-codebase
╰─❯ codebase outline 'hello.js' --demo --summarize
# hello.js (23 lines)
└─ Defines a greeting function that logs a personalized hello message and returns a welcome string. Implements a UserManager class managing an array of users with methods to add users and retrieve the current user list. Exports both components for external use.

2--5 | function greetUser
└─ Implements user greeting logic by logging a personalized hello message and returning a welcome message

7--20 | class UserManager
└─ Manages user data with methods to add users to a list and retrieve all stored users

12--15 | method addUser
└─ Adds a user to the users array and logs a confirmation message with the user's name.
```
## 🚀 Features
- 🔍 Semantic Code Search: Vector-based search using advanced embedding models
- 🔗 Call Graph Analysis: Trace function call relationships and execution paths
- 🌐 MCP Server: HTTP-based MCP server with SSE and stdio adapters
- 💻 Pure CLI Tool: Standalone command-line interface without GUI dependencies
- ⚙️ Layered Configuration: CLI, project, and global config management
- 🎯 Advanced Path Filtering: Glob patterns with brace expansion and exclusions
- 🌲 Tree-sitter Parsing: Support for 40+ programming languages
- 💾 Qdrant Integration: High-performance vector database
- 🔄 Multiple Providers: OpenAI, Ollama, Jina, Gemini, Mistral, OpenRouter, Vercel
- 📊 Real-time Watching: Automatic index updates
- ⚡ Batch Processing: Efficient parallel processing
- 📝 Code Outline Extraction: Generate structured code outlines with AI summaries
- 💨 Dependency Analysis Cache: Intelligent caching for 10-50x faster re-analysis
## 📦 Installation
### 1. Dependencies
```bash
brew install ollama ripgrep
ollama serve
ollama pull nomic-embed-text
```
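To verify that Ollama is running and the embedding model was pulled, an optional sanity check:

```bash
# Optional check: the pulled embedding model should appear in this list
ollama list
```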
### 2. Qdrant
```bash
docker run -d -p 6333:6333 -p 6334:6334 --name qdrant qdrant/qdrant
```
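A quick way to confirm Qdrant is reachable before indexing (optional; `/collections` is a standard Qdrant REST endpoint):

```bash
# Optional check: should return a JSON list of collections (empty on a fresh install)
curl http://localhost:6333/collections
```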
### 3. Install
```bash
npm install -g @autodev/codebase
codebase config --set embedderProvider=ollama,embedderModelId=nomic-embed-text
```
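If the install succeeded, the CLI should respond with its command and option listing:

```bash
# Confirm the CLI is on your PATH and see all commands and flags
codebase --help
```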
## 🛠️ Quick Start
```bash
# Demo mode (recommended for first-time users)
# Creates a demo directory in the current working directory for testing

# Index & search
codebase index --demo
codebase search "user greet" --demo

# Call graph analysis
codebase call --demo --query="app,addUser"

# MCP server
codebase index --serve --demo
```
## 📋 Commands
### 📝 Code Outlines
```bash
# Extract code structure (functions, classes, methods)
codebase outline "src/**/*.ts"

# Generate code structure with AI summaries
codebase outline "src/**/*.ts" --summarize

# View only file-level summaries
codebase outline "src/**/*.ts" --summarize --title

# Clear summary cache
codebase outline --clear-summarize-cache
```
### 🔗 Call Graph Analysis
```bash
# 📊 Statistics overview (no --query)
codebase call                                 # Show statistics overview
codebase call --json                          # JSON format
codebase call src/commands                    # Analyze specific directory

# 🔍 Function query (with --query)
codebase call --query="getUser"               # Single function call tree (default depth: 3)
codebase call --query="main" --depth=5        # Custom depth
codebase call --query="getUser,validateUser"  # Multi-function connections (default depth: 10)

# 🎨 Visualization
codebase call --viz graph.json                # Export Cytoscape.js format
codebase call --open                          # Open interactive viewer
codebase call --viz graph.json --open         # Export and open

# Specify workspace (works for both modes)
codebase call --path=/my/project --query="main"
```
Query Patterns:
- Exact match: `--query="functionName"` or `--query="*ClassName.methodName"`
- Wildcards: `*` (any characters), `?` (single character)
- Examples: `--query="get*"`, `--query="*User*"`, `--query="*.*.get*"`

Examples:
- Single function: `--query="main"` shows the call tree (upward + downward). Default depth: 3 (avoids excessive output)
- Multiple functions: `--query="main,helper"` analyzes connection paths between the functions. Default depth: 10 (path finding needs a deeper search)
Supported Languages:
- TypeScript/JavaScript (.ts, .tsx, .js, .jsx)
- Python (.py)
- Java (.java)
- C/C++ (.c, .h, .cpp, .cc, .cxx, .hpp, .hxx, .c++)
- C# (.cs)
- Rust (.rs)
- Go (.go)
### 🔍 Indexing & Search
```bash
# Index the codebase
codebase index --path=/my/project --force

# Search with filters
codebase search "error handling" --path-filters="src/**/*.ts"

# Search with custom limit and minimum score
codebase search "authentication" --limit=20 --min-score=0.7
codebase search "API" -l 30 -S 0.5

# Search in JSON format
codebase search "authentication" --json

# Clear index data
codebase index --clear-cache --path=/my/project
```
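The `--json` output is meant for scripting; for example, it can be piped through `jq` (a separate tool, not bundled with codebase) for pretty-printing or further filtering:

```bash
# Pretty-print machine-readable search results (requires jq)
codebase search "error handling" --json | jq '.'
```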
### 🌐 MCP Server
```bash
# HTTP mode (recommended)
codebase index --serve --port=3001 --path=/my/project

# Stdio adapter
codebase stdio --server-url=http://localhost:3001/mcp
```
### ⚙️ Configuration
```bash
# View config
codebase config --get
codebase config --get embedderProvider --json

# Set config
codebase config --set embedderProvider=ollama,embedderModelId=nomic-embed-text
codebase config --set --global qdrantUrl=http://localhost:6333
```
## 🚀 Advanced Features
### 🔍 LLM-Powered Search Reranking
Enable LLM reranking to dramatically improve search relevance:
```bash
# Enable reranking with Ollama (recommended)
codebase config --set rerankerEnabled=true,rerankerProvider=ollama,rerankerOllamaModelId=qwen3-vl:4b-instruct

# Or use OpenAI-compatible providers
codebase config --set rerankerEnabled=true,rerankerProvider=openai-compatible,rerankerOpenAiCompatibleModelId=deepseek-chat

# Search with automatic reranking
codebase search "user authentication"  # Results are automatically reranked by LLM
```
Benefits:
- 🎯 Higher precision: LLM understands semantic relevance beyond vector similarity
- 📊 Smart scoring: Results are reranked on a 0-10 scale based on query relevance
- ⚡ Batch processing: Efficiently handles large result sets with configurable batch sizes
- 🎛️ Threshold control: Filter results with `rerankerMinScore` to keep only high-quality matches (see the example below)
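For example, assuming `rerankerMinScore` is set like any other config key, a threshold of 6 on the 0-10 scale would drop weaker matches (the value here is illustrative):

```bash
# Illustrative threshold: keep only matches the reranker scores >= 6 out of 10
codebase config --set rerankerMinScore=6
```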
### Path Filtering & Export
```bash
# Path filtering with brace expansion and exclusions
codebase search "API" --path-filters="src/**/*.ts,lib/**/*.js"
codebase search "utils" --path-filters="{src,test}/**/*.ts"

# Export results in JSON format for scripts
codebase search "auth" --json
```
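The examples above show brace expansion; an exclusion might look like the sketch below, assuming the leading-`!` convention used by common glob libraries (verify the exact syntax with `codebase search --help`):

```bash
# Hypothetical exclusion: search src/**/*.ts but skip test files
# (the leading "!" is an assumption based on common glob conventions)
codebase search "config" --path-filters="src/**/*.ts,!src/**/*.test.ts"
```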
## ⚙️ Configuration
### Config Layers (Priority Order)
1. CLI Arguments - Runtime parameters (`--path`, `--config`, `--log-level`, `--force`, etc.)
2. Project Config - `./autodev-config.json` (or a custom path via `--config`)
3. Global Config - `~/.autodev-cache/autodev-config.json`
4. Built-in Defaults - Fallback values
Note: CLI arguments provide runtime overrides for paths, logging, and operational behavior. For persistent configuration (`embedderProvider`, API keys, search parameters), use `config --set` to save to the config files.
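For instance, a runtime flag wins over the same setting in any config file, but only for that invocation:

```bash
# The --log-level flag overrides any configured log level for this run,
# without persisting anything to disk
codebase index --path=/my/project --log-level=info
```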
### Common Config Examples
Ollama:

```json
{
  "embedderProvider": "ollama",
  "embedderModelId": "nomic-embed-text",
  "qdrantUrl": "http://localhost:6333"
}
```

OpenAI:

```json
{
  "embedderProvider": "openai",
  "embedderModelId": "text-embedding-3-small",
  "embedderOpenAiApiKey": "sk-your-key",
  "qdrantUrl": "http://localhost:6333"
}
```

OpenAI-Compatible:

```json
{
  "embedderProvider": "openai-compatible",
  "embedderModelId": "text-embedding-3-small",
  "embedderOpenAiCompatibleApiKey": "sk-your-key",
  "embedderOpenAiCompatibleBaseUrl": "https://api.openai.com/v1"
}
```

### Key Configuration Options
| Category | Options | Description |
|---|---|---|
| Embedding | `embedderProvider`, `embedderModelId`, `embedderModelDimension` | Provider and model settings |
| API Keys | `embedderOpenAiApiKey`, `embedderOpenAiCompatibleApiKey` | Authentication |
| Vector Store | `qdrantUrl`, `qdrantApiKey` | Qdrant connection |
| Search | `vectorSearchMinScore`, `vectorSearchMaxResults` | Search behavior |
| Reranker | `rerankerEnabled`, `rerankerProvider` | Result reranking |
| Summarizer | `summarizerProvider`, `summarizerLanguage`, `summarizerBatchSize` | AI summary generation |
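As a worked example, the search-behavior keys from the table can be persisted with `config --set` like any other option (the values are illustrative, not documented defaults):

```bash
# Illustrative values: require similarity >= 0.4 and cap results at 20
codebase config --set vectorSearchMinScore=0.4,vectorSearchMaxResults=20
```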
Key CLI Arguments:
- `index` - Index the codebase
- `search <query>` - Search the codebase (required positional argument)
- `outline <pattern>` - Extract code outlines (supports glob patterns)
- `call` - Analyze function call relationships and dependency graphs
- `stdio` - Start the stdio adapter for MCP
- `config` - Manage configuration (use with `--get` or `--set`)
- `--serve` - Start the MCP HTTP server (use with the `index` command)
- `--summarize` - Generate AI summaries for code outlines
- `--dry-run` - Preview operations before execution
- `--title` - Show only file-level summaries
- `--clear-summarize-cache` - Clear all summary caches
- `--path`, `--demo`, `--force` - Common options
- `--limit`/`-l <number>` - Maximum number of search results (default: from config, max 50)
- `--min-score`/`-S <number>` - Minimum similarity score for search results (0-1, default: from config)
- `--query <patterns>` - Query patterns for call graph analysis (comma-separated)
- `--viz <file>` - Export full dependency data for visualization (cannot be used with `--query`)
- `--open` - Open the interactive graph viewer
- `--depth <number>` - Set analysis depth for call graphs
- `--help` - Show all available options
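Putting several of these together, a typical session might look like the following (the project path, query text, and function name are placeholders):

```bash
# Index a project, search it, then trace a call graph for a matched function
codebase index --path=/my/project
codebase search "token refresh" --limit=10 --min-score=0.5
codebase call --path=/my/project --query="refreshToken" --depth=3
```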
Configuration Commands:
```bash
# View config
codebase config --get
codebase config --get --json

# Set config (saves to file)
codebase config --set embedderProvider=ollama,embedderModelId=nomic-embed-text
codebase config --set --global embedderProvider=openai,embedderOpenAiApiKey=sk-xxx

# Use custom config file
codebase --config=/path/to/config.json config --get
codebase --config=/path/to/config.json config --set embedderProvider=ollama

# Runtime override (paths, logging, etc.)
codebase index --path=/my/project --log-level=info --force
```
For the complete configuration reference, see CONFIG.md.
## 🔌 MCP Integration
### HTTP Streamable Mode (Recommended)
```bash
codebase index --serve --port=3001
```
IDE Config:
```json
{
  "mcpServers": {
    "codebase": {
      "url": "http://localhost:3001/mcp"
    }
  }
}
```

### Stdio Adapter
```bash
# First start the MCP server in one terminal
codebase index --serve --port=3001

# Then connect via the stdio adapter in another terminal (for IDEs that require stdio)
codebase stdio --server-url=http://localhost:3001/mcp
```
IDE Config:
```json
{
  "mcpServers": {
    "codebase": {
      "command": "codebase",
      "args": ["stdio", "--server-url=http://localhost:3001/mcp"]
    }
  }
}
```

## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request or open an Issue on GitHub.
## 📄 License
This project is licensed under the MIT License.
## 🙏 Acknowledgments
This project is a fork and derivative work based on Roo Code. We've built upon their excellent foundation to create this specialized codebase analysis tool with enhanced features and MCP server capabilities.
🌟 If you find this tool helpful, please give us a star on GitHub!
Made with ❤️ for the developer community