Turn code repositories into AI-friendly Markdown documentation
Codefetch is a powerful tool that converts git repositories and local codebases into structured Markdown files optimized for Large Language Models (LLMs). It intelligently collects, processes, and formats code while respecting ignore patterns and providing token counting for various AI models.
Key Features
- 📁 Local Codebase Processing - Convert entire codebases into AI-friendly Markdown
- 🔗 Git Repository Support - Fast GitHub API fetching with git clone fallback
- 🐙 Multi-Platform Support - Works with GitHub, GitLab, and Bitbucket
- 🎯 Smart Filtering - Respect .gitignore patterns and custom exclusions
- 📊 Token Counting - Track tokens for GPT-4, Claude, and other models
- 🚀 CLI & SDK - Use via command line or integrate programmatically
- 💾 Intelligent Caching - Speed up repeated fetches with smart caching
- 🌲 Project Structure Visualization - Generate tree views of your codebase
- ⚡ GitHub API Integration - Fetch repos without git using the GitHub API
- 🌐 Open & Send to GPT 5.1 Pro/Gemini 3.0/Claude - Generate codebase, copy to clipboard, and open ChatGPT, Gemini, or Claude in browser
Quick Start
# Analyze current directory
npx codefetch

# Generate codebase, copy to clipboard, and open ChatGPT with GPT 5.1 Pro
npx codefetch open

# Analyze a GitHub repo (uses API - no git needed!)
npx codefetch --url github.com/facebook/react

# Analyze from GitLab or Bitbucket
npx codefetch --url gitlab.com/gitlab-org/gitlab
Installation
Using npx (recommended)
Global Installation
npm install -g codefetch
codefetch --help
In Your Project
npm install --save-dev codefetch
# Add to package.json scripts
Basic Examples
Local Codebase
# Basic usage - outputs to codefetch/codebase.md
npx codefetch

# Include only TypeScript files with tree view
npx codefetch -e ts,tsx -t 3

# Generate with AI prompt template
npx codefetch -p improve --max-tokens 50000
Open & Send to GPT/Gemini/Claude
The open command generates your codebase, copies it to clipboard, and opens ChatGPT, Gemini, Claude, or other AI chats in the browser:
# Default: opens ChatGPT with GPT 5.1 Pro model
npx codefetch open

# Open Gemini 3.0
npx codefetch open --chat-url gemini.google.com --chat-model gemini-3.0

# Open Claude Sonnet
npx codefetch open --chat-url claude.ai --chat-model claude-3.5-sonnet

# Combine with codefetch options (e.g., filter by extension)
npx codefetch open -e .ts,.js --exclude-dir node_modules

# Just copy to clipboard without opening browser
npx codefetch open --no-browser

# Include a prompt in the clipboard content
npx codefetch open --prompt "Review this codebase for security issues"
Git Repository Fetching
# Analyze a GitHub repository (uses GitHub API by default - no git required!)
npx codefetch --url github.com/vuejs/vue --branch main -e js,ts

# Private repository with token
npx codefetch --url github.com/myorg/private-repo --github-token ghp_xxxxx

# Force git clone instead of API
npx codefetch --url github.com/user/repo --no-api

# Web crawling with advanced options
npx codefetch --url example.com/docs --max-pages 50 --max-depth 3
Include or exclude specific files and directories:
# Exclude test and public directories
npx codefetch --exclude-dir test,public

# Include only TypeScript files
npx codefetch --include-files "*.ts" -o typescript-only.md

# Include specific files and directories with glob patterns
npx codefetch --include-files "src/components/AgentPanel.tsx,src/lib/llm/**/*" -o selected-files.md

# Include src directory, exclude test files
npx codefetch --include-dir src --exclude-files "*.test.ts" -o src-no-tests.md

# Combine --include-dir and --include-files (additive!)
# This includes ALL files from crates/core/src PLUS the specific lib.rs file
npx codefetch --include-dir crates/core/src --include-files "crates/engine/src/lib.rs" -o combined.md
Dry run (only output to console)
npx codefetch --dry-run
Count tokens only (without generating markdown file)
# Count tokens with default encoder
npx codefetch -c

# Count tokens with specific encoder
npx codefetch -c --token-encoder cl100k

# Count tokens for specific file types
npx codefetch -c -e .ts,.js --token-encoder o200k
If no output file is specified (-o or --output), the output is written to codefetch/codebase.md.
Options
General Options
| Option | Description |
|---|---|
| `-o, --output <file>` | Specify output filename (defaults to codebase.md). Note: If you include "codefetch/" in the path, it will be automatically stripped to avoid double-nesting |
| `--dir <path>` | Specify the directory to scan (defaults to current directory) |
| `--max-tokens <number>` | Limit output tokens (default: 500,000) |
| `-e, --extension <ext,...>` | Filter by file extensions (e.g., .ts,.js) |
| `--token-limiter <type>` | Token limiting strategy when using --max-tokens (sequential, truncated) |
| `--include-files <pattern,...>` | Include specific files (supports patterns like *.ts) |
| `--exclude-files <pattern,...>` | Exclude specific files (supports patterns like *.test.ts) |
| `--include-dir <dir,...>` | Include specific directories |
| `--exclude-dir <dir,...>` | Exclude specific directories |
| `-v, --verbose [level]` | Show processing information (0=none, 1=basic, 2=debug) |
| `-t, --project-tree [depth]` | Generate visual project tree (optional depth, default: 2). Respects .gitignore, .codefetchignore, and config filters by default |
| `--project-tree-skip-ignore-files` | Include files ignored by git/config in the project tree output |
| `--token-encoder <type>` | Token encoding method (simple, p50k, o200k, cl100k) |
| `--enable-line-numbers` | Enable line numbers in output (disabled by default to save tokens) |
| `--exclude-markdown` | Exclude markdown files (*.md, *.markdown, *.mdx) from output |
| `-p, --prompt <text>` | Add a prompt: built-in (fix, improve, codegen, testgen), file (.md/.txt), or inline text |
| `-d, --dry-run` | Output markdown to stdout instead of file |
| `-c, --token-count-only` | Output only the token count without generating markdown file |
Git Repository Options
| Option | Description |
|---|---|
| `--url <URL>` | Fetch and analyze content from a git repository or website URL |
| `--branch <name>` | Git branch/tag/commit to fetch (for repositories) |
| `--no-cache` | Skip cache and fetch fresh content |
| `--cache-ttl <hours>` | Cache time-to-live in hours (default: 1) |
| `--no-api` | Disable GitHub API and use git clone instead |
| `--github-token <token>` | GitHub API token for private repos (or set GITHUB_TOKEN env var) |
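As a rough illustration of how --no-cache and --cache-ttl could interact, here is a minimal TypeScript sketch. The CacheEntry shape and the shouldUseCache function are hypothetical names invented for this example; codefetch's real cache logic may differ.

```typescript
// Hypothetical cache entry: only the fetch timestamp matters for this sketch.
interface CacheEntry {
  fetchedAt: number; // milliseconds since the Unix epoch
}

// Hypothetical options mirroring the --no-cache and --cache-ttl flags.
interface CacheOptions {
  noCache: boolean;
  ttlHours: number; // the table above lists 1 as the default
}

// Returns true when a cached entry exists, caching is enabled,
// and the entry is younger than the configured time-to-live.
function shouldUseCache(
  entry: CacheEntry | undefined,
  options: CacheOptions,
  now: number
): boolean {
  if (options.noCache || entry === undefined) {
    return false;
  }
  const ttlMs = options.ttlHours * 60 * 60 * 1000;
  return now - entry.fetchedAt < ttlMs;
}
```

Passing the current time in as an argument (rather than calling Date.now() inside) keeps the check easy to test.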
Web Crawling Options
| Option | Description |
|---|---|
| `--max-pages <number>` | Maximum pages to crawl (default: 50) |
| `--max-depth <number>` | Maximum crawl depth (default: 2) |
| `--ignore-robots` | Ignore robots.txt restrictions |
| `--ignore-cors` | Ignore CORS restrictions |
Open Command Options
The open subcommand generates codebase, copies to clipboard, and opens ChatGPT, Gemini, Claude, or other AI chats in browser:
| Option | Description |
|---|---|
| `--chat-url <url>` | AI chat URL (default: chatgpt.com) - supports ChatGPT, Gemini, Claude, etc. |
| `--chat-model <model>` | Model parameter for URL (default: gpt-5-1-pro) - e.g., gpt-5-1-pro, gemini-3.0, claude-3.5-sonnet |
| `--chat-prompt <text>` | Short hint added to URL parameter and shown in console (limited length due to URL constraints) |
| `--no-browser` | Skip opening browser, just copy to clipboard |
| `--copy` | Copy output to clipboard (works on macOS, Windows, and Linux) |
All standard codefetch options are supported with open (e.g., -e, -t, --exclude-dir, --prompt).
All options that accept multiple values use comma-separated lists. File patterns support glob wildcards:
- `*` matches any number of characters (except path separators)
- `**` matches any number of directories recursively
- `?` matches a single character
- Patterns can include directory paths (e.g., `src/lib/**/*.ts`)
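To make the wildcard rules above concrete, here is a minimal TypeScript sketch that converts a pattern into a regular expression. The function name globToRegExp is hypothetical, and this is not codefetch's own matcher, which may treat edge cases (braces, negation, dotfiles) differently.

```typescript
// Illustrative glob-to-RegExp converter covering only *, ** and ?.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < glob.length) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      if (glob.charAt(i + 2) === "/") {
        // "**/" matches zero or more whole directories.
        pattern += "(?:.*/)?";
        i += 3;
      } else {
        // A trailing "**" matches everything, including separators.
        pattern += ".*";
        i += 2;
      }
    } else if (ch === "*") {
      // "*" matches any run of characters except the path separator.
      pattern += "[^/]*";
      i += 1;
    } else if (ch === "?") {
      // "?" matches exactly one non-separator character.
      pattern += "[^/]";
      i += 1;
    } else {
      // Escape regex metacharacters so they match literally.
      pattern += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + pattern + "$");
}
```

With this sketch, `*.ts` matches `index.ts` but not `src/index.ts`, while `src/lib/**/*.ts` matches both `src/lib/a.ts` and `src/lib/llm/client.ts`.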
Project Tree
You can generate a visual tree representation of your project structure. By default, the project tree respects .gitignore, .codefetchignore, and config filters, showing only files that would be included in the codebase analysis:
# Generate tree with default depth (2 levels)
# Only shows files not ignored by .gitignore, .codefetchignore, or config
npx codefetch --project-tree

# Generate tree with custom depth
npx codefetch -t 3

# Generate tree and save code to file
npx codefetch -t 2 -o output.md

# Include ignored files/folders as well (shows full directory structure)
npx codefetch -t 2 --project-tree-skip-ignore-files
Example output:
Project Tree:
└── my-project
├── src
│ ├── index.ts
│ ├── types.ts
│ └── utils
├── tests
│ └── index.test.ts
└── package.json
By default, the project tree hides anything excluded via .gitignore, .codefetchignore, or your include/exclude config settings. Use --project-tree-skip-ignore-files when you need to inspect the entire directory structure regardless of those rules.
Using Prompts
You can add predefined, custom, or inline prompts to your output:
# Use inline prompts (NEW!)
npx codefetch -p "Review this code for security issues"
npx codefetch --prompt "Refactor to use async/await"

# Use built-in prompts
npx codefetch -p fix       # fixes codebase
npx codefetch -p improve   # improves codebase
npx codefetch -p codegen   # generates code
npx codefetch -p testgen   # generates tests

# Use built-in prompts with a message
npx codefetch -p fix "Fix the authentication bug"

# Use custom prompt files
npx codefetch --prompt custom-prompt.md
npx codefetch -p my-architect.txt
Inline Prompts
The simplest way to add a prompt is to pass it directly:
npx codefetch -p "Analyze this codebase and suggest improvements"
npx codefetch --prompt "Find potential memory leaks"
The codebase content is automatically appended after an inline prompt.
Custom Prompt Files
You can use custom prompt files in two ways:
1. External prompt files (anywhere in your project):
# Use a prompt file from anywhere in your project
npx codefetch -p docs/arch/review-prompt.md
npx codefetch --prompt ./prompts/security-audit.txt
2. Prompt files in codefetch/prompts/ directory:
# Create codefetch/prompts/my-prompt.md, then use:
npx codefetch --prompt my-prompt.md
You can also set a default prompt in your codefetch.config.mjs:
export default {
  defaultPromptFile: "dev", // Use built-in prompt
}

// or

export default {
  defaultPromptFile: "custom-prompt.md", // Use custom prompt file
}
The prompt resolution order is:
- CLI argument (-p or --prompt) - can be inline text, built-in name, or file path
- Config file prompt setting
- No prompt if neither is specified
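The resolution order above can be sketched in a few lines of TypeScript. The names PromptSources, resolvePrompt, and classifyPrompt are hypothetical, and the classification rule (built-in name, then .md/.txt extension, otherwise inline text) is an assumption based on the option description, not a copy of codefetch's implementation.

```typescript
// Hypothetical inputs: what the CLI received and what the config file sets.
interface PromptSources {
  cliPrompt?: string;    // value of -p / --prompt, if given
  configPrompt?: string; // prompt setting from the config file, if any
}

type PromptKind = "built-in" | "file" | "inline";

// Built-in prompt names as listed in the options table.
const BUILT_IN_PROMPTS = ["fix", "improve", "codegen", "testgen"];

// CLI argument wins, then the config file, otherwise no prompt at all.
function resolvePrompt(sources: PromptSources): string | undefined {
  if (sources.cliPrompt !== undefined) {
    return sources.cliPrompt;
  }
  if (sources.configPrompt !== undefined) {
    return sources.configPrompt;
  }
  return undefined;
}

// Assumed heuristic for deciding how a resolved prompt value is interpreted.
function classifyPrompt(value: string): PromptKind {
  if (BUILT_IN_PROMPTS.indexOf(value) !== -1) {
    return "built-in";
  }
  if (/\.(md|txt)$/.test(value)) {
    return "file";
  }
  return "inline";
}
```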
Token Limiting Strategies
When using --max-tokens, you can control how tokens are distributed across files using the --token-limiter option:
# Sequential mode - process files in order until reaching token limit
npx codefetch --max-tokens 500 --token-limiter sequential

# Truncated mode (default) - distribute tokens evenly across all files
npx codefetch --max-tokens 500 --token-limiter truncated

- sequential: Processes files in order until the total token limit is reached. Useful when you want complete content from the first files.
- truncated: Distributes tokens evenly across all files, showing partial content from each file. This is the default mode and is useful for getting an overview of the entire codebase.
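The difference between the two strategies is easier to see in code. The following TypeScript sketch is an assumption-laden illustration: SourceFile, limitSequential, and limitTruncated are hypothetical names, tokens are modeled as pre-split string arrays, the sequential variant cuts the last file that fits partially, and the truncated variant uses a plain even split without redistributing unused budget. codefetch's real behavior may differ on each of these points.

```typescript
// Hypothetical file model: content already split into tokens.
interface SourceFile {
  path: string;
  tokens: string[];
}

// Sequential: take files in order, spending the budget until it runs out.
function limitSequential(files: SourceFile[], maxTokens: number): SourceFile[] {
  const result: SourceFile[] = [];
  let remaining = maxTokens;
  for (const file of files) {
    if (remaining <= 0) {
      break; // budget exhausted, later files are dropped entirely
    }
    const kept = file.tokens.slice(0, remaining);
    result.push({ path: file.path, tokens: kept });
    remaining -= kept.length;
  }
  return result;
}

// Truncated: give every file the same share of the budget.
function limitTruncated(files: SourceFile[], maxTokens: number): SourceFile[] {
  if (files.length === 0) {
    return [];
  }
  const perFile = Math.floor(maxTokens / files.length);
  return files.map((file) => ({
    path: file.path,
    tokens: file.tokens.slice(0, perFile),
  }));
}
```

For three files of six tokens each and a budget of nine, the sequential sketch keeps the first file whole, three tokens of the second, and nothing of the third, while the truncated sketch keeps three tokens from every file.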
Ignoring Files
codefetch supports two ways to ignore files:
- .gitignore - Respects your project's existing .gitignore patterns
- .codefetchignore - Additional patterns specific to codefetch
The .codefetchignore file works exactly like .gitignore and is useful when you want to ignore files that aren't in your .gitignore.
Default Ignore Patterns
Codefetch uses a set of default ignore patterns to exclude common files and directories that typically don't need to be included in code reviews or LLM analysis.
You can view the complete list of default patterns in default-ignore.ts.
Output Format
Codefetch generates structured output using semantic XML tags to help AI models better understand the different sections:
<task>
Your prompt or instructions here...
</task>

<filetree>
Project Structure:
└── src
    ├── index.ts
    └── utils
        └── helpers.ts
</filetree>

<source_code>
src/index.ts
```typescript
// Your code here
```

src/utils/helpers.ts
</source_code>
The XML structure provides:
- `<task>` - Contains your prompt/instructions (from `-p` flag)
- `<filetree>` - Contains the project tree visualization (from `-t` flag)
- `<source_code>` - Contains all the source code files with their paths
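As a sketch of how the three sections fit together, the TypeScript below assembles a document in the layout described above. OutputParts and buildOutput are hypothetical names used for illustration only; they are not part of the codefetch SDK, and the exact whitespace codefetch emits may differ.

```typescript
// Hypothetical input: optional prompt and tree, plus the files to embed.
interface OutputParts {
  task?: string;
  fileTree?: string;
  files: { path: string; language: string; content: string }[];
}

// Three backticks, written as escapes so this example stays a single code block.
const FENCE = "\u0060\u0060\u0060";

function buildOutput(parts: OutputParts): string {
  const sections: string[] = [];
  // <task> and <filetree> are only emitted when a prompt or tree was requested.
  if (parts.task) {
    sections.push("<task>\n" + parts.task + "\n</task>");
  }
  if (parts.fileTree) {
    sections.push("<filetree>\n" + parts.fileTree + "\n</filetree>");
  }
  // Each file is listed as its path followed by a fenced block of its content.
  const body = parts.files
    .map((f) => f.path + "\n" + FENCE + f.language + "\n" + f.content + "\n" + FENCE)
    .join("\n\n");
  sections.push("<source_code>\n" + body + "\n</source_code>");
  return sections.join("\n\n");
}
```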
## Token Counting
Codefetch supports different token counting methods to match various AI models:
- `simple`: Basic word-based estimation (not very accurate but fastest!)
- `p50k`: GPT-3 style tokenization
- `o200k`: gpt-4o style tokenization
- `cl100k`: GPT-4 style tokenization
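For intuition about the `simple` method, a word-based estimate can be as small as the sketch below. This is an assumption about what "word-based" means here (counting whitespace-separated chunks); the function name estimateTokensByWords is hypothetical and codefetch's own estimator may use a different rule.

```typescript
// Assumed word-based estimate: one token per whitespace-separated chunk.
function estimateTokensByWords(text: string): number {
  const trimmed = text.trim();
  if (trimmed === "") {
    return 0; // avoid counting the single empty string split() would return
  }
  return trimmed.split(/\s+/).length;
}
```

An estimate like this is fast but coarse, which is why the model-specific encoders are the better choice when the count needs to be close to what the target model will see.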
Select the appropriate encoder based on your target model:
```bash
# For GPT-4o
npx codefetch --token-encoder o200k
```
Output Directory
By default (unless using --dry-run) codefetch will:
- Create a codefetch/ directory in your project
- Store all output files in this directory
This ensures that:
- Your fetched code is organized in one place
- The output directory is excluded from fetching, so previously generated output is never fetched again
Add codefetch/ to your .gitignore file to avoid committing the fetched codebase.
Use with GPT, Gemini, Claude, and Other AI Models
You can run this command in bolt.new, cursor.com, ... to convert your code to Markdown, then ask GPT 5.1 Pro, Gemini 3.0, Claude, or other AI models for guidance about your codebase.
Quick Workflow: Open & Send to GPT 5.1 Pro, Gemini 3.0, or Claude
The fastest way to get your codebase into ChatGPT, Gemini, Claude, or other AI chats:
# Generate codebase, copy to clipboard, and open ChatGPT with GPT 5.1 Pro
npx codefetch open

# Open Gemini 3.0 instead
npx codefetch open --chat-url gemini.google.com --chat-model gemini-3.0

# Open Claude Sonnet
npx codefetch open --chat-url claude.ai --chat-model claude-3.5-sonnet

# Your codebase is automatically copied to clipboard
# The AI chat opens with the model pre-selected (GPT 5.1 Pro, Gemini 3.0, etc.)
# Just paste (Cmd/Ctrl+V) and start chatting!
This workflow:
- ✅ Generates your codebase as markdown
- ✅ Copies it to your clipboard automatically
- ✅ Opens ChatGPT (GPT 5.1 Pro), Gemini 3.0, Claude, or your preferred AI chat in your browser
- ✅ Pre-fills the model parameter in the URL
Perfect for quick code reviews, refactoring suggestions, or getting help with your codebase from GPT 5.1 Pro, Gemini 3.0, Claude, and other leading AI models!
Packages
Codefetch is organized as a monorepo with multiple packages:
codefetch
Command-line interface for Codefetch with web fetching capabilities.
Features:
- Full CLI with all options
- Website crawling and conversion
- Git repository cloning
- Built-in caching system
- Progress reporting
Read the full CLI documentation →
codefetch-sdk
Core SDK for programmatic usage in your applications.
npm install codefetch-sdk@latest
Features:
- 🎯 Unified fetch() API - Single method for all sources
- 🚀 Zero-config defaults - Works out of the box
- 📦 Optimized bundle - Small footprint for edge environments
- 🔧 Full TypeScript support - Complete type safety
- 🌐 Enhanced web support - GitHub API integration
Quick Start:
import { fetch } from 'codefetch-sdk';

// Local codebase
const localResult = await fetch({
  source: './src',
  extensions: ['.ts', '.tsx'],
  maxTokens: 50_000,
});

// GitHub repository
const repoResult = await fetch({
  source: 'https://github.com/facebook/react',
  branch: 'main',
  extensions: ['.js', '.ts'],
});

console.log(repoResult.markdown); // AI-ready markdown
Read the full SDK documentation →
codefetch-sdk/worker
Cloudflare Workers optimized build - Zero file system dependencies.
import { fetch } from 'codefetch-sdk/worker';

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const result = await fetch({
      source: 'https://github.com/vercel/next.js',
      maxFiles: 50,
      extensions: ['.ts', '.tsx'],
    });

    return new Response(result.markdown, {
      headers: { 'Content-Type': 'text/markdown' }
    });
  }
};
Features:
- 🚀 Zero nodejs_compat required - Uses native Web APIs
- 📦 35.4KB bundle size - Optimized for edge performance
- 🔒 Private repo support - GitHub token authentication
- 🌊 Native streaming - Memory-efficient processing
Read the full Worker documentation →
codefetch-mcp-server (Coming soon)
Model Context Protocol server for AI assistants like Claude.
Features:
- MCP tools for codebase analysis
- Direct integration with Claude Desktop
- Token counting tools
- Configurable via environment variables
Read the full MCP documentation →
Integrate into Your Project
Initialize your project with codefetch:
This will:
- Create a .codefetchignore file for excluding files
- Generate a codefetch.config.mjs with your preferences
- Set up the project structure
Configuration
Create a .codefetchrc file in your project root:
{
"extensions": [".ts", ".tsx", ".js", ".jsx"],
"excludeDirs": ["node_modules", "dist", "coverage"],
"maxTokens": 100000,
"outputFile": "codebase.md",
"tokenEncoder": "cl100k"
}
Or use codefetch.config.mjs for more control:
export default {
  // Output settings
  outputPath: "codefetch",
  outputFile: "codebase.md",
  maxTokens: 999_000,

  // Processing options
  projectTree: 2, // Tree depth (0 to disable)
  projectTreeSkipIgnoreFiles: false, // Set to true to show ignored files in tree
  tokenEncoder: "cl100k",
  tokenLimiter: "truncated",

  // File filtering
  extensions: [".ts", ".js"],
  excludeDirs: ["test", "dist"],

  // AI/LLM settings
  trackedModels: ["gpt-4", "claude-3-opus", "gpt-3.5-turbo"],
};
Links
- X/Twitter: @kregenrek
- Bluesky: @kevinkern.dev
Courses
- Learn Cursor AI: Ultimate Cursor Course
- Learn to build software with AI: AI Builder Hub
See my other projects:
- aidex - AI model information
Credits
This project was inspired by

