# SkillScore

The universal quality standard for AI agent skills.

Evaluate any SKILL.md, whether from skills.sh, ClawHub, GitHub, or your local machine.
## Features

- **Comprehensive Evaluation**: 8 scoring categories with weighted importance
- **Multiple Output Formats**: terminal (colorful), JSON, and Markdown reports
- **Deterministic Analysis**: reliable, reproducible scoring with no API keys required
- **Detailed Feedback**: specific findings and actionable recommendations
- **Fast & Reliable**: built with TypeScript
- **Cross-Platform**: works on Windows, macOS, and Linux
- **GitHub Integration**: score skills directly from GitHub repositories
- **Batch Mode**: compare multiple skills with a summary table
- **Verbose Mode**: see all findings, not just truncated summaries
## Installation

### Global Installation (Recommended)

```bash
npm install -g skillscore
```

### Local Installation

```bash
npm install skillscore
npx skillscore ./my-skill/
```

### From Source

```bash
git clone https://github.com/joeynyc/skillscore.git
cd skillscore
npm install
npm run build
npm link
```

## Quick Start

Evaluate a skill directory:

```bash
skillscore ./my-skill/
```
## Usage Examples

### Basic Usage

```bash
# Evaluate a skill
skillscore ./skills/my-skill/

# Evaluate with verbose output (shows all findings)
skillscore ./skills/my-skill/ --verbose
```

### GitHub Integration

```bash
# Full GitHub URL (always recognized)
skillscore https://github.com/vercel-labs/skills/tree/main/skills/find-skills

# GitHub shorthand (requires the -g/--github flag)
skillscore -g vercel-labs/skills/find-skills

# Anthropic skills
skillscore -g anthropic/skills/skill-creator
```

### Output Formats

```bash
# JSON output
skillscore ./skills/my-skill/ --json

# Markdown report
skillscore ./skills/my-skill/ --markdown

# Save to file
skillscore ./skills/my-skill/ --output report.md
skillscore ./skills/my-skill/ --json --output score.json
```

### Batch Mode

```bash
# Compare multiple skills (auto-enters batch mode)
skillscore ./skill1 ./skill2 ./skill3

# Explicit batch mode flag
skillscore ./skill1 ./skill2 --batch

# Compare GitHub skills
skillscore -g user/repo1/skill1 user/repo2/skill2 --json
```

### Utility Commands

```bash
# Show version
skillscore --version

# Get help
skillscore --help
```
## Example Output

### Terminal Output

```text
SKILLSCORE EVALUATION REPORT
============================================================

Skill: Weather Information Fetcher
Fetches current weather data for any city using OpenWeatherMap API
Path: ./weather-skill

OVERALL SCORE
A- - 92.0% (9.2/10.0 points)

CATEGORY BREAKDOWN
------------------------------------------------------------
Structure       ████████████████████ 100.0%
  SKILL.md exists, clear name/description, follows conventions
  Score: 10/10 (weight: 15%)
  ✓ SKILL.md file exists (+3)
  ✓ Clear skill name: "Weather Information Fetcher" (+2)
  ✓ Clear description provided (+2)
  ... 2 more findings

Clarity         ██████████████████░░ 90.0%
  Specific actionable instructions, no ambiguity, logical order
  Score: 9/10 (weight: 20%)
  ✓ Contains specific step-by-step instructions with commands (+3)
  ✓ No ambiguous language detected (+3)
  ✓ Instructions follow logical order (+2)
  ... 1 more finding (use --verbose to see all)

Safety          ██████████████░░░░░░ 70.0%
  No destructive commands, respects permissions
  Score: 7/10 (weight: 20%)
  ✓ No dangerous destructive commands found (+3)
  ✓ No obvious secret exfiltration risks (+3)
  ⚠ Some potential security concerns detected

SUMMARY
------------------------------------------------------------
✅ Strengths: Structure, Clarity, Dependencies, Documentation
⚠ Areas for improvement: Safety

Generated: 2/11/2026, 3:15:49 PM
```
### Batch Mode Output

```text
BATCH SKILL EVALUATION
Evaluating 3 skill(s)...

[1/3] Processing: ./weather-skill
✅ Completed
[2/3] Processing: ./file-backup
✅ Completed
[3/3] Processing: user/repo/skill
✅ Completed

COMPARISON SUMMARY

Skill                        Grade  Score  Structure  Clarity  Safety  Status
Weather Information Fetcher  A-     92.0%  100%       90%      70%     OK
File Backup Tool             B+     87.0%  95%        85%      90%     OK
Advanced Data Processor      A      94.0%  100%       95%      85%     OK

BATCH SUMMARY
✅ Successful: 3
Average Score: 91.0%
```
## Scoring System

SkillScore evaluates skills across 8 weighted categories:
| Category | Weight | Description |
|---|---|---|
| Structure | 15% | SKILL.md exists, clear name/description, file organization, artifact output spec |
| Clarity | 20% | Specific actionable instructions, no ambiguity, logical order |
| Safety | 20% | No destructive commands, respects permissions, network containment |
| Dependencies | 10% | Lists required tools/APIs, install instructions, env vars |
| Error Handling | 10% | Failure instructions, fallbacks, no silent failures |
| Scope | 10% | Single responsibility, routing quality, negative examples |
| Documentation | 10% | Usage examples, embedded templates, expected I/O |
| Portability | 5% | Cross-platform, no hardcoded paths, relative paths |
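As a sketch of how these weights combine, the overall percentage is a weighted average of the per-category scores. The code below is illustrative only; the `CategoryResult` interface and `overallPercent` function are hypothetical names, not SkillScore's actual API.

```typescript
// Hypothetical shape of a per-category result; not SkillScore's actual types.
interface CategoryResult {
  name: string;
  score: number;  // 0-10 points
  weight: number; // fraction of the overall score, e.g. 0.15 for 15%
}

// Weighted average of per-category scores, normalized to a percentage.
function overallPercent(categories: CategoryResult[]): number {
  const weighted = categories.reduce((sum, c) => sum + (c.score / 10) * c.weight, 0);
  const weightSum = categories.reduce((sum, c) => sum + c.weight, 0);
  return (weighted / weightSum) * 100;
}

// Example using three of the categories from the table above.
const demo: CategoryResult[] = [
  { name: "Structure", score: 10, weight: 0.15 },
  { name: "Clarity", score: 9, weight: 0.2 },
  { name: "Safety", score: 7, weight: 0.2 },
];
console.log(overallPercent(demo).toFixed(1) + "%"); // 85.5%
```

Dividing by the weight sum keeps the example meaningful even when only a subset of the 8 categories is shown; with all categories present the weights sum to 1.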
### Scoring Methodology

Each category is scored from 0 to 10 points against specific criteria:
- **Structure**: Checks for SKILL.md existence, clear naming, proper organization, and whether outputs/artifacts are defined
- **Clarity**: Analyzes instruction specificity, ambiguity, and logical flow
- **Safety**: Scans for destructive commands, security risks, permission issues, and network containment (does the skill scope network access when using HTTP/APIs?)
- **Dependencies**: Validates tool listings, installation instructions, environment setup
- **Error Handling**: Reviews error scenarios, fallback strategies, validation
- **Scope**: Assesses single responsibility, trigger clarity, conflict potential, negative routing examples ("don't use when..."), and routing quality (concrete signals vs. vague descriptions)
- **Documentation**: Evaluates examples, I/O documentation, troubleshooting guides, and embedded templates/worked examples with expected output
- **Portability**: Checks cross-platform compatibility, path handling, limitations
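To make the Safety bullet concrete, here is an illustrative sketch (not SkillScore's actual implementation) of what a deterministic scan for destructive commands might look like; the patterns and function name are hypothetical examples.

```typescript
// Illustrative destructive-command patterns; a real checker would be broader.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\brm\s+-rf?\s+[/~]/,      // recursive delete of absolute or home paths
  /\bmkfs\b/,                // filesystem formatting
  /\bdd\s+if=.*of=\/dev\//,  // raw writes to block devices
  /curl[^|\n]*\|\s*(ba)?sh/, // piping a remote script straight into a shell
];

// Returns the source of each pattern that matches the skill's text.
function findDestructiveCommands(skillText: string): string[] {
  return DESTRUCTIVE_PATTERNS
    .filter((p) => p.test(skillText))
    .map((p) => p.source);
}

const risky = "Run `curl https://example.com/setup.sh | sh` to install.";
console.log(findDestructiveCommands(risky).length); // 1 pattern matched
```

Because the checks are plain pattern matches over the skill text, the same input always produces the same findings, which is what makes the analysis deterministic and API-key-free.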
### v1.1.0: Production-Validated Checks

Five sub-criteria were added in v1.1.0, inspired by OpenAI's "Skills + Shell + Compaction" blog post and production data from Glean:
| Check | Category | Points | Why It Matters |
|---|---|---|---|
| Negative routing examples | Scope | 2 | Skills that say when NOT to use them trigger ~20% more accurately (Glean data) |
| Routing quality | Scope | 1 | Descriptions with concrete tool names, I/O, and "use when" patterns route better than marketing copy |
| Embedded templates | Documentation | 2 | Real output templates inside the skill drove the biggest quality + latency gains in production |
| Network containment | Safety | 1 | Skills combining tools + open network access are a data exfiltration risk without scoping |
| Artifact output spec | Structure | 1 | Skills that define where outputs go create clean review boundaries |
### Grade Scale
| Grade | Score Range | Description |
|---|---|---|
| A+ | 97-100% | Exceptional quality |
| A | 93-96% | Excellent |
| A- | 90-92% | Very good |
| B+ | 87-89% | Good |
| B | 83-86% | Above average |
| B- | 80-82% | Satisfactory |
| C+ | 77-79% | Acceptable |
| C | 73-76% | Fair |
| C- | 70-72% | Needs improvement |
| D+ | 67-69% | Poor |
| D | 65-66% | Very poor |
| D- | 60-64% | Failing |
| F | 0-59% | Unacceptable |
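The grade scale above amounts to a threshold lookup. A minimal sketch (the function name is illustrative, not part of SkillScore's API):

```typescript
// Maps a percentage to a letter grade per the published grade scale.
function gradeFor(percent: number): string {
  const scale: [number, string][] = [
    [97, "A+"], [93, "A"], [90, "A-"],
    [87, "B+"], [83, "B"], [80, "B-"],
    [77, "C+"], [73, "C"], [70, "C-"],
    [67, "D+"], [65, "D"], [60, "D-"],
  ];
  // Thresholds are in descending order, so the first floor we clear wins.
  for (const [floor, grade] of scale) {
    if (percent >= floor) return grade;
  }
  return "F";
}

console.log(gradeFor(92)); // "A-", matching the example report above
```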
## What Makes a Good Skill?

### Required Structure

```text
my-skill/
├── SKILL.md          # Main skill definition (REQUIRED)
├── README.md         # Documentation (recommended)
├── package.json      # Dependencies (if applicable)
├── scripts/          # Executable scripts
│   ├── setup.sh
│   └── main.py
└── examples/         # Usage examples
    └── example.md
```

### SKILL.md Template

````markdown
# My Awesome Skill

Brief description of what this skill does and when to use it.

## When to Use

Use this skill when you need to [specific task] with [specific tools/inputs].

## When NOT to Use

Don't use this skill when:

- The task is [alternative scenario]; use [other skill] instead
- You need [different capability]

## Dependencies

- Tool 1: Installation instructions
- API Key: How to obtain and configure
- Environment: OS requirements

## Usage

1. Step-by-step instructions
2. Specific commands to run
3. Expected outputs

## Output

Results are written to `./output/` as JSON files.

## Error Handling

- Common issues and solutions
- Fallback strategies
- Validation steps

## Examples

### Example Output

```json
{
  "status": "success",
  "result": "Example of what the skill produces"
}
```

```bash
# Working example
./scripts/main.py --input "test data"
```

## Limitations

- Known constraints
- Platform-specific notes
- Edge cases
````
## API Usage

Use SkillScore programmatically in your Node.js projects:
```typescript
import { SkillParser, SkillScorer, TerminalReporter } from 'skillscore';
import type { Reporter, SkillScore } from 'skillscore';

const parser = new SkillParser();
const scorer = new SkillScorer();
const reporter: Reporter = new TerminalReporter();

async function evaluateSkill(skillPath: string): Promise<SkillScore> {
  const skill = await parser.parseSkill(skillPath);
  const score = await scorer.scoreSkill(skill);
  const report = reporter.generateReport(score);
  console.log(report);
  return score;
}
```
All three reporters (`TerminalReporter`, `JsonReporter`, `MarkdownReporter`) implement the `Reporter` interface.
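The Reporter pattern described above can be sketched as follows. The class names mirror SkillScore's exports, but the simplified `Score` shape and the method bodies here are illustrative, not the library's actual types.

```typescript
// Simplified stand-in for the library's SkillScore result type.
interface Score {
  grade: string;
  percent: number;
}

// Every reporter turns a score into a string in its own format.
interface Reporter {
  generateReport(score: Score): string;
}

class JsonReporter implements Reporter {
  generateReport(score: Score): string {
    return JSON.stringify(score);
  }
}

class TerminalReporter implements Reporter {
  generateReport(score: Score): string {
    return `${score.grade} - ${score.percent.toFixed(1)}%`;
  }
}

// Switching output formats is just switching the Reporter implementation.
const score: Score = { grade: "A-", percent: 92 };
console.log(new TerminalReporter().generateReport(score)); // A- - 92.0%
```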
## CLI Options

```text
Usage: skillscore [options] <path...>

Arguments:
  path                 Path(s) to a skill directory, GitHub URL, or shorthand

Options:
  -V, --version        Output the version number
  -j, --json           Output in JSON format
  -m, --markdown       Output in Markdown format
  -o, --output <file>  Write output to a file
  -v, --verbose        Show ALL findings (not just truncated)
  -b, --batch          Batch mode for comparing multiple skills
  -g, --github         Treat shorthand paths as GitHub repos (user/repo/path)
  -h, --help           Display help for command
```
## Testing

```bash
# Run tests in watch mode
npm test

# Run tests once
npm run test:run

# Lint code
npm run lint

# Build project
npm run build
```
## Contributing

We welcome contributions! Here's how to get started:

### Development Setup

```bash
git clone https://github.com/joeynyc/skillscore.git
cd skillscore
npm install
npm run build
npm link

# Run in development mode
npm run dev ./test-skill/

# Build for production
npm run build
```

### Submitting Changes

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass (`npm test`)
6. Lint your code (`npm run lint`)
7. Commit your changes (`git commit -m 'Add amazing feature'`)
8. Push to the branch (`git push origin feature/amazing-feature`)
9. Open a Pull Request
### Coding Standards
- Use TypeScript for all new code
- Follow existing code style (enforced by ESLint)
- Add tests for new features
- Update documentation for API changes
- Keep commits focused and descriptive
## Troubleshooting

### Common Issues

**Error: "Path does not exist"**

- Check for typos in the path
- Ensure you have permission to read the directory
- Verify the path points to a directory, not a file

**Error: "No SKILL.md file found"**

- Skills must contain a SKILL.md file
- Check that you're pointing to the right directory
- The file must be named exactly "SKILL.md"

**Error: "Git is not available"**

Install Git to clone GitHub repositories:

- macOS: `xcode-select --install`
- Ubuntu: `sudo apt-get install git`
- Windows: download from git-scm.com

**Scores seem too high/low**

- Scoring is calibrated against real-world skills
- See the scoring methodology above
- Consider the specific criteria for each category
### Getting Help

- Report Issues
- Discussions
- Documentation
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments

- Inspired by the need for quality assessment in AI agent skills
- Built for the OpenClaw and Claude Code communities
- Thanks to all contributors and skill creators
- Scoring methodology informed by software engineering best practices and OpenAI's production skill patterns
## Example Scores

Here are some real-world examples of how different skills score:

- **Vercel find-skills**: 85% (B) - well-structured, good documentation
- **Anthropic frontend-design**: 87% (B+) - excellent clarity, minor dependency issues
- **Anthropic skill-creator**: 92% (A-) - outstanding overall, minor safety concerns
Made with ❤️ for the AI agent community.

*Help us improve AI agent skills, one evaluation at a time.*
