# SkillScore

The universal quality standard for AI agent skills.

Evaluate any SKILL.md, whether from skills.sh, ClawHub, GitHub, or your local machine.
## Features

- **Comprehensive Evaluation**: 8 scoring categories with weighted importance
- **Multiple Output Formats**: terminal (colorful), JSON, and Markdown reports
- **Deterministic Analysis**: reliable, reproducible scoring with no API keys required
- **Detailed Feedback**: specific findings and actionable recommendations
- **Fast & Reliable**: built with TypeScript
- **Cross-Platform**: works on Windows, macOS, and Linux
- **GitHub Integration**: score skills directly from GitHub repositories
- **Batch Mode**: compare multiple skills with a summary table
- **Verbose Mode**: see all findings, not just truncated summaries
## Installation

### Global Installation (Recommended)

```bash
npm install -g skillscore
```

### Local Installation

```bash
npm install skillscore
npx skillscore ./my-skill/
```

### From Source

```bash
git clone https://github.com/joeynyc/skillscore.git
cd skillscore
npm install
npm run build
npm link
```

## Quick Start

Evaluate a skill directory:

```bash
skillscore ./my-skill/
```
## Usage Examples

### Basic Usage

```bash
# Evaluate a skill
skillscore ./skills/my-skill/

# Evaluate with verbose output (shows all findings)
skillscore ./skills/my-skill/ --verbose
```

### GitHub Integration

```bash
# Full GitHub URL (always recognized)
skillscore https://github.com/vercel-labs/skills/tree/main/skills/find-skills

# GitHub shorthand (requires the -g/--github flag)
skillscore -g vercel-labs/skills/find-skills

# Anthropic skills
skillscore -g anthropic/skills/skill-creator
```

### Output Formats

```bash
# JSON output
skillscore ./skills/my-skill/ --json

# Markdown report
skillscore ./skills/my-skill/ --markdown

# Save to file
skillscore ./skills/my-skill/ --output report.md
skillscore ./skills/my-skill/ --json --output score.json
```

### Batch Mode

```bash
# Compare multiple skills (auto-enters batch mode)
skillscore ./skill1 ./skill2 ./skill3

# Explicit batch mode flag
skillscore ./skill1 ./skill2 --batch

# Compare GitHub skills
skillscore -g user/repo1/skill1 user/repo2/skill2 --json
```

### Utility Commands

```bash
# Show version
skillscore --version

# Get help
skillscore --help
```
## Example Output

### Terminal Output

```text
SKILLSCORE EVALUATION REPORT
============================================================

Skill: Weather Information Fetcher
Fetches current weather data for any city using OpenWeatherMap API
Path: ./weather-skill

OVERALL SCORE
A- - 92.0% (9.2/10.0 points)

CATEGORY BREAKDOWN
------------------------------------------------------------
Structure       ████████████████████ 100.0%
  SKILL.md exists, clear name/description, follows conventions
  Score: 10/10 (weight: 15%)
  ✓ SKILL.md file exists (+3)
  ✓ Clear skill name: "Weather Information Fetcher" (+2)
  ✓ Clear description provided (+2)
  ... 2 more findings

Clarity         ██████████████████░░ 90.0%
  Specific actionable instructions, no ambiguity, logical order
  Score: 9/10 (weight: 20%)
  ✓ Contains specific step-by-step instructions with commands (+3)
  ✓ No ambiguous language detected (+3)
  ✓ Instructions follow logical order (+2)
  ... 1 more finding (use --verbose to see all)

Safety          ██████████████░░░░░░ 70.0%
  No destructive commands, respects permissions
  Score: 7/10 (weight: 20%)
  ✓ No dangerous destructive commands found (+3)
  ✓ No obvious secret exfiltration risks (+3)
  ⚠ Some potential security concerns detected

SUMMARY
------------------------------------------------------------
✅ Strengths: Structure, Clarity, Dependencies, Documentation
⚠ Areas for improvement: Safety

Generated: 2/11/2026, 3:15:49 PM
```
### Batch Mode Output

```text
BATCH SKILL EVALUATION
Evaluating 3 skill(s)...

[1/3] Processing: ./weather-skill
✅ Completed
[2/3] Processing: ./file-backup
✅ Completed
[3/3] Processing: user/repo/skill
✅ Completed

COMPARISON SUMMARY

Skill                        Grade  Score  Structure  Clarity  Safety  Status
Weather Information Fetcher  A-     92.0%  100%       90%      70%     OK
File Backup Tool             B+     87.0%  95%        85%      90%     OK
Advanced Data Processor      A      94.0%  100%       95%      85%     OK

BATCH SUMMARY
✅ Successful: 3
Average Score: 91.0%
```
## Scoring System

SkillScore evaluates skills across 8 weighted categories:
| Category | Weight | Description |
|---|---|---|
| Structure | 15% | SKILL.md exists, clear name/description, file organization, artifact output spec |
| Clarity | 20% | Specific actionable instructions, no ambiguity, logical order |
| Safety | 20% | No destructive commands, respects permissions, network containment |
| Dependencies | 10% | Lists required tools/APIs, install instructions, env vars |
| Error Handling | 10% | Failure instructions, fallbacks, no silent failures |
| Scope | 10% | Single responsibility, routing quality, negative examples |
| Documentation | 10% | Usage examples, embedded templates, expected I/O |
| Portability | 5% | Cross-platform, no hardcoded paths, relative paths |
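As a sketch of how these weights combine, the overall percentage is a weighted average of the per-category scores. The code below is illustrative only; the `CategoryResult` interface and `overallPercent` function are hypothetical names, not SkillScore's actual API.

```typescript
// Hypothetical shape of a per-category result; not SkillScore's actual types.
interface CategoryResult {
  name: string;
  score: number;  // 0-10 points
  weight: number; // fraction of the overall score, e.g. 0.15 for 15%
}

// Weighted average of per-category scores, normalized to a percentage.
function overallPercent(categories: CategoryResult[]): number {
  const weighted = categories.reduce((sum, c) => sum + (c.score / 10) * c.weight, 0);
  const weightSum = categories.reduce((sum, c) => sum + c.weight, 0);
  return (weighted / weightSum) * 100;
}

// Example using three of the categories from the table above.
const demo: CategoryResult[] = [
  { name: "Structure", score: 10, weight: 0.15 },
  { name: "Clarity", score: 9, weight: 0.2 },
  { name: "Safety", score: 7, weight: 0.2 },
];
console.log(overallPercent(demo).toFixed(1) + "%"); // 85.5%
```

Dividing by the weight sum keeps the example meaningful even when only a subset of the 8 categories is shown; with all categories present the weights sum to 1.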
### Scoring Methodology

Each category is scored from 0 to 10 points against specific criteria:
- **Structure**: Checks for SKILL.md existence, clear naming, proper organization, and whether outputs/artifacts are defined
- **Clarity**: Analyzes instruction specificity, ambiguity, and logical flow
- **Safety**: Scans for destructive commands, security risks, permission issues, and network containment (does the skill scope network access when using HTTP/APIs?)
- **Dependencies**: Validates tool listings, installation instructions, environment setup
- **Error Handling**: Reviews error scenarios, fallback strategies, validation
- **Scope**: Assesses single responsibility, trigger clarity, conflict potential, negative routing examples ("don't use when..."), and routing quality (concrete signals vs. vague descriptions)
- **Documentation**: Evaluates examples, I/O documentation, troubleshooting guides, and embedded templates/worked examples with expected output
- **Portability**: Checks cross-platform compatibility, path handling, limitations
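To make the Safety bullet concrete, here is an illustrative sketch (not SkillScore's actual implementation) of what a deterministic scan for destructive commands might look like; the patterns and function name are hypothetical examples.

```typescript
// Illustrative destructive-command patterns; a real checker would be broader.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\brm\s+-rf?\s+[/~]/,      // recursive delete of absolute or home paths
  /\bmkfs\b/,                // filesystem formatting
  /\bdd\s+if=.*of=\/dev\//,  // raw writes to block devices
  /curl[^|\n]*\|\s*(ba)?sh/, // piping a remote script straight into a shell
];

// Returns the source of each pattern that matches the skill's text.
function findDestructiveCommands(skillText: string): string[] {
  return DESTRUCTIVE_PATTERNS
    .filter((p) => p.test(skillText))
    .map((p) => p.source);
}

const risky = "Run `curl https://example.com/setup.sh | sh` to install.";
console.log(findDestructiveCommands(risky).length); // 1 pattern matched
```

Because the checks are plain pattern matches over the skill text, the same input always produces the same findings, which is what makes the analysis deterministic and API-key-free.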
### v1.1.0: Production-Validated Checks

Five sub-criteria were added in v1.1.0, inspired by OpenAI's "Skills + Shell + Compaction" blog post and production data from Glean:
| Check | Category | Points | Why It Matters |
|---|---|---|---|
| Negative routing examples | Scope | 2 | Skills that say when NOT to use them trigger ~20% more accurately (Glean data) |
| Routing quality | Scope | 1 | Descriptions with concrete tool names, I/O, and "use when" patterns route better than marketing copy |
| Embedded templates | Documentation | 2 | Real output templates inside the skill drove the biggest quality + latency gains in production |
| Network containment | Safety | 1 | Skills combining tools + open network access are a data exfiltration risk without scoping |
| Artifact output spec | Structure | 1 | Skills that define where outputs go create clean review boundaries |
### Grade Scale
| Grade | Score Range | Description |
|---|---|---|
| A+ | 97-100% | Exceptional quality |
| A | 93-96% | Excellent |
| A- | 90-92% | Very good |
| B+ | 87-89% | Good |
| B | 83-86% | Above average |
| B- | 80-82% | Satisfactory |
| C+ | 77-79% | Acceptable |
| C | 73-76% | Fair |
| C- | 70-72% | Needs improvement |
| D+ | 67-69% | Poor |
| D | 65-66% | Very poor |
| D- | 60-64% | Failing |
| F | 0-59% | Unacceptable |
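The grade scale above amounts to a threshold lookup. A minimal sketch (the function name is illustrative, not part of SkillScore's API):

```typescript
// Maps a percentage to a letter grade per the published grade scale.
function gradeFor(percent: number): string {
  const scale: [number, string][] = [
    [97, "A+"], [93, "A"], [90, "A-"],
    [87, "B+"], [83, "B"], [80, "B-"],
    [77, "C+"], [73, "C"], [70, "C-"],
    [67, "D+"], [65, "D"], [60, "D-"],
  ];
  // Thresholds are in descending order, so the first floor we clear wins.
  for (const [floor, grade] of scale) {
    if (percent >= floor) return grade;
  }
  return "F";
}

console.log(gradeFor(92)); // "A-", matching the example report above
```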
## What Makes a Good Skill?

### Required Structure

```text
my-skill/
├── SKILL.md          # Main skill definition (REQUIRED)
├── README.md         # Documentation (recommended)
├── package.json      # Dependencies (if applicable)
├── scripts/          # Executable scripts
│   ├── setup.sh
│   └── main.py
└── examples/         # Usage examples
    └── example.md
```

### SKILL.md Template

````markdown
# My Awesome Skill

Brief description of what this skill does and when to use it.

## When to Use

Use this skill when you need to [specific task] with [specific tools/inputs].

## When NOT to Use

Don't use this skill when:

- The task is [alternative scenario]; use [other skill] instead
- You need [different capability]

## Dependencies

- Tool 1: Installation instructions
- API Key: How to obtain and configure
- Environment: OS requirements

## Usage

1. Step-by-step instructions
2. Specific commands to run
3. Expected outputs

## Output

Results are written to `./output/` as JSON files.

## Error Handling

- Common issues and solutions
- Fallback strategies
- Validation steps

## Examples

### Example Output

```json
{
  "status": "success",
  "result": "Example of what the skill produces"
}
```

```bash
# Working example
./scripts/main.py --input "test data"
```

## Limitations

- Known constraints
- Platform-specific notes
- Edge cases
````
## API Usage

Use SkillScore programmatically in your Node.js projects:
```typescript
import { SkillParser, SkillScorer, TerminalReporter } from 'skillscore';
import type { Reporter, SkillScore } from 'skillscore';

const parser = new SkillParser();
const scorer = new SkillScorer();
const reporter: Reporter = new TerminalReporter();

async function evaluateSkill(skillPath: string): Promise<SkillScore> {
  const skill = await parser.parseSkill(skillPath);
  const score = await scorer.scoreSkill(skill);
  const report = reporter.generateReport(score);
  console.log(report);
  return score;
}
```
All three reporters (`TerminalReporter`, `JsonReporter`, `MarkdownReporter`) implement the `Reporter` interface.
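The Reporter pattern described above can be sketched as follows. The class names mirror SkillScore's exports, but the simplified `Score` shape and the method bodies here are illustrative, not the library's actual types.

```typescript
// Simplified stand-in for the library's SkillScore result type.
interface Score {
  grade: string;
  percent: number;
}

// Every reporter turns a score into a string in its own format.
interface Reporter {
  generateReport(score: Score): string;
}

class JsonReporter implements Reporter {
  generateReport(score: Score): string {
    return JSON.stringify(score);
  }
}

class TerminalReporter implements Reporter {
  generateReport(score: Score): string {
    return `${score.grade} - ${score.percent.toFixed(1)}%`;
  }
}

// Switching output formats is just switching the Reporter implementation.
const score: Score = { grade: "A-", percent: 92 };
console.log(new TerminalReporter().generateReport(score)); // A- - 92.0%
```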
## CLI Options

```text
Usage: skillscore [options] <path...>

Arguments:
  path                 Path(s) to a skill directory, GitHub URL, or shorthand

Options:
  -V, --version        Output the version number
  -j, --json           Output in JSON format
  -m, --markdown       Output in Markdown format
  -o, --output <file>  Write output to a file
  -v, --verbose        Show ALL findings (not just truncated)
  -b, --batch          Batch mode for comparing multiple skills
  -g, --github         Treat shorthand paths as GitHub repos (user/repo/path)
  -h, --help           Display help for command
```
## Testing

```bash
# Run tests in watch mode
npm test

# Run tests once
npm run test:run

# Lint code
npm run lint

# Build project
npm run build
```
## Contributing

We welcome contributions! Here's how to get started:

### Development Setup

```bash
git clone https://github.com/joeynyc/skillscore.git
cd skillscore
npm install
npm run build
npm link

# Run in development mode
npm run dev ./test-skill/

# Build for production
npm run build
```

### Submitting Changes

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass (`npm test`)
6. Lint your code (`npm run lint`)
7. Commit your changes (`git commit -m 'Add amazing feature'`)
8. Push to the branch (`git push origin feature/amazing-feature`)
9. Open a Pull Request
### Coding Standards
- Use TypeScript for all new code
- Follow existing code style (enforced by ESLint)
- Add tests for new features
- Update documentation for API changes
- Keep commits focused and descriptive
## Troubleshooting

### Common Issues

**Error: "Path does not exist"**

- Check for typos in the path
- Ensure you have permission to read the directory
- Verify the path points to a directory, not a file

**Error: "No SKILL.md file found"**

- Skills must contain a SKILL.md file
- Check that you're pointing to the right directory
- The file must be named exactly "SKILL.md"

**Error: "Git is not available"**

Install Git to clone GitHub repositories:

- macOS: `xcode-select --install`
- Ubuntu: `sudo apt-get install git`
- Windows: download from git-scm.com

**Scores seem too high/low**

- Scoring is calibrated against real-world skills
- See the scoring methodology above
- Consider the specific criteria for each category
### Getting Help

- Report Issues
- Discussions
- Documentation
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments

- Inspired by the need for quality assessment in AI agent skills
- Built for the OpenClaw and Claude Code communities
- Thanks to all contributors and skill creators
- Scoring methodology informed by software engineering best practices and OpenAI's production skill patterns
## Example Scores

Here are some real-world examples of how different skills score:

- **Vercel find-skills**: 85% (B) - well-structured, good documentation
- **Anthropic frontend-design**: 87% (B+) - excellent clarity, minor dependency issues
- **Anthropic skill-creator**: 92% (A-) - outstanding overall, minor safety concerns
Made with ❤️ for the AI agent community.

*Help us improve AI agent skills, one evaluation at a time.*
