Plugin Eval

Plugin Verified Active

Three-layer quality evaluation framework for Claude Code plugins with Elo ranking

1 Skill, 0 MCPs

Purpose

Plugin Eval gives developers and platform curators an automated system for evaluating and improving the quality of Claude Code extensions.

Features

Three-layer evaluation framework (static, LLM judge, Monte Carlo)
Elo ranking for comparative quality assessment
CLI commands for scoring, certifying, and comparing extensions
Detailed documentation and rubrics for evaluation dimensions
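
For readers unfamiliar with Elo ranking, the comparative step can be sketched with the standard Elo update rule. This is a minimal illustration, not Plugin Eval's actual implementation: the function names, the K-factor of 32, and the 1500 starting rating are assumptions chosen because they are conventional defaults, and the plugin may use different values or a different variant.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one head-to-head comparison.

    score_a is 1.0 if A was judged better, 0.0 if B was, and 0.5 for a tie.
    k (assumed 32 here) controls how far a single comparison moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical example: two extensions start level and A wins one comparison.
skill_a, skill_b = update_elo(1500.0, 1500.0, score_a=1.0)
print(skill_a, skill_b)
```

Because the two ratings move by equal and opposite amounts, their sum is preserved, so a rating only has meaning relative to the other extensions in the same pool.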

Use Cases

Evaluating the quality of a new or existing Claude Code skill.
Certifying a plugin for marketplace inclusion or advanced use.
Comparing two different implementations of a similar capability.
Understanding the methodology behind Claude Code extension quality scoring.

Non-Goals

Executing or running Claude Code extensions directly.
Providing a marketplace for distributing extensions.
Automated fixing of detected quality issues.

Installation

First, add the marketplace:

/plugin marketplace add wshobson/agents

Then install the plugin:

/plugin install plugin-eval@claude-code-workflows

Contains 1 extension

Skill (1)

Evaluation Methodology Skill

PluginEval quality methodology — dimensions, rubrics, statistical methods, and scoring formulas. Use this skill when understanding how plugin quality is measured, when interpreting a low score on a specific dimension, when deciding how to improve a skill's triggering accuracy or orchestration fitness, when calibrating scoring thresholds for your marketplace, or when explaining quality badges to external partners like Neon.

Quality Score

Verified

98/100

Analyzed 13 days ago

Trust Signals

Last commit 15 days ago

GitHub owner wshobson

Stars 35.3k

License MIT

Website sethhobson.com

Similar Extensions

Cypress

100

Create, update, and fix Cypress tests. Connect to Cypress Cloud to see test results and use data to manage your test suite.

Plugin

cypress-io

Huggingface Community Evals

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom evaluations with vLLM/lighteval.

Plugin

huggingface

Voltagent Qa Sec

Testing, security, and code quality experts - code review, penetration testing, QA automation, and UI flow validation

Plugin

VoltAgent