Eval

Evaluate and rank agent results by metric or LLM judge for an AgentHub session.

Purpose

Provides a structured, objective way to assess the performance and quality of agent results within an AgentHub session.

Features

Evaluate agent results by metric
Evaluate agent results using LLM judge
Support hybrid evaluation modes
Rank agent results for a session
Update session state after evaluation
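
As a rough illustration of metric-based ranking, the sketch below sorts a handful of agent results by a single numeric score. It is not the skill's actual implementation: the `AgentResult` class, its fields, and the `rank_by_metric` helper are hypothetical names chosen for this example, and the scores are made-up sample values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentResult:
    agent_id: str  # hypothetical identifier for one agent run
    metric: float  # hypothetical numeric score, e.g. fraction of tests passed


def rank_by_metric(results: List[AgentResult], higher_is_better: bool = True) -> List[AgentResult]:
    """Return a new list of results sorted best-first by metric."""
    return sorted(results, key=lambda r: r.metric, reverse=higher_is_better)


# Sample values invented for illustration only.
results = [
    AgentResult("agent-1", 0.72),
    AgentResult("agent-2", 0.91),
    AgentResult("agent-3", 0.65),
]

ranking = rank_by_metric(results)
for place, result in enumerate(ranking, start=1):
    print(place, result.agent_id, result.metric)
```

Passing `higher_is_better=False` covers metrics where a smaller value wins, such as latency or error count.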

Use Cases

Use when comparing multiple agent runs in a session.
Use to rank agent performance objectively against predefined metrics.
Use when a qualitative assessment of agent outputs is needed to break ties or provide context.
Use after an agent session concludes to determine the best-performing agent.
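
The tie-breaking use case can be sketched as follows, under stated assumptions: agents are ordered by metric first, and a judge score is consulted only when two or more agents share the same metric value. The `hybrid_rank` function, the `stub_judge` callable, and all scores are hypothetical; a real judge would call an LLM, which is replaced here by a fixed lookup so the example runs on its own.

```python
from collections import Counter
from typing import Callable, Dict, List


def hybrid_rank(metrics: Dict[str, float], judge: Callable[[str], float]) -> List[str]:
    """Rank agent ids by metric (higher is better); use the judge only on ties."""
    counts = Counter(metrics.values())

    def sort_key(agent_id: str):
        score = metrics[agent_id]
        # Consult the (potentially expensive) judge only when the metric is tied.
        tie_break = judge(agent_id) if counts[score] > 1 else 0.0
        return (-score, -tie_break)

    return sorted(metrics, key=sort_key)


# Stand-in for an LLM judge: fixed, made-up scores, with calls recorded.
judge_calls: List[str] = []
judge_scores = {"agent-a": 6.0, "agent-b": 8.0, "agent-c": 9.0}


def stub_judge(agent_id: str) -> float:
    judge_calls.append(agent_id)
    return judge_scores[agent_id]


sample_metrics = {"agent-a": 0.9, "agent-b": 0.9, "agent-c": 0.5}
order = hybrid_rank(sample_metrics, stub_judge)
print(order)
```

In this sketch the judge never overrides the metric: `agent-c` has the highest judge score but still ranks last because its metric is lower. Whether the skill weighs the two signals this way is not stated in the listing.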

Non-Goals

Running agent sessions themselves.
Modifying agent configurations or parameters.
Directly merging or deploying agent results.

Installation

First, add the marketplace:

/plugin marketplace add alirezarezvani/claude-skills

Then install the plugin:

/plugin install agenthub@claude-code-skills

Quality Score

Verified, 98/100

Analyzed about 20 hours ago

Trust Signals

Last commit: about 23 hours ago
GitHub owner: alirezarezvani
Stars: 14.6k
License: MIT
Website: alirezarezvani.medium.com
