
Eval

Part of: AgentHub

Evaluate and rank agent results by metric or LLM judge for an AgentHub session.

Purpose

To provide a structured and objective way to assess the performance and quality of agent results within an AgentHub session.

Features

  • Evaluate agent results by metric
  • Evaluate agent results with an LLM judge
  • Support for hybrid evaluation modes that combine both (see the sketch after this list)
  • Rank agent results for a session
  • Update session state after evaluation
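
This page does not publish the skill's internals, so the following Python sketch only illustrates how a hybrid metric-plus-LLM-judge evaluation and ranking could work. Every name in it (AgentResult, metric_score, judge_score, hybrid_score, rank_results) is hypothetical and stands in for whatever the skill actually uses.

from dataclasses import dataclass, field

@dataclass
class AgentResult:
    # Hypothetical container for one agent's output in a session.
    agent_id: str
    output: str
    scores: dict = field(default_factory=dict)

def metric_score(result: AgentResult) -> float:
    # Objective metric, e.g. tests passed or exact-match rate, normalized to 0-1.
    return result.scores.get("metric", 0.0)

def judge_score(result: AgentResult) -> float:
    # 0-1 quality score returned by an LLM judge for the same output.
    return result.scores.get("judge", 0.0)

def hybrid_score(result: AgentResult, metric_weight: float = 0.7) -> float:
    # Hybrid mode: weighted blend of the objective metric and the judge score.
    return metric_weight * metric_score(result) + (1 - metric_weight) * judge_score(result)

def rank_results(results: list[AgentResult]) -> list[AgentResult]:
    # Rank agents for the session, best first; the judge score breaks metric ties.
    return sorted(results, key=lambda r: (hybrid_score(r), judge_score(r)), reverse=True)

# Example: two agents tie on the metric, so the judge score decides the ranking.
session = [
    AgentResult("agent-a", "...", {"metric": 0.92, "judge": 0.80}),
    AgentResult("agent-b", "...", {"metric": 0.92, "judge": 0.95}),
]
best = rank_results(session)[0]  # agent-b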

Use Cases

  • Use when comparing multiple agent runs in a session.
  • Use to objectively rank agent performance based on predefined metrics.
  • Use when qualitative assessment of agent outputs is needed to break ties or provide context.
  • Use after an agent session concludes to determine the best-performing agent.

Non-Goals

  • Running agent sessions themselves.
  • Modifying agent configurations or parameters.
  • Directly merging or deploying agent results.

Installation

First, add the marketplace, then install the plugin:

/plugin marketplace add alirezarezvani/claude-skills
/plugin install agenthub@claude-code-skills
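
Once installed, the skill is used from a Claude Code conversation. The exact trigger phrasing is not documented on this page; a request along the following lines (hypothetical wording) matches the skill's description:

Evaluate the agent results in this AgentHub session by metric and LLM judge, then rank them.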

Trust Signals

  • Last commit: about 23 hours ago
  • Stars: 14.6k
  • License: MIT
