
Eval

Skill · Verified · Active
Part of: AgentHub

Evaluate and rank agent results by metric or LLM judge for an AgentHub session.

Purpose

To provide a structured and objective way to assess the performance and quality of agent results within an AgentHub session.

Features

  • Evaluate agent results by metric
  • Evaluate agent results using LLM judge
  • Support for hybrid evaluation modes
  • Rank agent results for a session
  • Update session state after evaluation
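The hybrid mode above can be pictured as ranking primarily on an objective metric and falling back to the LLM-judge rating to break ties. A minimal sketch, assuming hypothetical `AgentResult` records with a `metric_score` and a `judge_score` (these names are illustrative, not the skill's actual data model):

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    agent: str
    metric_score: float      # objective metric, e.g. task pass rate
    judge_score: float = 0.0  # LLM-judge rating, used as a tiebreak

def rank_results(results):
    # Sort descending: objective metric first, judge score breaks ties.
    return sorted(results, key=lambda r: (r.metric_score, r.judge_score),
                  reverse=True)

session = [
    AgentResult("agent-a", metric_score=0.90, judge_score=7.5),
    AgentResult("agent-b", metric_score=0.90, judge_score=8.2),
    AgentResult("agent-c", metric_score=0.75, judge_score=9.0),
]
ranking = [r.agent for r in rank_results(session)]
# agent-b edges out agent-a on the judge score; agent-c trails on the metric.
```

Here the metric dominates and the judge only differentiates results that the metric cannot; a real evaluation might instead weight the two scores, which changes how much a strong qualitative rating can compensate for a weaker metric.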

Use cases

  • Use when comparing multiple agent runs in a session.
  • Use to objectively rank agent performance based on predefined metrics.
  • Use when qualitative assessment of agent outputs is needed to break ties or provide context.
  • Use after an agent session concludes to determine the best performing agent.

Non-goals

  • Running agent sessions themselves.
  • Modifying agent configurations or parameters.
  • Directly merging or deploying agent results.

Installation

Add the marketplace first:

/plugin marketplace add alirezarezvani/claude-skills
/plugin install agenthub@claude-code-skills

Quality score

Verified
98/100
Analyzed 1 day ago

Trust signals

Latest commit: 1 day ago
Stars: 14.6k
License: MIT