Eval
Skill · Verified · Active
Evaluate and rank agent results by metric or LLM judge for an AgentHub session.
Provides a structured, objective way to assess the performance and quality of agent results within an AgentHub session.
Features
- Evaluate agent results by metric
- Evaluate agent results using LLM judge
- Support for hybrid evaluation modes
- Rank agent results for a session
- Update session state after evaluation
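The features above can be sketched as a small ranking routine. This is a minimal illustration, not the skill's actual implementation: the `AgentResult` fields, the normalization to [0, 1], and the 70/30 metric-vs-judge weighting are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    agent_id: str
    metric_score: float       # objective metric, e.g. test pass rate, in [0, 1]
    judge_score: float = 0.0  # LLM-judge rating, normalized to [0, 1]

def rank_results(results, metric_weight=0.7):
    """Hybrid ranking: weighted blend of metric and LLM-judge scores.

    metric_weight controls how much the objective metric dominates;
    the remainder goes to the qualitative judge score (tie-breaking).
    """
    def combined(r):
        return metric_weight * r.metric_score + (1 - metric_weight) * r.judge_score
    return sorted(results, key=combined, reverse=True)

results = [
    AgentResult("agent-a", metric_score=0.82, judge_score=0.90),
    AgentResult("agent-b", metric_score=0.91, judge_score=0.60),
]
ranking = rank_results(results)
print([r.agent_id for r in ranking])  # agent-a wins: 0.844 vs 0.817
```

With the default weighting, agent-a's strong judge score outweighs agent-b's metric lead, which is the tie-breaking role qualitative assessment plays here.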
Use Cases
- Use when comparing multiple agent runs in a session.
- Use to objectively rank agent performance based on predefined metrics.
- Use when qualitative assessment of agent outputs is needed to break ties or provide context.
- Use after an agent session concludes to determine the best performing agent.
Non-Goals
- Running agent sessions themselves.
- Modifying agent configurations or parameters.
- Directly merging or deploying agent results.
Installation
Add the marketplace first:
/plugin marketplace add alirezarezvani/claude-skills
/plugin install agenthub@claude-code-skills
Quality Score
Verified
Similar Extensions
Context Compression
Score: 100 · This skill should be used when the user asks to "compress context", "summarize conversation history", "implement compaction", "reduce token usage", or mentions context compression, structured summarization, tokens-per-task optimization, or long-running agent sessions exceeding context limits.
Horizon Track
Score: 100 · Track long-horizon objectives across multiple sessions with milestone checkpoints, progress persistence, and drift detection.
Treat
Score: 100 · Prune bloated sessions, with a prescription. Removes progress markers, stale reads, duplicated content, and more.
Guard
Score: 100 · Protect Claude Code sessions from context overflow by running a background daemon that monitors session size and automatically prunes before compaction hits. Use when the user says "guard", "protect session", "context getting long", "prevent compaction", "session management", or is running agent teams that need continuous context protection.
Claude Handoff
Score: 100 · Run /handoff to capture session data, then write a phased implementation plan referencing that data. Creates beads for tracking.
List Topics
Score: 100 · Use when the user asks about topics discussed in the current session, wants a list of topics, or asks what has been covered.