
Agent Evaluation

Skill · Verified · Active

Evaluate and improve Claude Code commands, skills, and agents. Use when testing prompt effectiveness, validating context engineering choices, or measuring improvement quality.

Purpose

Provides systematic methods and best practices for evaluating and improving the performance, reliability, and quality of AI agents and their components.

Features

Structured evaluation methodologies (LLM-as-Judge, human evaluation)
Comprehensive rubric design with scoring guidelines
Techniques for mitigating LLM evaluation biases
Practical prompt patterns and workflow examples
Guidance on test case design and iteration
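The rubric-scoring and bias-mitigation features above can be illustrated with a minimal Python sketch. It is not taken from the skill itself: `Criterion`, `weighted_rubric_score`, and `debiased_pairwise` are hypothetical names, and the judge is assumed to be any callable that returns "A" or "B" for the response it prefers. Running the comparison twice with the order swapped is one common way to counter position bias in LLM-as-Judge setups.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Criterion:
    """One rubric dimension (hypothetical structure for illustration)."""
    name: str
    weight: float   # relative importance; weights need not sum to 1
    max_score: int  # top of the scale, e.g. 5 for a 1-5 scale


def weighted_rubric_score(scores: Dict[str, int], rubric: List[Criterion]) -> float:
    """Normalize each criterion score to [0, 1] and return the weighted mean."""
    total_weight = sum(c.weight for c in rubric)
    acc = 0.0
    for c in rubric:
        s = scores[c.name]
        if not 0 <= s <= c.max_score:
            raise ValueError(f"score for {c.name} out of range: {s}")
        acc += c.weight * (s / c.max_score)
    return acc / total_weight


# Assumed judge signature: (prompt, response_a, response_b) -> "A" or "B"
Judge = Callable[[str, str, str], str]


def debiased_pairwise(judge: Judge, prompt: str, resp_1: str, resp_2: str) -> str:
    """Query the judge in both orders; accept only order-consistent verdicts.

    Returns "1" or "2" for the preferred response, or "tie" when the
    verdict flips with presentation order (a sign of position bias).
    """
    first = judge(prompt, resp_1, resp_2)   # resp_1 shown in position A
    second = judge(prompt, resp_2, resp_1)  # resp_1 shown in position B
    if first == "A" and second == "B":
        return "1"
    if first == "B" and second == "A":
        return "2"
    return "tie"
```

With a stub judge that always picks whichever response is shown first, `debiased_pairwise` returns "tie", which signals that the verdict was driven by position rather than content.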

Use cases

Testing prompt effectiveness for AI agents
Validating context engineering choices
Measuring improvement quality of AI outputs
Developing robust evaluation pipelines for AI systems

Non-goals

Developing AI agents themselves
Automating all aspects of AI evaluation without human oversight
Providing domain-specific evaluation rubrics outside of general AI agent assessment

Practices

Evaluation methodology
Prompt engineering
Test design
Bias mitigation

Versioning

Info: Release management. Although the trust signals show a recent commit, no explicit version is declared in the manifest or a CHANGELOG, and the installation instructions reference the 'main' branch.

Installation

First, add the marketplace:

/plugin marketplace add NeoLabHQ/context-engineering-kit

Then install the plugin:

/plugin install customaize-agent@context-engineering-kit

Quality score

Verified

99/100

Analyzed 1 day ago

Trust signals

Latest commit: 9 days ago

GitHub owner: NeoLabHQ

Stars: 993

License: GPL-3.0

Website: cek.neolab.finance


Similar extensions

Create Command

100

Interactive assistant for creating new Claude commands with proper structure, patterns, and MCP tool integration

Skill

NeoLabHQ

Project Development

100

This skill should be used when the user asks to "start an LLM project", "design batch pipeline", "evaluate task-model fit", "structure agent project", or mentions pipeline architecture, agent-assisted development, cost estimation, or choosing between LLM and traditional approaches.

Skill

muratcankoylan

Write A Skill

100

Create new agent skills with proper structure, progressive disclosure, and bundled resources. Use when user wants to create, write, or build a new skill.

Skill

mattpocock

Context Compression

100

This skill should be used when the user asks to "compress context", "summarize conversation history", "implement compaction", "reduce token usage", or mentions context compression, structured summarization, tokens-per-task optimization, or long-running agent sessions exceeding context limits.

Skill

muratcankoylan

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill

github

Prompt Optimization

100

Applies prompt repetition to improve the accuracy of non-reasoning LLMs.

Skill

asklokesh