Advanced Evaluation
This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise comparison, position bias, evaluation pipelines, or automated quality assessment.
Empowers users to build robust, unbiased LLM evaluation systems through advanced LLM-as-a-Judge techniques and production-grade implementation patterns.
Features
- Implement LLM-as-judge evaluation pipelines
- Perform pairwise comparison with position bias mitigation
- Generate domain-specific scoring rubrics
- Mitigate systematic biases in LLM evaluations
- Select appropriate metrics and evaluation strategies
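The pairwise-comparison feature with position bias mitigation can be sketched as follows. The core idea is to query the judge twice with the answer order swapped and only declare a winner when both orderings agree; disagreement signals position bias. The `judge` function below is a hypothetical stand-in for a real LLM call, stubbed here with a length heuristic so the sketch is runnable:

```python
def judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Hypothetical judge: returns "A" or "B" for the preferred answer.
    In a real pipeline this would be an LLM call; stubbed here with a
    length heuristic purely for demonstration."""
    return "A" if len(answer_a) >= len(answer_b) else "B"

def pairwise_compare(prompt: str, answer_1: str, answer_2: str) -> str:
    """Compare two answers in both positions to cancel position bias."""
    first = judge(prompt, answer_1, answer_2)   # answer_1 shown in slot A
    second = judge(prompt, answer_2, answer_1)  # positions swapped
    # Count how many orderings preferred answer_1.
    wins_1 = int(first == "A") + int(second == "B")
    if wins_1 == 2:
        return "answer_1"
    if wins_1 == 0:
        return "answer_2"
    return "tie"  # verdicts flipped with ordering: position bias suspected
```

A verdict of `"tie"` here is a useful signal in itself: the judge's preference changed when only the presentation order changed, so the result should not be trusted without further sampling.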
Use Cases
- Building automated evaluation systems for LLM outputs
- Comparing multiple model responses to select the best one
- Establishing consistent quality standards across evaluation teams
- Designing A/B tests for prompt or model changes
Non-Goals
- Performing actual LLM generation
- Evaluating non-textual outputs
- Providing a generic prompt engineering skill
Trust
- Issues: In the last 90 days, 6 issues were opened and 2 were closed, indicating slow but present maintainer engagement. The closure rate is low (33%), but the number of open issues is relatively small.
Installation
First, add the marketplace:
/plugin marketplace add muratcankoylan/Agent-Skills-for-Context-Engineering
Then install the plugin:
/plugin install Agent-Skills-for-Context-Engineering@context-engineering-marketplace
Similar Extensions
Evaluation
Score: 98. This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge, multi-dimensional evaluation, agent testing, or quality gates for agent pipelines.
Context Compression
Score: 100. This skill should be used when the user asks to "compress context", "summarize conversation history", "implement compaction", "reduce token usage", or mentions context compression, structured summarization, tokens-per-task optimization, or long-running agent sessions exceeding context limits.
LinkedIn Humanizer
Score: 100. Scrubs AI tells from any text draft OR audits a finished post against the 2026 heuristic-detector checklist. Multi-level rewriter (forensic / strict / aesthetic / all) plus `--mode audit` for a detection-only pass/fail check covering length, hook, call to action, format penalties, and AI vocabulary. Subtools: emoji pattern detection, multi-detector distribution tester (GPTZero, Originality.ai, ZeroGPT, Sapling, Copyleaks), rule explainer. Triggers on "humanize", "de-AI", "check this draft", "review before posting", "is this ready".
Convert Resume to Markdown
Score: 100. Convert a resume PDF to clean markdown for LLM parsing or candidate pipelines.
Sentiment Analyzer
Score: 100. Analyze sentiment in text using ML models. Use when: analyzing customer reviews; processing NPS feedback; monitoring brand mentions; evaluating campaign responses; categorizing support tickets.
LangSmith Observability
Score: 99. LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.