Arize Evaluator

Skill Verified Active

Handles LLM-as-judge evaluation workflows on Arize including creating/updating evaluators, running evaluations on spans or experiments, managing tasks, trigger-run operations, column mapping, and continuous monitoring. Use when the user mentions create evaluator, LLM judge, hallucination, faithfulness, correctness, relevance, run eval, score spans, score experiment, trigger-run, column mapping, continuous monitoring, or improve evaluator prompt.

Purpose

To enable users to efficiently set up and manage LLM-as-judge evaluation workflows on Arize, ensuring consistent and reliable assessment of model performance.

Features

  • Create/update LLM-as-judge evaluators
  • Run evaluations on project spans or experiment runs
  • Manage AI integrations and model configurations
  • Configure column mappings and data granularity
  • Automate continuous monitoring and backfilling
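
As a rough illustration of the first feature, an LLM-as-judge evaluator can be thought of as a prompt template paired with a fixed set of choices that map to scores. The sketch below is not the Arize data model or the ax CLI interface: the `JudgeEvaluator` class, its fields, the prompt text, and the label-to-score values are all hypothetical and only show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeEvaluator:
    """Hypothetical container for an LLM-as-judge evaluator definition."""

    name: str
    template: str              # prompt text with {placeholders}
    choices: dict[str, float]  # label the judge may return -> numeric score

    def render(self, **variables: str) -> str:
        # Fill the template; str.format raises KeyError if a variable is missing.
        return self.template.format(**variables)

    def score(self, label: str) -> float:
        # Normalize the judge's raw label before looking it up.
        key = label.strip().lower()
        if key not in self.choices:
            raise ValueError(
                f"unexpected label {label!r}; expected one of {sorted(self.choices)}"
            )
        return self.choices[key]


# Illustrative evaluator; the wording and scores are assumptions, not Arize defaults.
hallucination = JudgeEvaluator(
    name="hallucination",
    template=(
        "Context: {context}\n"
        "Answer: {output}\n"
        "Is the answer supported by the context? "
        "Reply with exactly one word: factual or hallucinated."
    ),
    choices={"factual": 1.0, "hallucinated": 0.0},
)

prompt = hallucination.render(
    context="Paris is in France.", output="Paris is in Spain."
)
print(prompt)
print(hallucination.score(" Hallucinated "))
```

Restricting the judge to a closed set of labels keeps scores comparable across runs, and rejecting unknown labels surfaces prompt drift early instead of silently scoring it.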

Use cases

  • Evaluating LLM responses for hallucination, correctness, or relevance.
  • Setting up automated quality checks for LLM outputs on new data.
  • Comparing the performance of different LLM experiments against a dataset.
  • Configuring continuous monitoring with LLM-as-judge evaluations for production systems.

Non-goals

  • Directly interacting with LLM APIs without using the 'ax' CLI and Arize platform.
  • Performing data analysis or visualization outside of the Arize platform.
  • Managing Arize project or dataset creation (delegated to other skills).

Workflow

  1. Confirm project/experiment details and AI integration.
  2. Create or select an LLM evaluator with specific templates and choices.
  3. Determine column mappings from actual data.
  4. Create a task (continuous, backfill, or both).
  5. Trigger a backfill run if requested, then monitor.
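
Step 3 above, deriving column mappings from the actual data, can be sketched as matching each template variable against the columns that really appear in a sample of the data. This is a minimal sketch under stated assumptions: `propose_column_mapping` is a hypothetical helper rather than part of the ax CLI, and the column names are illustrative values that should be replaced by whatever the project's spans or experiment runs contain.

```python
def propose_column_mapping(template_vars, available_columns, candidates):
    """Map each template variable to the first candidate column present in the data.

    Returns (mapping, missing): variables that found a column, and those that did not.
    """
    mapping, missing = {}, []
    for var in template_vars:
        match = next(
            (col for col in candidates.get(var, []) if col in available_columns),
            None,
        )
        if match is None:
            missing.append(var)
        else:
            mapping[var] = match
    return mapping, missing


# Column names observed in a sample of the data (illustrative, assumed values).
columns = {"attributes.input.value", "attributes.output.value", "context.span_id"}

# Candidate columns per template variable, in order of preference (assumed names).
candidates = {
    "input": ["attributes.input.value", "attributes.llm.input_messages"],
    "output": ["attributes.output.value"],
    "context": ["attributes.retrieval.documents"],
}

mapping, missing = propose_column_mapping(
    ["input", "output", "context"], columns, candidates
)
print(mapping)
print(missing)
```

Reporting the unmatched variables separately makes it easy to stop before creating a task whose template refers to data that does not exist.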

Practices

  • LLM Evaluation
  • MLOps
  • Data Science Workflows
  • CLI Tooling

Prerequisites

  • ax CLI installed (v0.14.0+)
  • Configured Arize profile with API key
  • Arize space name or ID (via ARIZE_SPACE env var)
  • AI integration configured in Arize (for LLM provider credentials)
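
Some of these prerequisites can be checked before starting. The sketch below tests only what the list states directly: that an `ax` executable is on the PATH, that `ARIZE_SPACE` is set, and that a version string meets the v0.14.0 minimum. The helper names are hypothetical. How to obtain the installed version from the CLI is not shown because the exact flag is not given here, and the profile and AI integration are not checked.

```python
import os
import shutil

MIN_AX_VERSION = (0, 14, 0)  # from the prerequisite "v0.14.0+"


def parse_version(text):
    """Turn 'v0.14.2' or '0.14' into a 3-part tuple such as (0, 14, 2)."""
    core = text.strip().lstrip("v").split("-")[0]  # drop any '-suffix'
    parts = [int(part) for part in core.split(".")]
    parts += [0] * (3 - len(parts))  # pad '0.14' to (0, 14, 0)
    return tuple(parts)


def meets_minimum(version_text, minimum=MIN_AX_VERSION):
    """Return True if the version string is at least the required minimum."""
    return parse_version(version_text) >= minimum


def check_environment(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means no problem found."""
    env = os.environ if env is None else env
    problems = []
    if which("ax") is None:
        problems.append("ax CLI not found on PATH")
    if not env.get("ARIZE_SPACE"):
        problems.append("ARIZE_SPACE is not set")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Passing `env` and `which` as parameters keeps the check easy to exercise without touching the real environment.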

Installation

npx skills add github/awesome-copilot

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js installed locally and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.

Quality score

Verified
100/100
Analyzed 1 day ago

Trust signals

Last commit: 1 day ago
Stars: 32.9k
License: MIT

Similar extensions

Arize Experiment

100

Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.

Skill
github

Arize Dataset

100

Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.

Skill
github

ML Pipeline Workflow

98

Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.

Skill
wshobson

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill
github

Arize AI Provider Integration

100

Creates, reads, updates, and deletes Arize AI integrations that store LLM provider credentials used by evaluators and other Arize features. Supports any LLM provider (e.g. OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM). Use when the user mentions AI integration, LLM provider credentials, create integration, list integrations, update credentials, delete integration, or connecting an LLM provider to Arize.

Skill
github

Trader Regime

100

Detect current market regime using npx neural-trader — bull/bear/ranging/volatile classification with recommended strategy

Skill
ruvnet