Arize Evaluator

Skill verified and active

Handles LLM-as-judge evaluation workflows on Arize including creating/updating evaluators, running evaluations on spans or experiments, managing tasks, trigger-run operations, column mapping, and continuous monitoring. Use when the user mentions create evaluator, LLM judge, hallucination, faithfulness, correctness, relevance, run eval, score spans, score experiment, trigger-run, column mapping, continuous monitoring, or improve evaluator prompt.

Purpose

To enable users to efficiently set up and manage LLM-as-judge evaluation workflows on Arize, ensuring consistent and reliable assessment of model performance.

Features

Create/update LLM-as-judge evaluators
Run evaluations on project spans or experiment runs
Manage AI integrations and model configurations
Configure column mappings and data granularity
Automate continuous monitoring and backfilling

Use cases

When needing to evaluate LLM responses for hallucination, correctness, or relevance.
To set up automated quality checks for LLM outputs on new data.
When analyzing the performance of different LLM experiments against a dataset.
To configure continuous monitoring of LLM-as-judge evaluations for production systems.

Non-goals

Directly interacting with LLM APIs without using the 'ax' CLI and Arize platform.
Performing data analysis or visualization outside of the Arize platform.
Managing Arize project or dataset creation (delegated to other skills).

Workflow

Confirm project/experiment details and AI integration.
Create or select an LLM evaluator with specific templates and choices.
Determine column mappings from actual data.
Create a task (continuous, backfill, or both).
Trigger a backfill run if requested, then monitor.
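
The column-mapping step above can be sanity-checked before a task is created. The following is a minimal sketch, not the ax CLI's own behavior: it assumes the evaluator template uses `{placeholder}` variables and that a few rows of real span or experiment data have already been exported as flat dictionaries. The helper names `template_variables` and `check_mapping`, and the column names such as `attributes.input.value`, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import string


def template_variables(template: str) -> set[str]:
    """Return the {placeholder} names used in a judge prompt template."""
    parsed = string.Formatter().parse(template)
    return {field for _, field, _, _ in parsed if field}


def check_mapping(
    template: str,
    mapping: dict[str, str],
    sample_rows: list[dict[str, object]],
) -> list[str]:
    """Return a list of problems; an empty list means the mapping fits the sample."""
    problems = []

    # Every template variable needs a column to read from.
    for var in sorted(template_variables(template)):
        if var not in mapping:
            problems.append(f"template variable '{var}' has no column mapping")

    # Every mapped column should actually appear in the sampled data.
    observed = set()
    for row in sample_rows:
        observed.update(row.keys())
    for var, column in sorted(mapping.items()):
        if column not in observed:
            problems.append(
                f"column '{column}' (mapped from '{var}') not found in sample data"
            )
    return problems


if __name__ == "__main__":
    # Illustrative values only; real column names come from your own data.
    template = "Question: {input}\nAnswer: {output}\nIs the answer correct?"
    mapping = {"input": "attributes.input.value"}
    rows = [{"attributes.input.value": "What is 2+2?", "attributes.output.value": "4"}]
    for problem in check_mapping(template, mapping, rows):
        print(problem)
```

Checking the mapping against sampled rows, rather than against assumed column names, follows the workflow's instruction to derive mappings from actual data.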

Practices

LLM Evaluation
MLOps
Data Science Workflows
CLI Tooling

Prerequisites

ax CLI installed (v0.14.0+)
Configured Arize profile with API key
Arize space name or ID (via ARIZE_SPACE env var)
AI integration configured in Arize (for LLM provider credentials)
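
Two of the prerequisites above can be checked locally before starting. This is a rough sketch under stated assumptions: it only tests whether an executable named `ax` is on `PATH` and whether the `ARIZE_SPACE` environment variable is non-empty. It does not verify the CLI version, the Arize profile, or the AI integration, since those depend on CLI commands not described here. The function name `missing_prerequisites` is hypothetical.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return human-readable messages for prerequisites that look unmet.

    `env` and `which` are injectable so the check can be exercised
    without touching the real environment.
    """
    env = os.environ if env is None else env
    missing = []

    # The skill drives everything through the ax CLI, so it must be on PATH.
    if which("ax") is None:
        missing.append("ax CLI not found on PATH")

    # The space name or ID is expected in the ARIZE_SPACE environment variable.
    if not env.get("ARIZE_SPACE"):
        missing.append("ARIZE_SPACE is not set")

    return missing


if __name__ == "__main__":
    for message in missing_prerequisites():
        print(message)
```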

Installation

npx skills add github/awesome-copilot

Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.

Quality score

Verified

100/100

Analyzed 1 day ago

Trust signals

Last commit: 1 day ago

GitHub owner: github

Stars: 32.9k

License: MIT

Website: awesome-copilot.github.com

Similar extensions

Arize Experiment

Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.

Arize Dataset

Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.

ML Pipeline Workflow

Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.

Arize Prompt Optimization

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Arize AI Provider Integration

Creates, reads, updates, and deletes Arize AI integrations that store LLM provider credentials used by evaluators and other Arize features. Supports any LLM provider (e.g. OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM). Use when the user mentions AI integration, LLM provider credentials, create integration, list integrations, update credentials, delete integration, or connecting an LLM provider to Arize.

Trader Regime

Detect current market regime using npx neural-trader — bull/bear/ranging/volatile classification with recommended strategy