Arize Experiment
技能 已验证 活跃Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.
To streamline the process of evaluating and comparing model performance by managing Arize experiments, from creation to detailed analysis.
功能
- Create, run, and analyze Arize experiments
- Export experiment runs to various formats (JSON, CSV, Parquet)
- Compare results between different experiments
- Manage datasets and experiment configurations
- Utilize the `ax` CLI for all operations
使用场景
- Use when you need to create a new experiment to track model performance against a dataset.
- Use when you need to compare the results of two or more experiments to identify regressions or improvements.
- Use when you need to export experiment data for further analysis or visualization.
- Use when you need to benchmark different models or prompt strategies using A/B testing.
非目标
- Setting up or managing the Arize platform itself (focus is on experiments).
- Directly calling the Arize REST API (relies on `ax` CLI abstraction).
- Fabricating or simulating model outputs or evaluation scores.
安装
npx skills add github/awesome-copilot通过 npx 运行 Vercel skills CLI(skills.sh)— 需要本地安装 Node.js,以及至少一个兼容 skills 的智能体(Claude Code、Cursor、Codex 等)。前提是仓库遵循 agentskills.io 格式。
质量评分
已验证类似扩展
Arize Evaluator
100Handles LLM-as-judge evaluation workflows on Arize including creating/updating evaluators, running evaluations on spans or experiments, managing tasks, trigger-run operations, column mapping, and continuous monitoring. Use when the user mentions create evaluator, LLM judge, hallucination, faithfulness, correctness, relevance, run eval, score spans, score experiment, trigger-run, column mapping, continuous monitoring, or improve evaluator prompt.
Arize Dataset
100Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.
CE Optimize
100Run metric-driven iterative optimization loops -- define a measurable goal, run parallel experiments, measure each against hard gates or LLM-as-judge scores, keep improvements, and converge on the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation.
Arize Prompt Optimization
100Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.
Arize Ai Provider Integration
100Creates, reads, updates, and deletes Arize AI integrations that store LLM provider credentials used by evaluators and other Arize features. Supports any LLM provider (e.g. OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM). Use when the user mentions AI integration, LLM provider credentials, create integration, list integrations, update credentials, delete integration, or connecting an LLM provider to Arize.
Trader Regime
100Detect current market regime using npx neural-trader — bull/bear/ranging/volatile classification with recommended strategy