Arize Evaluator
Skill | Verified | Active
Handles LLM-as-judge evaluation workflows on Arize, including creating and updating evaluators, running evaluations on spans or experiments, managing tasks, trigger-run operations, column mapping, and continuous monitoring. Use when the user mentions create evaluator, LLM judge, hallucination, faithfulness, correctness, relevance, run eval, score spans, score experiment, trigger-run, column mapping, continuous monitoring, or improve evaluator prompt.
Lets users set up and manage LLM-as-judge evaluation workflows on Arize efficiently, so that model performance is assessed consistently and reliably.
Features
- Create/update LLM-as-judge evaluators
- Run evaluations on project spans or experiment runs
- Manage AI integrations and model configurations
- Configure column mappings and data granularity
- Automate continuous monitoring and backfilling
Use cases
- Evaluating LLM responses for hallucination, correctness, or relevance.
- Setting up automated quality checks for LLM outputs on new data.
- Analyzing the performance of different LLM experiments against a dataset.
- Configuring continuous monitoring of LLM-as-judge evaluations for production systems.
Non-goals
- Directly interacting with LLM APIs without using the 'ax' CLI and Arize platform.
- Performing data analysis or visualization outside of the Arize platform.
- Managing Arize project or dataset creation (delegated to other skills).
Workflow
- Confirm project/experiment details and AI integration.
- Create or select an LLM evaluator with specific templates and choices.
- Determine column mappings from actual data.
- Create a task (continuous, backfill, or both).
- Trigger a backfill run if requested, then monitor.
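The column-mapping step in the workflow above can be illustrated with a minimal Python sketch. It is not part of the ax CLI or the Arize SDK: it assumes the evaluator template uses single-brace placeholders such as {input}, and the function names, the template text, and the column names (attributes.input.value and so on) are hypothetical. The idea is to confirm, against a row of actual data, that every template variable is mapped to a column that really exists before a task is created.

```python
import string


def template_variables(template):
    """Return the placeholder names used in a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so filter those out.
    return {field for _, field, _, _ in string.Formatter().parse(template) if field}


def check_column_mapping(template, mapping, sample_row):
    """Return a list of problems; an empty list means the mapping is usable."""
    problems = []
    for var in sorted(template_variables(template)):
        column = mapping.get(var)
        if column is None:
            problems.append(f"template variable '{var}' has no mapped column")
        elif column not in sample_row:
            problems.append(f"column '{column}' (for '{var}') is not present in the data")
    return problems


# Hypothetical template, mapping, and sample row, for illustration only.
template = "Question: {input}\nAnswer: {output}\nIs the answer a hallucination?"
mapping = {
    "input": "attributes.input.value",
    "output": "attributes.output.value",
}
sample_row = {
    "attributes.input.value": "What is 2 + 2?",
    "attributes.llm.output": "4",  # the output lives under a different column here
}

problems = check_column_mapping(template, mapping, sample_row)
for problem in problems:
    print(problem)
```

Checking against a real row rather than an assumed schema is the point of "from actual data": a mapping that looks plausible can still reference a column the project never populates.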
Practices
- LLM Evaluation
- MLOps
- Data Science Workflows
- CLI Tooling
Prerequisites
- ax CLI installed (v0.14.0+)
- Configured Arize profile with API key
- Arize space name or ID (via ARIZE_SPACE env var)
- AI integration configured in Arize (for LLM provider credentials)
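A minimal Python sketch of a pre-flight check for the prerequisites listed above. The function names are hypothetical, and the version string is passed in by the caller because the listing does not say how the ax CLI reports its version. The profile, API key, and AI integration are not checked here, since verifying them would require CLI calls this page does not document.

```python
import os
import shutil

MIN_AX_VERSION = (0, 14, 0)  # from the "v0.14.0+" requirement


def parse_version(text):
    """Turn 'v0.14.2' or '0.14' into a 3-part tuple such as (0, 14, 2)."""
    parts = tuple(int(part) for part in text.strip().lstrip("v").split("."))
    # Pad short versions so that '0.14' compares equal to '0.14.0'.
    return parts + (0,) * (3 - len(parts))


def missing_prerequisites(ax_path, ax_version, env):
    """Return human-readable problems; an empty list means the checks passed."""
    missing = []
    if ax_path is None:
        missing.append("ax CLI is not on PATH")
    elif parse_version(ax_version) < MIN_AX_VERSION:
        missing.append(f"ax CLI {ax_version} is older than required 0.14.0")
    if not env.get("ARIZE_SPACE"):
        missing.append("ARIZE_SPACE is not set")
    return missing


# Deterministic example with made-up values.
example = missing_prerequisites(
    "/usr/local/bin/ax", "0.13.2", {"ARIZE_SPACE": "my-space"}
)
print(example)

# On a real machine, the path and environment would come from the system:
# missing_prerequisites(shutil.which("ax"), "<version you looked up>", os.environ)
```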
Installation
npx skills add github/awesome-copilot
Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.
Similar extensions
Arize Experiment
Score 100: Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.
Arize Dataset
Score 100: Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.
ML Pipeline Workflow
Score 98: Builds end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.
Arize Prompt Optimization
Score 100: Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.
Arize AI Provider Integration
Score 100: Creates, reads, updates, and deletes Arize AI integrations that store LLM provider credentials used by evaluators and other Arize features. Supports any LLM provider (e.g. OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM). Use when the user mentions AI integration, LLM provider credentials, create integration, list integrations, update credentials, delete integration, or connecting an LLM provider to Arize.
Trader Regime
Score 100: Detects the current market regime using npx neural-trader: bull/bear/ranging/volatile classification with a recommended strategy.