Arize Evaluator

Skill · Verified · Active

Handles LLM-as-judge evaluation workflows on Arize including creating/updating evaluators, running evaluations on spans or experiments, managing tasks, trigger-run operations, column mapping, and continuous monitoring. Use when the user mentions create evaluator, LLM judge, hallucination, faithfulness, correctness, relevance, run eval, score spans, score experiment, trigger-run, column mapping, continuous monitoring, or improve evaluator prompt.

Purpose

To enable users to efficiently set up and manage LLM-as-judge evaluation workflows on Arize, ensuring consistent and reliable assessment of model performance.

Features

  • Create/update LLM-as-judge evaluators
  • Run evaluations on project spans or experiment runs
  • Manage AI integrations and model configurations
  • Configure column mappings and data granularity
  • Automate continuous monitoring and backfilling

Use cases

  • Evaluating LLM responses for hallucination, correctness, or relevance.
  • Setting up automated quality checks for LLM outputs on new data.
  • Analyzing the performance of different LLM experiments against a dataset.
  • Configuring continuous monitoring of LLM-as-judge evaluations for production systems.

Non-goals

  • Directly interacting with LLM APIs without using the 'ax' CLI and Arize platform.
  • Performing data analysis or visualization outside of the Arize platform.
  • Managing Arize project or dataset creation (delegated to other skills).

Workflow

  1. Confirm project/experiment details and AI integration.
  2. Create or select an LLM evaluator with specific templates and choices.
  3. Determine column mappings from actual data.
  4. Create a task (continuous, backfill, or both).
  5. Trigger a backfill run if requested, then monitor.
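
Step 3 above (deriving column mappings from actual data) can be sanity-checked before a task is created. The following is a minimal sketch, not the skill's own implementation: it assumes the evaluator template uses `{name}`-style placeholders, and the column names in the usage note are hypothetical examples rather than values taken from Arize.

```python
import string


def template_variables(template):
    """Return the set of {placeholder} names used in a template string.

    Assumption: the evaluator template uses Python-style {name} placeholders.
    """
    formatter = string.Formatter()
    return {name for _, name, _, _ in formatter.parse(template) if name}


def unmapped_variables(template, column_mapping, available_columns):
    """List template variables that lack a mapping to an existing column."""
    missing = []
    for var in sorted(template_variables(template)):
        column = column_mapping.get(var)
        if column is None or column not in available_columns:
            missing.append(var)
    return missing
```

For example, with the template `"Question: {input}\nAnswer: {output}"`, a mapping of `input` to a hypothetical column `attributes.input.value`, and no mapping for `output`, `unmapped_variables` reports `output` as the variable still needing a column. Checking against columns seen in real sample data, rather than guessed names, is the point of the step.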

Practices

  • LLM Evaluation
  • MLOps
  • Data Science Workflows
  • CLI Tooling

Prerequisites

  • ax CLI installed (v0.14.0+)
  • Configured Arize profile with API key
  • Arize space name or ID (via ARIZE_SPACE env var)
  • AI integration configured in Arize (for LLM provider credentials)
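
The first and third prerequisites above can be verified locally before starting. This is a rough preflight sketch under stated assumptions: the helper names are hypothetical, and how to obtain the installed `ax` version string is deliberately left out because this page does not document it. The profile and AI integration checks are also omitted, since they depend on Arize itself.

```python
import os
import shutil


def parse_version(text):
    """Turn a string like 'v0.14.0' or '0.14.2' into a tuple of ints.

    Plain numeric versions only; pre-release suffixes are not handled.
    """
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def meets_minimum(installed, minimum="0.14.0"):
    """Compare versions numerically, so '0.9.3' sorts below '0.14.0'."""
    return parse_version(installed) >= parse_version(minimum)


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    if which("ax") is None:
        problems.append("ax CLI not found on PATH")
    if not env.get("ARIZE_SPACE"):
        problems.append("ARIZE_SPACE is not set")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the real environment; the `env` and `which` parameters exist only so the check can be exercised without touching the machine's actual state.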

Installation

npx skills add github/awesome-copilot

Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality score

Verified
100/100
Analyzed 1 day ago

Trust signals

Last commit: 1 day ago
Stars: 32.9k
License: MIT

Similar extensions

Arize Experiment

100

Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.

Skill
github

Arize Dataset

100

Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.

Skill
github

ML Pipeline Workflow

98

Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.

Skill
wshobson

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill
github

Arize AI Provider Integration

100

Creates, reads, updates, and deletes Arize AI integrations that store LLM provider credentials used by evaluators and other Arize features. Supports any LLM provider (e.g. OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM). Use when the user mentions AI integration, LLM provider credentials, create integration, list integrations, update credentials, delete integration, or connecting an LLM provider to Arize.

Skill
github

Trader Regime

100

Detect current market regime using npx neural-trader — bull/bear/ranging/volatile classification with recommended strategy

Skill
ruvnet