Arize Annotation

Skill Verified Active

Creates and manages annotation configs (categorical, continuous, freeform label schemas) and annotation queues (human review workflows) on Arize. Applies human annotations to project spans via the Python SDK. Use when the user mentions annotation config, annotation queue, label schema, human feedback, bulk annotate spans, update_annotations, labeling queue, annotate record, or human review.

Purpose

To streamline the process of managing data annotation workflows and configurations within the Arize platform.

Features

  • Manage annotation configurations (categorical, continuous, freeform)
  • Create and manage annotation queues for human review
  • Apply human annotations to project spans via Python SDK
  • Bulk update annotations for dataset examples and experiment records
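The three label schema types named above (categorical, continuous, freeform) can be pictured with a small local model. This is only a sketch for illustration: `AnnotationConfig` and `validate` are hypothetical names, the fields are assumptions, and none of this is the Arize SDK or its actual config format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class AnnotationConfig:
    """Hypothetical local model of a label schema; not an Arize SDK class."""
    name: str
    kind: str  # "categorical", "continuous", or "freeform"
    values: Sequence[str] = ()       # allowed labels (categorical only)
    minimum: Optional[float] = None  # lower bound (continuous only)
    maximum: Optional[float] = None  # upper bound (continuous only)


def validate(config: AnnotationConfig, value: Union[str, float]) -> bool:
    """Return True if `value` is acceptable under `config`."""
    if config.kind == "categorical":
        # The label must be one of the declared values.
        return isinstance(value, str) and value in config.values
    if config.kind == "continuous":
        # The score must be numeric (bools excluded) and within any bounds.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        low_ok = config.minimum is None or value >= config.minimum
        high_ok = config.maximum is None or value <= config.maximum
        return low_ok and high_ok
    if config.kind == "freeform":
        # Any non-blank text is accepted.
        return isinstance(value, str) and value.strip() != ""
    raise ValueError(f"unknown config kind: {config.kind!r}")


# Assumed example schemas, one per type.
correctness = AnnotationConfig("correctness", "categorical",
                               values=("correct", "incorrect"))
quality = AnnotationConfig("quality", "continuous", minimum=0.0, maximum=1.0)
notes = AnnotationConfig("notes", "freeform")
```

Checking labels locally like this before a bulk upload is one way to catch out-of-schema values early; the real platform may enforce its own rules.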

Use cases

  • When setting up a new data labeling schema in Arize.
  • When routing data to human reviewers through an annotation queue.
  • When applying bulk human annotations to existing project spans.
  • When managing label schema types and values for machine learning projects.

Non-goals

  • Performing automated quality checks on annotations (use arize-evaluator).
  • Managing Arize datasets or experiments directly (use arize-dataset/arize-experiment).
  • Interacting with Arize beyond annotation and labeling workflows.

Installation

npx skills add github/awesome-copilot

Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.

Quality score

Verified
96/100
Analyzed 1 day ago

Trust signals

Last commit: 2 days ago
Stars: 32.9k
License: MIT
Status

Similar extensions

Label Training Data

98

Set up systematic data labeling workflows using Label Studio or similar tools. Implement quality controls, measure inter-annotator agreement, manage labeler teams, and integrate labeled data into ML training pipelines. Use when starting a supervised ML project that requires labeled training data, when model performance is limited by insufficient labeled examples, when labeling text, images, audio, or video, or when implementing active learning to prioritize the most valuable examples.

Skill
pjt222

Arize Experiment

100

Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.

Skill
github

Arize Evaluator

100

Handles LLM-as-judge evaluation workflows on Arize including creating/updating evaluators, running evaluations on spans or experiments, managing tasks, trigger-run operations, column mapping, and continuous monitoring. Use when the user mentions create evaluator, LLM judge, hallucination, faithfulness, correctness, relevance, run eval, score spans, score experiment, trigger-run, column mapping, continuous monitoring, or improve evaluator prompt.

Skill
github

Arize Dataset

100

Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.

Skill
github

Annotate Source Files

100

Add PUT workflow annotations to source files using the correct language-specific comment prefix. Covers annotation syntax, skeleton generation via put_generate(), multiline annotations, .internal variables, and validation. Supports 30+ languages with automatic comment prefix detection. Use after analyzing a codebase and having an annotation plan, when adding workflow documentation to new or existing source files, or when documenting data pipelines, ETL processes, or multi-step computations.

Skill
pjt222

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill
github