Arize Annotation

Skill · Verified · Active

Creates and manages annotation configs (categorical, continuous, freeform label schemas) and annotation queues (human review workflows) on Arize. Applies human annotations to project spans via the Python SDK. Use when the user mentions annotation config, annotation queue, label schema, human feedback, bulk annotate spans, update_annotations, labeling queue, annotate record, or human review.

Purpose

To streamline the process of managing data annotation workflows and configurations within the Arize platform.

Features

Manage annotation configurations (categorical, continuous, freeform)
Create and manage annotation queues for human review
Apply human annotations to project spans via Python SDK
Bulk update annotations for dataset examples and experiment records
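
The three annotation config types listed above can be pictured as small label schemas that accept or reject a proposed label. The sketch below is only an illustration of that idea: every class and function name in it (`CategoricalConfig`, `ContinuousConfig`, `FreeformConfig`, `invalid_annotations`) is hypothetical and is not part of the Arize SDK or the ax CLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoricalConfig:
    """Hypothetical schema: the label must be one of a fixed set of values."""
    name: str
    values: tuple

    def validate(self, label) -> bool:
        return label in self.values


@dataclass(frozen=True)
class ContinuousConfig:
    """Hypothetical schema: the label must be a number within [minimum, maximum]."""
    name: str
    minimum: float
    maximum: float

    def validate(self, label) -> bool:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(label, bool) or not isinstance(label, (int, float)):
            return False
        return self.minimum <= label <= self.maximum


@dataclass(frozen=True)
class FreeformConfig:
    """Hypothetical schema: the label is any non-empty text."""
    name: str

    def validate(self, label) -> bool:
        return isinstance(label, str) and label.strip() != ""


def invalid_annotations(config, annotations):
    """Return the span IDs whose label fails the config's validation."""
    return [span_id for span_id, label in annotations.items()
            if not config.validate(label)]
```

Checking a batch of labels locally like this before a bulk update is one possible design choice; the skill itself may rely on server-side validation instead.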

Use cases

When setting up a new data labeling schema in Arize.
When routing data to human reviewers through an annotation queue.
When applying human annotations in bulk to existing project spans.
When managing label schema types and values for machine learning projects.

Non-goals

Performing automated quality checks on annotations (use arize-evaluator).
Managing Arize datasets or experiments directly (use arize-dataset/arize-experiment).
Interacting with Arize beyond annotation and labeling workflows.

Installation

npx skills add github/awesome-copilot

Runs the Vercel skills CLI (skills.sh) through npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and assumes the repository follows the agentskills.io format.

Quality score

Verified

96/100

Analyzed 1 day ago

Trust signals

Last commit: 2 days ago

GitHub owner: github

Stars: 32.9k

License: MIT

Website: awesome-copilot.github.com

Similar extensions

Label Training Data

Set up systematic data labeling workflows using Label Studio or similar tools. Implement quality controls, measure inter-annotator agreement, manage labeler teams, and integrate labeled data into ML training pipelines. Use when starting a supervised ML project that requires labeled training data, when model performance is limited by insufficient labeled examples, when labeling text, images, audio, or video, or when implementing active learning to prioritize the most valuable examples.

Skill

pjt222

Arize Experiment

100

Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.

Skill

github

Arize Evaluator

100

Handles LLM-as-judge evaluation workflows on Arize including creating/updating evaluators, running evaluations on spans or experiments, managing tasks, trigger-run operations, column mapping, and continuous monitoring. Use when the user mentions create evaluator, LLM judge, hallucination, faithfulness, correctness, relevance, run eval, score spans, score experiment, trigger-run, column mapping, continuous monitoring, or improve evaluator prompt.

Skill

github

Arize Dataset

100

Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.

Skill

github

Annotate Source Files

100

Add PUT workflow annotations to source files using the correct language-specific comment prefix. Covers annotation syntax, skeleton generation via put_generate(), multiline annotations, .internal variables, and validation. Supports 30+ languages with automatic comment prefix detection. Use after analyzing a codebase and having an annotation plan, when adding workflow documentation to new or existing source files, or when documenting data pipelines, ETL processes, or multi-step computations.

Skill

pjt222

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill

github