
Constitutional AI

Skill · Verified · Active

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

Purpose

To enable the training of harmless AI models through AI-generated feedback and self-critique, reducing the need for human-labeled data and improving AI safety alignment.

Features

  • Implements Constitutional AI for AI safety training
  • Details two-phase approach: Supervised Learning (SL) and RLAIF
  • Provides Python code examples for self-critique, revision, and preference evaluation
  • Addresses common issues and offers recovery strategies
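The supervised-learning phase described above can be sketched as a critique-and-revision loop: the model drafts a response, critiques it against a constitutional principle, then rewrites it. The sketch below is illustrative only; `generate` is a hypothetical stand-in for any LLM completion call, and the two principles are invented examples, not the skill's actual constitution.

```python
def generate(prompt: str) -> str:
    # Hypothetical stub: replace with a real model call (e.g. an API client).
    return f"[model output for: {prompt[:40]}]"

# Example principles; a real constitution would contain many such rules.
CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Avoid content that could assist with dangerous activities.",
]

def critique_and_revise(question: str) -> str:
    """One SL-phase pass: draft, then critique and revise per principle."""
    response = generate(question)
    for principle in CONSTITUTION:
        critique = generate(
            f"Question: {question}\nResponse: {response}\n"
            f"Critique the response against this principle: {principle}"
        )
        response = generate(
            f"Question: {question}\nResponse: {response}\n"
            f"Critique: {critique}\nRewrite the response to address the critique."
        )
    return response

revised = critique_and_revise("How do I stay safe online?")
```

In the full method, the (question, revised response) pairs produced by this loop become the fine-tuning dataset for the SL phase.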

Use cases

  • Safety alignment of LLMs without human labels
  • Reducing harmful or toxic outputs from AI models
  • Implementing explainable AI decisions through principles
  • Scalable AI safety training using AI feedback
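The RLAIF phase replaces human preference labels with AI-generated ones: a model is asked which of two responses better follows the constitution, and the resulting labels train a preference model. A minimal sketch, again using a hypothetical `generate` stub (here hard-coded to prefer option A for determinism):

```python
def generate(prompt: str) -> str:
    # Hypothetical stub: a real call would return the model's actual choice.
    return "(A)"

def ai_preference(question: str, response_a: str, response_b: str) -> str:
    """Ask the model for a constitutional preference label: 'A' or 'B'."""
    prompt = (
        f"Question: {question}\n"
        f"(A) {response_a}\n"
        f"(B) {response_b}\n"
        "Which response is more harmless? Answer (A) or (B)."
    )
    label = generate(prompt)
    return "A" if "(A)" in label else "B"

label = ai_preference(
    "Explain phishing.",
    "Here is how to run a phishing scam...",
    "Phishing is a scam in which attackers impersonate trusted parties...",
)  # with this stub, always "A"
```

These AI-produced preference labels substitute for the human comparisons used in RLHF, which is what makes the approach scale without human annotation.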

Non-goals

  • Direct human preference data collection (RLHF)
  • Runtime content filtering (NeMo Guardrails)
  • Pre-trained moderation models (LlamaGuard)

Installation

Add the marketplace first:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality score

Verified
98/100
Analyzed 1 day ago

Trust signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT

Similar extensions

Constitutional AI

95

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

Skill
davila7

Product Self Knowledge

100

Stop and consult this skill whenever your response would include specific facts about Anthropic's products. Covers: Claude Code (how to install, Node.js requirements, platform/OS support, MCP server integration, configuration), Claude API (function calling/tool use, batch processing, SDK usage, rate limits, pricing, models, streaming), and Claude.ai (Pro vs Team vs Enterprise plans, feature limits). Trigger this even for coding tasks that use the Anthropic SDK, content creation mentioning Claude capabilities or pricing, or LLM provider comparisons. Any time you would otherwise rely on memory for Anthropic product details, verify here instead — your training data may be outdated or wrong.

Skill
SeifBenayed

Anthropic Expert

98

Expert on Anthropic Claude API, models, prompt engineering, function calling, vision, and best practices. Triggers on anthropic, claude, api, prompt, function calling, vision, messages api, embeddings

Skill
raintree-technology

LlamaGuard

95

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.

Skill
Orchestra-Research

Anthropic SDK

85

Official Anthropic SDK for Claude AI with chat, streaming, function calling, and vision capabilities

Skill
bobmatnyc

LlamaGuard

75

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.

Skill
davila7