Zum Hauptinhalt springen
Dieser Inhalt ist noch nicht in Ihrer Sprache verfügbar und wird auf Englisch angezeigt.

Constitutional Ai

Skill Verifiziert Aktiv

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

Zweck

To enable the training of harmless AI models through AI-generated feedback and self-critique, reducing the need for human-labeled data and improving AI safety alignment.

Funktionen

  • Implements Constitutional AI for AI safety training
  • Details two-phase approach: Supervised Learning (SL) and RLAIF
  • Provides Python code examples for self-critique, revision, and preference evaluation
  • Addresses common issues and offers recovery strategies

Anwendungsfälle

  • Safety alignment of LLMs without human labels
  • Reducing harmful or toxic outputs from AI models
  • Implementing explainable AI decisions through principles
  • Scalable AI safety training using AI feedback

Nicht-Ziele

  • Direct human preference data collection (RLHF)
  • Runtime content filtering (NeMo Guardrails)
  • Pre-trained moderation models (LlamaGuard)

Installation

Zuerst Marketplace hinzufügen

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Qualitätspunktzahl

Verifiziert
98 /100
Analysiert 1 day ago

Vertrauenssignale

Letzter Commit17 days ago
Sterne8.3k
LizenzMIT
Status
Quellcode ansehen

Ähnliche Erweiterungen

Constitutional Ai

95

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

Skill
davila7

Product Self Knowledge

100

Stop and consult this skill whenever your response would include specific facts about Anthropic's products. Covers: Claude Code (how to install, Node.js requirements, platform/OS support, MCP server integration, configuration), Claude API (function calling/tool use, batch processing, SDK usage, rate limits, pricing, models, streaming), and Claude.ai (Pro vs Team vs Enterprise plans, feature limits). Trigger this even for coding tasks that use the Anthropic SDK, content creation mentioning Claude capabilities or pricing, or LLM provider comparisons. Any time you would otherwise rely on memory for Anthropic product details, verify here instead — your training data may be outdated or wrong.

Skill
SeifBenayed

Anthropic Expert

98

Expert on Anthropic Claude API, models, prompt engineering, function calling, vision, and best practices. Triggers on anthropic, claude, api, prompt, function calling, vision, messages api, embeddings

Skill
raintree-technology

Llamaguard

95

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.

Skill
Orchestra-Research

Anthropic Sdk

85

Official Anthropic SDK for Claude AI with chat, streaming, function calling, and vision capabilities

Skill
bobmatnyc

LlamaGuard

75

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.

Skill
davila7