Constitutional AI

Skill · Verified · Active

Anthropic's method for training harmless AI through self-improvement. It uses a two-phase approach: supervised learning with self-critique and revision, followed by RLAIF (reinforcement learning from AI feedback). Use it for safety alignment and for reducing harmful outputs without human labels. Powers Claude's safety system.

Purpose

Provides a clear explanation of Constitutional AI and a practical implementation guide, so that users can train AI models for safety alignment and reduce harmful outputs.

Features

Explains Constitutional AI methodology
Details two-phase training approach (SL and RLAIF)
Provides Python code examples for each phase
Addresses common issues and offers solutions
Outlines hardware and compute requirements

Use cases

Training AI models for safety alignment
Reducing harmful outputs in AI systems
Implementing explainable AI decisions
Scalable AI safety training without human labels

Non-goals

Directly performing RLHF training
Providing a pre-trained moderation model like LlamaGuard
Runtime content filtering solutions like NeMo Guardrails

Workflow

Generate initial responses using a base model.
Critique responses against a constitution.
Revise responses based on critiques.
Fine-tune the model on revised responses (SL phase).
Generate comparison pairs of responses.
Evaluate AI preferences based on the constitution.
Train a preference model (reward model).
Perform RL training using RLAIF (RL phase).
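
The data-generation steps of the workflow above (steps 1 to 3, 5 and 6) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the skill's own code: the `Generate` callable, the prompt templates, the two-principle `CONSTITUTION`, and `stub_model` are all hypothetical stand-ins. Fine-tuning (step 4), reward-model training (step 7), and RL (step 8) are left out because they depend on the chosen training library.

```python
from typing import Callable, Dict, List

# Hypothetical two-principle constitution; a real one would be longer.
CONSTITUTION = [
    "Choose the response that is least likely to cause harm.",
    "Choose the response that is most honest and helpful.",
]

# Assumption: any function mapping a prompt string to a completion string.
Generate = Callable[[str], str]


def critique_and_revise(prompt: str, generate: Generate, principle: str) -> Dict[str, str]:
    """Steps 1-3: generate a response, critique it against one principle, revise it."""
    initial = generate(prompt)
    critique = generate(
        f"Prompt: {prompt}\nResponse: {initial}\n"
        f"Critique the response using this principle: {principle}"
    )
    revision = generate(
        f"Prompt: {prompt}\nResponse: {initial}\nCritique: {critique}\n"
        "Rewrite the response to address the critique."
    )
    return {"prompt": prompt, "initial": initial, "critique": critique, "revision": revision}


def build_sl_dataset(prompts: List[str], generate: Generate) -> List[Dict[str, str]]:
    """Collect (prompt, revised response) pairs as input for the SL phase (step 4)."""
    data = []
    for i, prompt in enumerate(prompts):
        # Cycle through the principles so each one is used.
        principle = CONSTITUTION[i % len(CONSTITUTION)]
        example = critique_and_revise(prompt, generate, principle)
        data.append({"prompt": prompt, "completion": example["revision"]})
    return data


def label_preference(prompt: str, response_a: str, response_b: str,
                     generate: Generate, principle: str) -> Dict[str, str]:
    """Steps 5-6: ask the model which of two responses better follows a principle."""
    verdict = generate(
        f"Principle: {principle}\nPrompt: {prompt}\n"
        f"(A) {response_a}\n(B) {response_b}\n"
        "Which response better follows the principle? Answer A or B."
    ).strip().upper()
    if verdict.startswith("A"):
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def stub_model(text: str) -> str:
    """Deterministic stand-in for an LLM so the sketch runs without a GPU."""
    if "Answer A or B" in text:
        return "B"
    if "Rewrite the response" in text:
        return "revised answer"
    if "Critique the response" in text:
        return "critique text"
    return "draft answer"


sl_data = build_sl_dataset(["How do I pick a lock?"], stub_model)
pair = label_preference("How do I pick a lock?", "draft answer", "revised answer",
                        stub_model, CONSTITUTION[0])
print(sl_data)
print(pair)
```

In a real run, `stub_model` would be replaced by a call into the base model, and the resulting `chosen`/`rejected` records would feed the preference-model training in step 7.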

Practices

Safety Alignment
AI Training
Reinforcement Learning
Self-Critique
AI Feedback

Prerequisites

Python 3.7+
NVIDIA GPU (A100/H100 recommended)
transformers, torch, trl libraries
Sufficient VRAM (e.g., 40GB for 7B models)
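
A quick way to check the prerequisites above before starting is sketched below, using only the standard library. The helper names `missing_packages` and `weight_memory_gb` are hypothetical. The memory figure is a rough lower bound that counts model weights only, assuming 2 bytes per parameter (16-bit weights); gradients, optimizer state, and activations come on top and depend on the training setup.

```python
import importlib.util
import sys

# Package names taken from the prerequisites list above.
REQUIRED_PACKAGES = ("transformers", "torch", "trl")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level packages that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory for model weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


print("Python OK:", sys.version_info >= (3, 7))
print("Missing packages:", missing_packages())
print("7B weights at 16-bit (GB):", weight_memory_gb(7e9))
```

Under these assumptions the weights of a 7B model take about 14 GB, so the remainder of the 40GB suggested above is headroom for training state.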

Trust

Info: Issues need attention. openIssues90d is 17 and closedIssues90d is 4. The closure rate is below 50%, indicating slower responsiveness to new issues.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js installed locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality score

Verified

95/100

Analyzed about 19 hours ago

Trust signals

Last commit: about 21 hours ago

GitHub owner: davila7

Stars: 27.2k

Downloads: 23k

License: MIT

Website: aitmpl.com

Similar extensions

Constitutional Ai

Llamaguard

LlamaGuard

Fixflow

Prompt Guard

Gws Modelarmor Sanitize Prompt