Constitutional AI
Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.
To provide a clear explanation of Constitutional AI and a practical implementation guide, enabling users to train AI models for safety alignment and reduce harmful outputs.
Features
- Explains Constitutional AI methodology
- Details two-phase training approach (SL and RLAIF)
- Provides Python code examples for each phase
- Addresses common issues and offers solutions
- Outlines hardware and compute requirements
Use Cases
- Training AI models for safety alignment
- Reducing harmful outputs in AI systems
- Making AI safety decisions explainable via explicit written principles
- Scalable AI safety training without human labels
Non-Goals
- Directly performing RLHF training
- Providing a pre-trained moderation model like LlamaGuard
- Runtime content filtering solutions like NeMo Guardrails
Workflow
- Generate initial responses using a base model.
- Critique responses against a constitution.
- Revise responses based on critiques.
- Fine-tune the model on revised responses (SL phase; a minimal sketch of this critique-revision loop follows the list).
- Generate comparison pairs of responses.
- Evaluate AI preferences based on the constitution.
- Train a preference model (reward model).
- Perform RL training using RLAIF (RL phase; see the preference-labeling sketch below).
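A minimal Python sketch of the SL phase (generate, critique, revise), assuming a locally available HuggingFace chat model. The model name and the single constitutional principle below are illustrative placeholders, not Anthropic's full constitution.

```python
# Minimal sketch of the SL phase: generate -> self-critique -> revise.
# Model name and PRINCIPLE are placeholders, not the method's actual values.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder; any chat model works
    device_map="auto",
)

PRINCIPLE = (
    "Identify specific ways in which the last response is harmful, unethical, "
    "or otherwise objectionable."
)

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    # The pipeline returns prompt + completion; strip the prompt.
    return out[0]["generated_text"][len(prompt):].strip()

def critique_and_revise(user_prompt: str) -> dict:
    # Step 1: initial response from the base (helpful-only) model.
    response = generate(f"Human: {user_prompt}\n\nAssistant:")
    # Step 2: self-critique against a constitutional principle.
    critique = generate(
        f"Human: {user_prompt}\n\nAssistant: {response}\n\n"
        f"Critique request: {PRINCIPLE}\n\nCritique:"
    )
    # Step 3: revision conditioned on the critique.
    revision = generate(
        f"Human: {user_prompt}\n\nAssistant: {response}\n\n"
        f"Critique: {critique}\n\nRevision request: Rewrite the response to "
        "address the critique.\n\nRevised response:"
    )
    # The (prompt, revision) pairs become the SL fine-tuning dataset.
    return {"prompt": user_prompt, "response": revision}
```

In practice the principles are sampled from a constitution and the critique-revision loop can run for several rounds before fine-tuning on the revised responses.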
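For the RL phase, a similarly hedged sketch of AI preference labeling; it reuses the `generate` helper from the sketch above, and the comparison prompt is a paraphrase rather than Anthropic's exact template. Training the preference (reward) model and the subsequent RL step, e.g. with the `trl` library listed in the prerequisites, are omitted.

```python
# Sketch of RLAIF preference labeling; assumes `generate` from the SL sketch.
def label_preference(user_prompt: str) -> dict:
    # Sample two candidate responses from the SL-trained model.
    a = generate(f"Human: {user_prompt}\n\nAssistant:")
    b = generate(f"Human: {user_prompt}\n\nAssistant:")
    # Ask the feedback model to choose the better response per the constitution.
    verdict = generate(
        "Consider the conversation and the two responses below.\n"
        f"Human: {user_prompt}\n\n(A) {a}\n\n(B) {b}\n\n"
        "Which response is more harmless, ethical, and honest? "
        "Answer (A) or (B):",
        max_new_tokens=8,
    )
    chosen, rejected = (a, b) if "A" in verdict else (b, a)
    # (prompt, chosen, rejected) triples train the preference model, which then
    # provides the reward signal for RL training (e.g., PPO via trl).
    return {"prompt": user_prompt, "chosen": chosen, "rejected": rejected}
```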
Practices
- Safety Alignment
- AI Training
- Reinforcement Learning
- Self-Critique
- AI Feedback
Prerequisites
- Python 3.7+
- NVIDIA GPU (A100/H100 recommended)
- transformers, torch, trl libraries
- Sufficient VRAM (e.g., 40 GB for 7B models); see the environment check below
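A small sketch to check that the environment meets these prerequisites; the 40 GB threshold mirrors the guidance above.

```python
# Verify GPU availability and VRAM before launching training.
import torch

assert torch.cuda.is_available(), "An NVIDIA GPU is required"
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {vram_gb:.0f} GB")
if vram_gb < 40:
    print("Warning: under 40 GB VRAM; a 7B model may need quantization or offloading")
```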
Trust
- Issues attention: 17 issues were opened and 4 closed in the last 90 days. The closure rate is below 50%, indicating slower responsiveness to new issues.
Installation
npx skills add davila7/claude-code-templates
Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.
Quality Score: Verified
Similar Extensions
Constitutional AI
98 - Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.
Llamaguard
95 - Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.
Fixflow
100 - Executes coding tasks with a strict delivery workflow: build a complete plan, implement step by step, run tests continuously, and commit after each step (`per_step`) by default. Supports explicit commit-policy overrides (`final_only`, `milestone`) and optional BDD (Given/When/Then) when the user asks for behavior-driven delivery or requirements are unclear.
Prompt Guard
100 - Meta's 86M prompt injection and jailbreak detector. Filters malicious prompts and third-party data for LLM apps. 99%+ TPR, <1% FPR. Fast (<2ms GPU). Multilingual (8 languages). Deploy with HuggingFace or batch processing for RAG security.
Gws Modelarmor Sanitize Prompt
99 - Google Model Armor: Sanitize a user prompt through a Model Armor template.