LlamaGuard
Skill · Active
Meta's 7-8B specialized moderation model for LLM input/output filtering. Six safety categories: violence/hate, sexual content, weapons, substances, self-harm, and criminal planning. 94-95% accuracy. Deploys with vLLM, HuggingFace, or SageMaker. Integrates with NeMo Guardrails.
To provide a robust, pre-trained AI model for filtering harmful or inappropriate content in LLM inputs and outputs, ensuring safer AI interactions.
Features
- Specialized moderation model (Meta's LlamaGuard 7-8B)
- 6 detailed safety categories (violence, sexual, weapons, substances, self-harm, criminal planning)
- High accuracy (94-95%)
- Multiple deployment options (vLLM, HuggingFace, SageMaker)
- Integration with NeMo Guardrails
Use cases
- Moderating user prompts before sending to an LLM
- Filtering LLM responses before displaying them to users
- Implementing content safety guardrails in production AI applications
- Detecting and classifying various types of harmful content
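The first two use cases reduce to the same gate pattern: check the user prompt before it reaches the LLM, then check the LLM's reply before it reaches the user. A minimal sketch, assuming a `moderate(chat)` callable that returns `(is_safe, category)` in the style of a LlamaGuard deployment; `llm` here is a stand-in for any text-generation call, and both names are illustrative, not part of this skill's API:

```python
def guarded_chat(user_prompt: str, llm, moderate) -> str:
    """Gate both the prompt and the response through a moderation model.

    `moderate(chat)` takes a list of {"role", "content"} messages and
    returns (is_safe, category); `llm(prompt)` returns the model's reply.
    """
    chat = [{"role": "user", "content": user_prompt}]

    # Input filtering: block unsafe prompts before the LLM ever sees them.
    safe, category = moderate(chat)
    if not safe:
        return f"Prompt blocked (category {category})."

    reply = llm(user_prompt)
    chat.append({"role": "assistant", "content": reply})

    # Output filtering: re-moderate the full exchange before display.
    safe, category = moderate(chat)
    if not safe:
        return f"Response blocked (category {category})."
    return reply
```

Moderating the full chat on the output pass (rather than the reply alone) matches how LlamaGuard is prompted: it classifies the last turn in context, so a reply that is only harmful given the preceding prompt is still caught.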
Non-goals
- Performing general text generation or summarization
- Acting as a general-purpose chatbot
- Replacing the need for LLM alignment training itself
Workflow
- Install necessary Python libraries (transformers, torch).
- Log in to HuggingFace CLI.
- Load the LlamaGuard model and tokenizer.
- Prepare chat input using the tokenizer's template.
- Generate moderation output from the model.
- Parse the output to determine safety status and category.
- Block or allow content based on the moderation result.
Prerequisites
- Python 3.7+
- transformers library
- torch library
- HuggingFace CLI login with token
- GPU resources (recommended for performance)
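The prerequisites above can be verified programmatically before attempting to load the model. A small sketch; the token-file path and `HF_TOKEN` variable are the conventional Hugging Face locations, but the CLI login step itself (`huggingface-cli login`) must still be done interactively or via a token:

```python
import importlib.util
import os
import sys


def check_prereqs() -> list[str]:
    """Return a list of missing prerequisites for running LlamaGuard locally."""
    missing = []
    if sys.version_info < (3, 7):
        missing.append("Python 3.7+")
    # Required libraries must be importable.
    for pkg in ("transformers", "torch"):
        if importlib.util.find_spec(pkg) is None:
            missing.append(f"{pkg} library")
    # HF auth: `huggingface-cli login` writes a token file; an env var also works.
    token_file = os.path.expanduser("~/.cache/huggingface/token")
    if not os.environ.get("HF_TOKEN") and not os.path.exists(token_file):
        missing.append("Hugging Face token (run `huggingface-cli login`)")
    return missing
```

GPU availability is deliberately left out: it is recommended rather than required, and checking it cleanly would itself require torch to be installed.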
Trust
- warning (Issues attention): 17 issues opened, 4 closed in the last 90 days, indicating a low closure rate and potentially slow maintainer response.
Compliance
- info (GDPR): The skill moderates content but does not inherently process personal data. However, the LLM itself might process PII present in the input, and this is not explicitly sanitized.
Execution
- warning (Pinned dependencies): Dependencies are listed but not pinned to specific versions, and no lockfile is mentioned for the Python environment, posing a reproducibility and stability risk.
Installation
npx skills add davila7/claude-code-templates
Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.
Quality score
Similar extensions
Llamaguard · 95
Meta's 7-8B specialized moderation model for LLM input/output filtering. Six safety categories: violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, SageMaker. Integrates with NeMo Guardrails.
Constitutional AI · 98
Anthropic's method for training harmless AI through self-improvement. Two-phase approach: supervised learning with self-critique/revision, then RLAIF (RL from AI feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.
NeMo Guardrails · 97
NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, toxicity detection. Uses Colang 2.0 DSL for programmable rails. Production-ready, runs on a T4 GPU.
Fixflow · 100
Executes coding tasks with a strict delivery workflow: build a complete plan, implement step by step, run tests continuously, and commit after each step (`per_step`) by default. Supports explicit commit-strategy overrides (`final_only`, `milestone`) and optional BDD (Given/When/Then) when the user asks for behavior-driven delivery or requirements are unclear.
Safe Mode · 100
Prevent destructive operations using Claude Code hooks. Three modes: cautious (warn on dangerous commands), lockdown (restrict edits to one directory), and clear (remove restrictions). Uses PreToolUse matchers for Bash, Edit, and Write.