
LlamaGuard

Skill Active

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.

Purpose

To provide a robust, pre-trained AI model for filtering harmful or inappropriate content in LLM inputs and outputs, ensuring safer AI interactions.

Features

  • Specialized moderation model (Meta's LlamaGuard 7-8B)
  • 6 detailed safety categories (violence, sexual, weapons, substances, self-harm, criminal planning)
  • High accuracy (94-95%)
  • Multiple deployment options (vLLM, HuggingFace, Sagemaker)
  • Integration with NeMo Guardrails

Use Cases

  • Moderating user prompts before sending to an LLM
  • Filtering LLM responses before displaying them to users
  • Implementing content safety guardrails in production AI applications
  • Detecting and classifying various types of harmful content

Non-Goals

  • Performing general text generation or summarization
  • Acting as a general-purpose chatbot
  • Replacing the need for LLM alignment training itself

Workflow

  1. Install necessary Python libraries (transformers, torch).
  2. Log in to HuggingFace CLI.
  3. Load the LlamaGuard model and tokenizer.
  4. Prepare chat input using the tokenizer's template.
  5. Generate moderation output from the model.
  6. Parse the output to determine safety status and category.
  7. Block or allow content based on the moderation result.
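Steps 3 through 7 above can be sketched as follows. The parsing and blocking helpers are runnable as written; the model-loading part (steps 3-5) is shown only as a commented sketch, since the `meta-llama/LlamaGuard-7b` checkpoint is gated and needs GPU resources, and the exact generation settings are assumptions. LlamaGuard replies with `safe`, or with `unsafe` followed by a category code (e.g. `O3`) on the next line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModerationResult:
    safe: bool
    category: Optional[str]  # e.g. "O3"; None when the content is safe


def parse_llamaguard_output(raw: str) -> ModerationResult:
    """Step 6: parse LlamaGuard's completion.

    The model answers either "safe", or "unsafe" followed by a
    newline and one of the O1-O6 category codes.
    """
    lines = raw.strip().splitlines()
    if not lines or lines[0].strip().lower() != "unsafe":
        return ModerationResult(safe=True, category=None)
    category = lines[1].strip() if len(lines) > 1 else None
    return ModerationResult(safe=False, category=category)


def moderate(raw_model_output: str, content: str) -> Optional[str]:
    """Step 7: return the content if judged safe, otherwise None (blocked)."""
    if parse_llamaguard_output(raw_model_output).safe:
        return content
    return None


# Steps 3-5, sketched in comments (model id, device placement, and
# max_new_tokens are assumptions; the checkpoint download requires
# HuggingFace login):
#
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("meta-llama/LlamaGuard-7b")
#   model = AutoModelForCausalLM.from_pretrained(
#       "meta-llama/LlamaGuard-7b", device_map="auto")
#   chat = [{"role": "user", "content": "user prompt to moderate"}]
#   ids = tok.apply_chat_template(chat, return_tensors="pt").to(model.device)
#   out = model.generate(input_ids=ids, max_new_tokens=32)
#   raw = tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)
#   print(parse_llamaguard_output(raw))
```

For example, `parse_llamaguard_output("unsafe\nO3")` yields an unsafe result carrying the category code `O3`, which `moderate` then turns into a block.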

Prerequisites

  • Python 3.7+
  • transformers library
  • torch library
  • HuggingFace CLI login with token
  • GPU resources (recommended for performance)
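The setup in steps 1-2 of the workflow corresponds to the commands below; the package names come from the prerequisites list, and the login step assumes a HuggingFace access token with access to the gated LlamaGuard weights.

```shell
# Step 1: install the required Python libraries
pip install transformers torch

# Step 2: authenticate with HuggingFace (paste your access token when prompted)
huggingface-cli login
```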

Trust

  • warning: Issues attention: 17 issues opened, 4 closed in the last 90 days, indicating a low closure rate and potentially slow maintainer response.

Compliance

  • info: GDPR: The skill moderates content but does not inherently process personal data. However, the LLM itself might process PII if present in the input, and this is not explicitly sanitized.

Execution

  • warning: Pinned dependencies: Dependencies are listed but not explicitly pinned with versions, and no lockfile is mentioned for the Python environment, posing a risk to reproducibility and stability.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

75 /100
Analyzed about 19 hours ago

Trust Signals

Last commit: about 21 hours ago
Stars: 27.2k
License: MIT
Status
View source

Similar Extensions

Llamaguard

95

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.

Skill
Orchestra-Research

Constitutional Ai

98

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

Skill
Orchestra-Research

NeMo Guardrails

97

NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, toxicity detection. Uses Colang 2.0 DSL for programmable rails. Production-ready, runs on T4 GPU.

Skill
Orchestra-Research

Constitutional Ai

95

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

Skill
davila7

Fixflow

100

Execute coding tasks with a strict delivery workflow: create a complete plan, implement step by step, run tests continuously, and commit after every step by default (`per_step`). Supports explicit commit-policy overrides (`final_only`, `milestone`) and optional BDD (Given/When/Then) when users request behavior-driven delivery or requirements are unclear.

Skill
majiayu000

Safe Mode

100

Prevent destructive operations using Claude Code hooks. Three modes — cautious (warn on dangerous commands), lockdown (restrict edits to one directory), and clear (remove restrictions). Uses PreToolUse matchers for Bash, Edit, and Write.

Skill
rohitg00