
LlamaGuard

Skill · Verified · Active

Meta's 7-8B specialized moderation model for LLM input/output filtering. Six safety categories: violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, Hugging Face, or SageMaker. Integrates with NeMo Guardrails.

Purpose

To provide a specialized, high-accuracy moderation model for LLM inputs and outputs, ensuring content safety and adherence to ethical guidelines.

Features

  • 7-8B parameter moderation model
  • Classifies 6 safety categories (violence, sexual, weapons, substances, self-harm, criminal planning)
  • High accuracy (94-95%)
  • Deployment options: vLLM, Hugging Face, SageMaker
  • Integration with NeMo Guardrails
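The listed categories map onto a Llama Guard-style moderation prompt, which enumerates the safety taxonomy and then the conversation to classify. A minimal sketch of that formatting follows; the exact template wording and category titles here are assumptions, and in practice the model's own chat template should be used:

```python
# Sketch: build a Llama Guard-style moderation prompt for a conversation.
# The template wording and category titles are assumptions; a real deployment
# would use the model's shipped chat template instead.

CATEGORIES = [
    "S1: Violence and Hate",
    "S2: Sexual Content",
    "S3: Weapons",
    "S4: Regulated Substances",
    "S5: Self-Harm",
    "S6: Criminal Planning",
]

def build_moderation_prompt(conversation: list[dict]) -> str:
    """Render a safety-classification prompt for a list of
    {"role": ..., "content": ...} turns."""
    category_block = "\n".join(CATEGORIES)
    turns = "\n\n".join(
        f"{t['role'].capitalize()}: {t['content']}" for t in conversation
    )
    return (
        "Task: Check if there is unsafe content in the conversation "
        "according to our safety policy with the below categories.\n\n"
        "<BEGIN UNSAFE CONTENT CATEGORIES>\n"
        f"{category_block}\n"
        "<END UNSAFE CONTENT CATEGORIES>\n\n"
        "<BEGIN CONVERSATION>\n\n"
        f"{turns}\n\n"
        "<END CONVERSATION>\n\n"
        "Provide your safety assessment. First line: 'safe' or 'unsafe'. "
        "If unsafe, second line: a comma-separated list of violated categories."
    )

prompt = build_moderation_prompt(
    [{"role": "user", "content": "How do I pick a lock?"}]
)
```

The rendered string is what gets tokenized and sent to the moderation model; the model's reply is then parsed for the `safe`/`unsafe` verdict.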

Use Cases

  • Moderating user prompts before sending to an LLM
  • Filtering LLM responses to prevent harmful content generation
  • Implementing content safety guardrails in production LLM applications
  • Integrating with frameworks like NeMo Guardrails for comprehensive safety
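The first two use cases combine into a simple guardrail wrapper: screen the prompt before the LLM sees it, and screen the response before the user does. A minimal sketch, where `generate` and `moderate` are hypothetical callables standing in for the application LLM and the deployed moderation model:

```python
# Sketch: wrap an LLM call with input and output moderation.
# `generate` and `moderate` are hypothetical callables; a real deployment
# would back them with the application LLM and Llama Guard respectively.
from typing import Callable

REFUSAL = "Sorry, I can't help with that."

def guarded_generate(
    user_prompt: str,
    generate: Callable[[str], str],
    moderate: Callable[[str], str],
) -> str:
    # 1. Screen the user prompt before it reaches the LLM.
    if moderate(user_prompt).splitlines()[0] != "safe":
        return REFUSAL
    # 2. Generate a response.
    response = generate(user_prompt)
    # 3. Screen the response before it reaches the user.
    if moderate(response).splitlines()[0] != "safe":
        return REFUSAL
    return response
```

Checking only the first line of the verdict matches the `safe` / `unsafe\nS6` output format noted later on this page.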

Non-Goals

  • Replacing the core LLM's generation capabilities
  • Providing general-purpose natural language understanding beyond safety classification
  • Real-time moderation on low-resource devices without GPU acceleration

Documentation

  • [info] Configuration & parameter reference: While installation and basic usage are detailed, specific parameters for the `moderate` function and advanced configuration options for vLLM deployment lack explicit documentation, including their defaults.

Code Execution

  • [info] Validation: Input validation is implied by the Pydantic models in the FastAPI example, but the core Python usage in SKILL.md lacks explicit schema validation for inputs such as chat history.
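The missing chat-history validation can be made explicit before any text reaches the moderator. A standard-library sketch below stands in for the Pydantic models mentioned in the FastAPI example; the role set and field names are assumptions:

```python
# Sketch: explicit chat-history validation before moderation, using only the
# standard library as a stand-in for the Pydantic models in the FastAPI
# example. Allowed roles and field names are assumptions.
from dataclasses import dataclass

ALLOWED_ROLES = {"user", "assistant"}

@dataclass(frozen=True)
class ChatTurn:
    role: str
    content: str

    def __post_init__(self):
        if self.role not in ALLOWED_ROLES:
            raise ValueError(
                f"role must be one of {sorted(ALLOWED_ROLES)}, got {self.role!r}"
            )
        if not isinstance(self.content, str) or not self.content.strip():
            raise ValueError("content must be a non-empty string")

def validate_history(raw: list[dict]) -> list[ChatTurn]:
    """Raise ValueError on malformed turns; return typed turns otherwise."""
    return [ChatTurn(**turn) for turn in raw]
```

Rejecting malformed history up front keeps prompt-injection-shaped garbage (empty turns, bogus roles) out of the moderation prompt entirely.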

Compliance

  • [info] GDPR: The skill processes user messages for safety screening, which may contain personal data. While it does not submit this data to third parties, it does not explicitly sanitize personal data before analysis.

Errors

  • [info] Actionable error messages: Error messages such as 'unsafe\nS6' identify the failure and the violated category, but lack specific remediation steps for the user.
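A terse verdict like `unsafe\nS6` can at least be parsed into a structured result with a readable category name. The category labels below follow the six listed on this page; the parsing logic itself is a sketch, not part of the skill:

```python
# Sketch: parse a Llama Guard verdict such as "unsafe\nS6" into a structured
# result. Category names follow the six listed on this page; the exact
# multi-category separator (comma) is an assumption.

CATEGORY_NAMES = {
    "S1": "violence/hate",
    "S2": "sexual content",
    "S3": "weapons",
    "S4": "substances",
    "S5": "self-harm",
    "S6": "criminal planning",
}

def explain_verdict(verdict: str) -> dict:
    """Return {"safe": bool, "categories": [readable names]}."""
    lines = verdict.strip().splitlines()
    if lines[0] == "safe":
        return {"safe": True, "categories": []}
    codes = [c.strip() for c in lines[1].split(",")] if len(lines) > 1 else []
    return {
        "safe": False,
        "categories": [CATEGORY_NAMES.get(c, c) for c in codes],
    }

result = explain_verdict("unsafe\nS6")
# e.g. {"safe": False, "categories": ["criminal planning"]}
```

An application can then attach its own remediation guidance per category rather than surfacing the raw code.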

Execution

  • [info] Pinned dependencies: Dependencies are listed and a lockfile is present, but specific pinned versions for the Python libraries are not stated in the SKILL.md.

Practical Utility

  • [info] Edge cases: The SKILL.md mentions potential issues such as 'Model access denied' and 'High latency', but does not detail specific failure modes or recovery steps for the core moderation functions themselves.
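One reasonable recovery policy for those failure modes is retry-with-backoff, failing closed (treat as unsafe) when the moderator stays unavailable. A sketch, where `call_moderator` is a hypothetical callable and the retry/backoff defaults are illustrative:

```python
# Sketch: fail-closed handling for the failure modes the SKILL.md mentions
# ("Model access denied", "High latency"). `call_moderator` is a hypothetical
# callable; the retry count and backoff are illustrative defaults.
import time
from typing import Callable

def moderate_with_fallback(
    text: str,
    call_moderator: Callable[[str], str],
    retries: int = 2,
    backoff_s: float = 0.5,
) -> str:
    """Return the moderator's verdict, retrying transient errors and
    treating persistent failure as 'unsafe' (fail closed)."""
    for attempt in range(retries + 1):
        try:
            return call_moderator(text)
        except Exception:
            if attempt < retries:
                # Exponential backoff between attempts.
                time.sleep(backoff_s * (2 ** attempt))
    # If moderation is unavailable, refuse rather than pass unchecked content.
    return "unsafe"
```

Failing open (passing content through unmoderated) would defeat the purpose of the guardrail, so the fallback verdict here is deliberately `unsafe`.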

Installation

Add the marketplace first

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified
95/100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT
Status
View source code

Similar Extensions

LlamaGuard

75

Meta's 7-8B specialized moderation model for LLM input/output filtering. Six safety categories: violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, Hugging Face, or SageMaker. Integrates with NeMo Guardrails.

Skill
davila7

Constitutional AI

98

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

Skill
Orchestra-Research

NeMo Guardrails

97

NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, toxicity detection. Uses Colang 2.0 DSL for programmable rails. Production-ready, runs on T4 GPU.

Skill
Orchestra-Research

Constitutional AI

95

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

Skill
davila7

Fixflow

100

Execute coding tasks with a strict delivery workflow: create a complete plan, implement step by step, run tests continuously, and commit after each step by default (`per_step`). Supports explicit commit-policy overrides (`final_only`, `milestone`) and optional BDD (Given/When/Then) when users request behavior-driven delivery or requirements are unclear.

Skill
majiayu000

Safe Mode

100

Prevent destructive operations using Claude Code hooks. Three modes — cautious (warn on dangerous commands), lockdown (restrict edits to one directory), and clear (remove restrictions). Uses PreToolUse matchers for Bash, Edit, and Write.

Skill
rohitg00