
LlamaGuard

Skill · Verified · Active

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.

Purpose

To provide a specialized, high-accuracy moderation model for LLM inputs and outputs, ensuring content safety and adherence to ethical guidelines.

Features

  • 7-8B parameter moderation model
  • Classifies 6 safety categories (violence, sexual, weapons, substances, self-harm, criminal planning)
  • High accuracy (94-95%)
  • Deployment options: vLLM, HuggingFace, SageMaker
  • Integration with NeMo Guardrails
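
The features above can be exercised with a short HuggingFace sketch. This is a hypothetical example, not taken from this skill's documentation: the checkpoint name `meta-llama/LlamaGuard-7b` and the `safe` / `unsafe\nS<k>` verdict format are assumptions based on the publicly released LlamaGuard model card.

```python
# Hypothetical sketch; model id and verdict format are assumptions from the
# public LlamaGuard model card, not from this skill's docs.

def moderate(chat, model, tokenizer):
    """Run the moderation model over a chat and return its raw verdict text."""
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt")
    output = model.generate(input_ids=input_ids, max_new_tokens=32)
    # Decode only the newly generated tokens (the verdict).
    return tokenizer.decode(output[0][input_ids.shape[-1]:],
                            skip_special_tokens=True)

def parse_verdict(text):
    """Split a raw verdict into (is_safe, [category codes])."""
    lines = text.strip().splitlines()
    if lines and lines[0].strip() == "safe":
        return True, []
    cats = [c.strip() for c in lines[1].split(",")] if len(lines) > 1 else []
    return False, cats

def run_demo():  # call on a GPU machine; downloads ~14 GB of weights
    from transformers import AutoModelForCausalLM, AutoTokenizer
    model_id = "meta-llama/LlamaGuard-7b"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    raw = moderate([{"role": "user", "content": "How do I bake bread?"}],
                   model, tokenizer)
    print(parse_verdict(raw))
```

`parse_verdict` keeps the model-facing and string-parsing logic separate, so the parsing can be unit-tested without loading weights.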

Use Cases

  • Moderating user prompts before sending to an LLM
  • Filtering LLM responses to prevent harmful content generation
  • Implementing content safety guardrails in production LLM applications
  • Integrating with frameworks like NeMo Guardrails for comprehensive safety
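
The first two use cases above (gating prompts and filtering responses) can be combined into one wrapper. A minimal sketch with an injectable moderator, where `moderate_fn` is any callable returning `(is_safe, categories)` and `llm_fn` any callable producing a reply; both interfaces are hypothetical, not this skill's API:

```python
def guarded_chat(user_msg, llm_fn, moderate_fn,
                 refusal="I can't help with that."):
    """Moderate the prompt, call the LLM, then moderate the response.

    Returns (reply, flagged_categories); the refusal text is returned
    whenever either moderation pass flags the content.
    """
    # Gate the prompt before it reaches the LLM.
    safe, cats = moderate_fn([{"role": "user", "content": user_msg}])
    if not safe:
        return refusal, cats
    reply = llm_fn(user_msg)
    # Gate the response before it reaches the user.
    safe, cats = moderate_fn([
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": reply},
    ])
    return (reply, []) if safe else (refusal, cats)
```

Injecting `moderate_fn` keeps the control flow testable with a stub and lets the same wrapper sit in front of a vLLM, HuggingFace, or SageMaker deployment.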

Non-Goals

  • Replacing the core LLM's generation capabilities
  • Providing general-purpose natural language understanding beyond safety classification
  • Real-time moderation on low-resource devices without GPU acceleration

Documentation

  • info: Configuration & parameter reference - While installation and basic usage are documented, the specific parameters of the `moderate` function and advanced configuration options for vLLM deployment lack explicit documentation, including their defaults.

Code Execution

  • info: Validation - Input validation is implied through Pydantic models in the FastAPI example, but the core Python usage in SKILL.md lacks explicit schema validation for inputs such as chat history.
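
The validation gap noted above could be closed with a small schema check before the chat history reaches the model. A hypothetical stdlib sketch (the skill's own FastAPI example reportedly uses Pydantic models; this mirrors the same checks without that dependency):

```python
# Hypothetical validation helper; the role set and message shape are
# assumptions about the expected chat-history format.
VALID_ROLES = {"user", "assistant"}

def validate_chat(chat):
    """Raise ValueError on malformed chat history; return it unchanged."""
    if not isinstance(chat, list) or not chat:
        raise ValueError("chat must be a non-empty list of messages")
    for i, msg in enumerate(chat):
        if not isinstance(msg, dict):
            raise ValueError(f"message {i} must be a dict")
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"message {i}: role must be one of {sorted(VALID_ROLES)}")
        if not isinstance(msg.get("content"), str) or not msg["content"].strip():
            raise ValueError(f"message {i}: content must be a non-empty string")
    if chat[0]["role"] != "user":
        raise ValueError("chat must start with a user message")
    return chat
```

Rejecting malformed input early gives a clear 4xx-style error instead of an opaque failure inside the tokenizer's chat template.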

Compliance

  • info: GDPR - The skill processes user messages for safety checks, and these may contain personal data. While it does not submit this data to third parties, it does not explicitly sanitize personal data before analysis.

Errors

  • info: Actionable error messages - Error messages such as 'unsafe\nS6' identify the failure and category, but lack specific remediation steps for the user.
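
The raw verdict noted above can be turned into an actionable message by mapping category codes to guidance. The S1-S6 mapping below is an assumption that follows the category order in this page's description (violence/hate, sexual content, weapons, substances, self-harm, criminal planning); verify it against the model's actual taxonomy before use:

```python
# Assumed code-to-category mapping; confirm against the model card.
CATEGORY_HELP = {
    "S1": "Violence/hate: rephrase the request without violent or hateful content.",
    "S2": "Sexual content: remove sexually explicit material from the request.",
    "S3": "Weapons: remove content about illegal weapons.",
    "S4": "Substances: remove content about regulated or controlled substances.",
    "S5": "Self-harm: route the user to support resources instead of answering.",
    "S6": "Criminal planning: refuse and provide no operational detail.",
}

def explain_verdict(raw):
    """Turn a raw verdict like 'unsafe\\nS6' into (status, remediation tips)."""
    lines = raw.strip().splitlines()
    if lines and lines[0].strip() == "safe":
        return "safe", []
    codes = [c.strip() for c in lines[1].split(",")] if len(lines) > 1 else []
    return "unsafe", [CATEGORY_HELP.get(c, f"{c}: flagged category") for c in codes]
```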

Execution

  • info: Pinned dependencies - Dependencies are listed and a lockfile is present, but the SKILL.md does not explicitly state pinned versions for the Python libraries.

Practical Utility

  • info: Edge cases - The SKILL.md mentions potential issues such as 'Model access denied' and 'High latency' but does not detail specific failure modes or recovery steps for the core moderation functions themselves.
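
For the failure modes mentioned above (access denial, high latency), a retry wrapper with an explicit fail-closed policy is one recovery sketch. The function names and `(is_safe, categories)` return shape are hypothetical conventions, not this skill's API:

```python
import time

def moderate_with_fallback(moderate_fn, chat, retries=2, backoff=0.5,
                           fail_closed=True):
    """Retry a moderation call with exponential backoff.

    On persistent failure, fail closed (treat content as unsafe) or
    fail open, depending on the deployment's policy.
    """
    for attempt in range(retries + 1):
        try:
            return moderate_fn(chat)
        except Exception:
            if attempt < retries:
                time.sleep(backoff * (2 ** attempt))
    if fail_closed:
        return False, ["moderation_unavailable"]
    return True, []
```

Failing closed blocks content while the moderator is down, which is usually the safer default for a guardrail; failing open preserves availability at the cost of unmoderated traffic.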

Installation

Add the marketplace first:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified
95/100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT
Status
View source

Similar Extensions

LlamaGuard

75

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.

Skill
davila7

Constitutional AI

98

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

Skill
Orchestra-Research

NeMo Guardrails

97

NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, toxicity detection. Uses Colang 2.0 DSL for programmable rails. Production-ready, runs on T4 GPU.

Skill
Orchestra-Research

Constitutional AI

95

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

Skill
davila7

Fixflow

100

Execute coding tasks with a strict delivery workflow: build a complete plan, implement step by step, run tests continuously, and commit after each step (`per_step`) by default. Supports explicit commit-strategy overrides (`final_only`, `milestone`) and optional BDD (Given/When/Then) when the user requests behavior-driven delivery or requirements are unclear.

Skill
majiayu000

Safe Mode

100

Prevent destructive operations using Claude Code hooks. Three modes — cautious (warn on dangerous commands), lockdown (restrict edits to one directory), and clear (remove restrictions). Uses PreToolUse matchers for Bash, Edit, and Write.

Skill
rohitg00