NeMo Guardrails

Skill Verified Active

NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, toxicity detection. Uses Colang 2.0 DSL for programmable rails. Production-ready, runs on T4 GPU.

Purpose

To provide a programmable, production-ready runtime safety framework for LLM applications, ensuring security, accuracy, and ethical compliance.

Features

Programmable runtime safety rails with Colang 2.0 DSL
Jailbreak detection and prompt injection prevention
Input and output validation for LLM interactions
Fact-checking and hallucination detection
PII filtering and toxicity detection
Integration with external moderation tools (Presidio, LlamaGuard)

Use Cases

Implementing robust safety mechanisms for production LLM applications
Preventing prompt injection attacks and jailbreaking attempts
Validating LLM inputs and outputs for accuracy and safety
Filtering sensitive personal information (PII) from LLM interactions
Ensuring LLM responses are factual and non-toxic

Non-Goals

Replacing LLM training-time safety mechanisms
Providing a general-purpose LLM prompt engineering tool
Acting as a data pipeline or ETL tool outside of LLM interaction safety

Workflow

Define safety rules and flows using Colang 2.0 DSL.
Configure LLM parameters and integrate custom actions or external models.
Instantiate LLMRails with the defined configuration.
Generate LLM responses through the configured rails.
Handle potential safety violations by blocking or refining output.

Practices

Runtime Safety
Input Validation
Output Validation
PII Filtering
Toxicity Detection

Prerequisites

Python 3.8+
nemoguardrails library

Scope

info:Tool surface sizeThe SKILL.md defines a framework with flexible Colang flows and custom actions, rather than a fixed set of tools, making direct tool count difficult. The examples showcase a few core concepts.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx — needs Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

Verified

98 /100

Analyzed about 19 hours ago

Trust Signals

Last commitabout 21 hours ago

GitHub owner davila7

Stars27.2k

Downloads 23k

LicenseMIT

Websiteaitmpl.com

Status

View Source

Similar Extensions

NeMo Guardrails

Skill

Orchestra-Research

Safe Mode

100

Prevent destructive operations using Claude Code hooks. Three modes — cautious (warn on dangerous commands), lockdown (restrict edits to one directory), and clear (remove restrictions). Uses PreToolUse matchers for Bash, Edit, and Write.

Skill

rohitg00

Prompt Guard

100

Meta's 86M prompt injection and jailbreak detector. Filters malicious prompts and third-party data for LLM apps. 99%+ TPR, <1% FPR. Fast (<2ms GPU). Multilingual (8 languages). Deploy with HuggingFace or batch processing for RAG security.

Skill

Orchestra-Research

LLM Gate

LLM-powered quality verification using prompt hooks. Validates commit messages, code patterns, and conventions using AI before allowing operations. Use to set up intelligent guardrails.

Skill

rohitg00

Llamaguard

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.

Skill

Orchestra-Research

Careful

Safety guardrails for destructive commands. Warns before rm -rf, DROP TABLE, force-push, git reset --hard, kubectl delete, and similar destructive operations. User can override each warning. Use when touching prod, debugging live systems, or working in a shared environment. Use when asked to "be careful", "safety mode", "prod mode", or "careful mode". (gstack)

Skill

garrytan