此内容尚未提供您的语言版本,正在以英文显示。

Prompt Guard

技能已验证活跃

属于:Agent Native Research Artifact (ARA) Tooling

Meta's 86M prompt injection and jailbreak detector. Filters malicious prompts and third-party data for LLM apps. 99%+ TPR, <1% FPR. Fast (<2ms GPU). Multilingual (8 languages). Deploy with HuggingFace or batch processing for RAG security.

目的

To protect LLM applications from malicious prompt injections and jailbreak attempts by filtering untrusted user inputs and third-party data with high accuracy and low latency.

功能

Detects prompt injections and jailbreaks
Filters user prompts and third-party data
High TPR (99%+) and low FPR (<1%)
Fast inference (<2ms GPU)
Multilingual support (8 languages)

使用场景

Filtering user messages before sending to an LLM
Validating data from APIs or RAG sources
Batch processing documents for RAG security
Securing LLM applications against adversarial inputs

非目标

Content moderation for hate speech or violence
Policy-based action validation
Training-time safety alignment

工作流

Load model and tokenizer
Process input text
Obtain classification score
Block or allow based on threshold

实践

Security
Input Validation
Content Filtering

先决条件

Python 3.8+
transformers library
torch library

安装

请先添加 Marketplace

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs

/plugin install AI-Research-SKILLs@ai-research-skills

质量评分

已验证

100 /100

about 17 hours ago 分析

信任信号

最近提交16 days ago

GitHub 所有者 Orchestra-Research

星标8.3k

下载量 0

许可证MIT

网站orchestra-research.com

状态

查看源代码

类似扩展

NeMo Guardrails

NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, toxicity detection. Uses Colang 2.0 DSL for programmable rails. Production-ready, runs on T4 GPU.

技能

Orchestra-Research

Secrets Management

100

Implement secure secrets management for CI/CD pipelines using Vault, AWS Secrets Manager, or native platform solutions. Use when handling sensitive credentials, rotating secrets, or securing CI/CD environments.

技能

wshobson

Semgrep Rule Creator

100

Creates custom Semgrep rules for detecting security vulnerabilities, bug patterns, and code patterns. Use when writing Semgrep rules or building custom static analysis detections.

技能

trailofbits

Safe Mode

100

Prevent destructive operations using Claude Code hooks. Three modes — cautious (warn on dangerous commands), lockdown (restrict edits to one directory), and clear (remove restrictions). Uses PreToolUse matchers for Bash, Edit, and Write.

技能

rohitg00

Soul Guardian

100

Drift detection + baseline integrity guard for agent workspace files with automatic alerting support

技能

prompt-security

Audit Dependency Versions

100

Audit project dependencies for version staleness, security vulnerabilities, and compatibility issues. Covers lock file analysis, upgrade path planning, and breaking change assessment. Use before a release to ensure dependencies are current and secure, during periodic maintenance reviews, after receiving a security advisory, when upgrading to a new language version, before submitting to CRAN or npm, or when inheriting a project to assess its dependency health.

技能

pjt222