Sglang
Skill AktivFast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
To provide a significantly faster and more efficient way to serve LLMs, especially for applications involving repeated prefixes, structured outputs, and agentic tool calls, surpassing the performance of traditional systems like vLLM for these use cases.
Funktionen
- Fast LLM inference serving
- Automatic prefix caching (RadixAttention)
- Structured generation (JSON, regex, grammar)
- Agentic workflows with function calling
- OpenAI-compatible API
- Supports multiple model types and hardware
Anwendungsfälle
- Building AI agents that make repeated tool calls
- Generating JSON or regex outputs from LLMs
- Serving LLMs with long system prompts or few-shot examples
- Accelerating multi-turn conversations with LLMs
Nicht-Ziele
- Simple text generation without structure or repeated prefixes
- Replacing vLLM when prefix caching is not needed
- Replacing TensorRT-LLM for single-request low-latency NVIDIA-only deployments
Trust
- warning:Issues AttentionThe repository shows 17 open issues and 4 closed issues in the last 90 days, with a low closure rate, suggesting maintainer responsiveness could be improved.
Installation
npx skills add davila7/claude-code-templatesFührt das Vercel skills CLI (skills.sh) via npx aus — benötigt Node.js lokal und mindestens einen installierten skills-kompatiblen Agent (Claude Code, Cursor, Codex, …). Setzt voraus, dass das Repo dem agentskills.io-Format folgt.
Qualitätspunktzahl
Vertrauenssignale
Ähnliche Erweiterungen
SGLang
99Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
Containerize MCP Server
100Containerize an R-based MCP (Model Context Protocol) server using Docker. Covers mcptools integration, port exposure, stdio vs HTTP transport, and connecting Claude Code to the containerized server. Use when deploying an R MCP server without requiring a local R installation, creating a reproducible MCP server environment, running MCP servers alongside other containerized services, or distributing an MCP server to other developers.
Azure Deploy
100Execute Azure deployments for ALREADY-PREPARED applications that have existing .azure/deployment-plan.md and infrastructure files. DO NOT use this skill when the user asks to CREATE a new application — use azure-prepare instead. This skill runs azd up, azd deploy, terraform apply, and az deployment commands with built-in error recovery. Requires .azure/deployment-plan.md from azure-prepare and validated status from azure-validate. WHEN: "run azd up", "run azd deploy", "execute deployment", "push to production", "push to cloud", "go live", "ship it", "bicep deploy", "terraform apply", "publish to Azure", "launch on Azure". DO NOT USE WHEN: "create and deploy", "build and deploy", "create a new app", "set up infrastructure", "create and deploy to Azure using Terraform" — use azure-prepare for these.
Wrangler
100Cloudflare Workers CLI zum Bereitstellen, Entwickeln und Verwalten von Workers, KV, R2, D1, Vectorize, Hyperdrive, Workers AI, Containern, Queues, Workflows, Pipelines und Secrets Store. Laden Sie dies, bevor Sie `wrangler`-Befehle ausführen, um die korrekte Syntax und die besten Vorgehensweisen sicherzustellen. Bevorzugt die Abfrage von Cloudflare-Dokumenten gegenüber vortrainiertem Wissen.
Devops
100Deploy to Cloudflare (Workers, R2, D1), Docker, GCP (Cloud Run, GKE), Kubernetes (kubectl, Helm). Use for serverless, containers, CI/CD, GitOps, security audit.
Ship Gate
100Pre-production audit that scans a codebase for security, database, deployment, code quality, AI/LLM, dependency, frontend, and observability issues. Intercepts deploy commands and blocks until critical items pass. Stack-agnostic. Use for "run ship gate", "am I ready to ship", "pre-launch audit", "can I deploy", "push to production", "go live checklist", "preflight check". Not for CI/CD setup or infra provisioning.