Skip to main content

Sglang

Skill Active

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.

Purpose

To provide a significantly faster and more efficient way to serve LLMs, especially for applications involving repeated prefixes, structured outputs, and agentic tool calls, surpassing the performance of traditional systems like vLLM for these use cases.

Features

  • Fast LLM inference serving
  • Automatic prefix caching (RadixAttention)
  • Structured generation (JSON, regex, grammar)
  • Agentic workflows with function calling
  • OpenAI-compatible API
  • Supports multiple model types and hardware

Use Cases

  • Building AI agents that make repeated tool calls
  • Generating JSON or regex outputs from LLMs
  • Serving LLMs with long system prompts or few-shot examples
  • Accelerating multi-turn conversations with LLMs

Non-Goals

  • Simple text generation without structure or repeated prefixes
  • Replacing vLLM when prefix caching is not needed
  • Replacing TensorRT-LLM for single-request low-latency NVIDIA-only deployments

Trust

  • warning:Issues AttentionThe repository shows 17 open issues and 4 closed issues in the last 90 days, with a low closure rate, suggesting maintainer responsiveness could be improved.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx — needs Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

75 /100
Analyzed 1 day ago

Trust Signals

Last commit1 day ago
Stars27.2k
LicenseMIT
Status
View Source

Similar Extensions

SGLang

99

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.

Skill
Orchestra-Research

Containerize MCP Server

100

Containerize an R-based MCP (Model Context Protocol) server using Docker. Covers mcptools integration, port exposure, stdio vs HTTP transport, and connecting Claude Code to the containerized server. Use when deploying an R MCP server without requiring a local R installation, creating a reproducible MCP server environment, running MCP servers alongside other containerized services, or distributing an MCP server to other developers.

Skill
pjt222

Azure Deploy

100

Execute Azure deployments for ALREADY-PREPARED applications that have existing .azure/deployment-plan.md and infrastructure files. DO NOT use this skill when the user asks to CREATE a new application — use azure-prepare instead. This skill runs azd up, azd deploy, terraform apply, and az deployment commands with built-in error recovery. Requires .azure/deployment-plan.md from azure-prepare and validated status from azure-validate. WHEN: "run azd up", "run azd deploy", "execute deployment", "push to production", "push to cloud", "go live", "ship it", "bicep deploy", "terraform apply", "publish to Azure", "launch on Azure". DO NOT USE WHEN: "create and deploy", "build and deploy", "create a new app", "set up infrastructure", "create and deploy to Azure using Terraform" — use azure-prepare for these.

Skill
microsoft

Wrangler

100

Cloudflare Workers CLI for deploying, developing, and managing Workers, KV, R2, D1, Vectorize, Hyperdrive, Workers AI, Containers, Queues, Workflows, Pipelines, and Secrets Store. Load before running wrangler commands to ensure correct syntax and best practices. Biases towards retrieval from Cloudflare docs over pre-trained knowledge.

Skill
cloudflare

Devops

100

Deploy to Cloudflare (Workers, R2, D1), Docker, GCP (Cloud Run, GKE), Kubernetes (kubectl, Helm). Use for serverless, containers, CI/CD, GitOps, security audit.

Skill
binjuhor

Ship Gate

100

Pre-production audit that scans a codebase for security, database, deployment, code quality, AI/LLM, dependency, frontend, and observability issues. Intercepts deploy commands and blocks until critical items pass. Stack-agnostic. Use for "run ship gate", "am I ready to ship", "pre-launch audit", "can I deploy", "push to production", "go live checklist", "preflight check". Not for CI/CD setup or infra provisioning.

Skill
alirezarezvani

© 2025 SkillRepo · Find the right skill, skip the noise.