
Speculative Decoding

Skill Verified Active

Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.

Purpose

To significantly speed up LLM inference and reduce latency by employing cutting-edge techniques like speculative decoding, Medusa, and lookahead decoding.
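The core of speculative decoding is a draft-and-verify loop: a cheap draft model proposes several tokens, and the expensive target model checks them in one pass, keeping the longest agreeing prefix. The sketch below is a toy illustration of that control flow only; `toy_target` and `toy_draft` are hypothetical deterministic stand-ins, not real models, and the acceptance test here is exact-match greedy rather than the probabilistic min(1, p_target/p_draft) rule used with sampling.

```python
# Toy sketch of the speculative-decoding accept/verify loop.
# toy_target and toy_draft are hypothetical deterministic stand-ins
# for a large target model and a small draft model.

VOCAB = ["the", "cat", "sat", "on", "mat"]

def toy_target(prefix):
    # "Expensive" target model: greedy next token cycles through VOCAB.
    return VOCAB[len(prefix) % len(VOCAB)]

def toy_draft(prefix):
    # Cheap draft model: agrees with the target except at every 4th position.
    if len(prefix) % 4 == 3:
        return VOCAB[0]
    return toy_target(prefix)

def speculative_step(prefix, k=4):
    """Draft k tokens autoregressively, then verify against the target.

    Accept the longest agreeing prefix; on the first disagreement, emit
    the target's token instead, so every step yields at least one token.
    """
    p = list(prefix)
    draft = []
    for _ in range(k):
        tok = toy_draft(p)
        draft.append(tok)
        p.append(tok)

    accepted = []
    p = list(prefix)
    for tok in draft:
        if toy_target(p) == tok:      # target agrees: keep the draft token
            accepted.append(tok)
            p.append(tok)
        else:                         # rejection: fall back to target's token
            accepted.append(toy_target(p))
            break
    return accepted

def generate(n_tokens, k=4):
    out = []
    while len(out) < n_tokens:
        out.extend(speculative_step(out, k))
    return out[:n_tokens]
```

Because rejected drafts are replaced by the target's own token, the output is identical to what the target model would produce alone; the speedup comes from verifying several draft tokens per expensive call.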

Features

  • Accelerate LLM inference using speculative decoding
  • Implement Medusa for multi-head parallel prediction
  • Utilize Lookahead Decoding for Jacobi iteration-based speedups
  • Provide installation instructions for key libraries
  • Offer runnable code examples for each technique
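Lookahead decoding, listed above, builds on Jacobi iteration: greedy decoding of n tokens is a fixed point of updating every position i in parallel via x[i] = model(prefix + x[:i]), so one can iterate that parallel update until the guess stops changing. A minimal sketch under toy assumptions (`toy_model` is a hypothetical deterministic stand-in, deliberately chosen as a worst case where each token depends on the previous one):

```python
# Toy sketch of lookahead decoding's Jacobi iteration (names hypothetical).
# Greedy decoding is a fixed point of the parallel update
#   x[i] <- model(prefix + x[:i])  for all i at once.

VOCAB = ["a", "b", "c", "d"]

def toy_model(prefix):
    # Deterministic stand-in for a greedy LLM: emit the successor of the
    # last token. This is a worst case for Jacobi, since position i
    # depends entirely on position i-1.
    if not prefix:
        return VOCAB[0]
    return VOCAB[(VOCAB.index(prefix[-1]) + 1) % len(VOCAB)]

def jacobi_decode(prefix, n):
    guess = [VOCAB[0]] * n                  # arbitrary initial guess
    iters = 0
    while True:
        iters += 1
        new = [toy_model(list(prefix) + guess[:i]) for i in range(n)]
        if new == guess:                    # fixed point = greedy decoding
            return guess, iters
        guess = new
```

With this fully chain-dependent toy, convergence takes n iterations, no better than sequential decoding; the practical speedup of lookahead decoding comes from real models often fixing several positions per iteration, plus n-gram caching of verified fragments.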

Use Cases

  • Optimizing inference speed for LLMs (1.5-3.6x speedup)
  • Reducing latency for real-time applications like chatbots
  • Deploying LLM models efficiently on hardware with limited compute
  • Generating tokens faster without sacrificing model quality
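Medusa, mentioned in the description above, drops the separate draft model: extra prediction heads on the base model each propose candidate tokens for the next few positions, the candidates form a small tree, and the base model keeps the longest path it verifies. The sketch below is a toy illustration of that tree-verify idea; `base_model` and `head_candidates` are hypothetical stand-ins (real Medusa heads are trained layers on the model's hidden state, verified via tree-based attention in a single forward pass).

```python
from itertools import product

# Toy sketch of Medusa-style multi-head drafting with tree verification
# (all names hypothetical).

VOCAB = ["x", "y", "z"]

def base_model(prefix):
    # Deterministic stand-in for the base LLM: cycles through VOCAB.
    return VOCAB[len(prefix) % len(VOCAB)]

def head_candidates(prefix, k=2):
    # Head j proposes two candidate tokens for position len(prefix) + j.
    # In real Medusa these come from extra output heads on the same
    # hidden state; here they are fabricated deterministically.
    return [[VOCAB[(len(prefix) + j) % len(VOCAB)],
             VOCAB[(len(prefix) + j + 1) % len(VOCAB)]] for j in range(k)]

def verified_len(prefix, path):
    # How many leading tokens of `path` the base model agrees with.
    p = list(prefix)
    n = 0
    for tok in path:
        if base_model(p) != tok:
            break
        p.append(tok)
        n += 1
    return n

def medusa_step(prefix, k=2):
    # Enumerate all head combinations (the candidate tree) and accept
    # the longest verified path; fall back to one base-model token if
    # nothing verifies, so each step always makes progress.
    best = max(product(*head_candidates(prefix, k)),
               key=lambda path: verified_len(prefix, path))
    n = verified_len(prefix, best)
    accepted = list(best[:n])
    if n == 0:
        accepted.append(base_model(list(prefix)))
    return accepted

def generate(n_tokens, k=2):
    out = []
    while len(out) < n_tokens:
        out.extend(medusa_step(out, k))
    return out[:n_tokens]
```

With this deterministic toy the first candidate of each head is always correct, so the tree is never needed; with real, noisy heads, verifying a tree of alternatives is what lets multiple tokens be accepted per step.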

Non-Goals

  • Training large language models from scratch
  • Fine-tuning models for specific downstream tasks beyond inference optimization
  • Providing a generic LLM serving framework without focus on acceleration techniques

Execution

  • Pinned dependencies: dependencies are listed, but exact version pins or lockfiles are not shown in the provided context.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repository follows the agentskills.io format.

Quality Score

Verified
98/100
Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT
Status
View source code

Similar Extensions

Speculative Decoding

98

Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.

Skill
Orchestra-Research

Agent Resource Allocator

98

Agent skill for resource-allocator - invoke with $agent-resource-allocator

Skill
ruvnet

Game Developer

98

Use when building game systems, implementing Unity/Unreal Engine features, or optimizing game performance. Invoke to implement ECS architecture, configure physics systems and colliders, set up multiplayer networking with lag compensation, optimize frame rates to 60+ FPS targets, develop shaders, or apply game design patterns such as object pooling and state machines. Trigger keywords: Unity, Unreal Engine, game development, ECS architecture, game physics, multiplayer networking, game optimization, shader programming, game AI.

Skill
jeffallan

Game Technical Director

98

Invoke when the user asks about game architecture, engine selection, performance budgets, technical debt, build pipeline, cross-platform, rendering pipeline, or CI/CD for games. Triggers on: "architecture", "engine selection", "performance budget", "tech debt", "build pipeline", "cross-platform", "rendering", "CI/CD". Do NOT invoke for creative vision (use game-creative-director) or engine-specific code (use engine specialists). Part of the AlterLab GameForge collection.

Skill
AlterLab-IEU

Openrlhf Training

97

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

Skill
davila7

V3 Integration Deep

95

Deep agentic-flow@alpha integration implementing ADR-001. Eliminates 10,000+ duplicate lines by building claude-flow as specialized extension rather than parallel implementation.

Skill
ruvnet